My application constructs very long strings (each of about 30000 characters). Due to this, I am running out of memory, as well as I see that there is lot of garbage collection of char[] objects. This is because I am using the Stringbuilder, which possibly uses char[] internally.
Also, a major part of this huge string is fixed (say 29800 characters). So, what varies is only the remaining part. If I create a static String and then concatenate them, I still end up with the same GC results, since '+' gets compiled as StringBuilder internally.
Is there an efficient way to construct long strings in Java, without GCing?
EDIT:
Some sample (pseudo) code snippet:
private static String getString() {
StringBuilder sb = getStringBufferFromPool();
sb.append(<variable_part>);
sb.append(<constant_part>);
return sb.toString();
}
Related
What is wrong with masking a data like this using concatenation?
linkedNumber = linkedNumber.substring(0,3)+"XXXXXXXX"+ linkedNumber.substring(linkedNumber.length-4);
What performance problems would this cause? How can StringBuilder help here ?
When we do String manipulation like concat, substring etc, it creates new string object for result of the manipulation function and make eligible older string for garbage collection. So, these are heavy operations and generates new objects and lot of garbage in heap memory.
StringBuffer and StringBuilder are mutable objects in java and provide append(), insert(), delete() and substring() methods for String manipulation.
StringBuffer provides Thread safety but on a performance cost as its methods are synchronized,but StringBuilder is not thread safety. So, if you don't care about thread safety (single threaded environment), use StringBuilder for better perfomance.
What is wrong with masking a data like this using concatenation?
Nothing, as long as you ensure linkedNumber is at least 4 characters long, though the code only makes sense if it is more than 7 characters long.
What performance problems would this cause?
None.
How can StringBuilder help here ?
The following code might be a microscopically faster, by not creating intermediate String objects for the 2 substrings, though I doubt you'd notice it, since they are so small (3 and 4 characters, respectively).
linkedNumber = new StringBuilder()
.append(linkedNumber, 0, 3)
.append("XXXXXXXX")
.append(linkedNumber, linkedNumber.length() - 4, linkedNumber.length())
.toString();
In my application there was a heap dump and surprisingly heap retained by char[] was around 700MB, which was strange (at least for me). At the same time String had only 150MB.
In my application, I have only used StringBuilder (using default StringBuilder constructor) and tried to avoid using String as we were appending data.
My question here is: Should we always go for StringBuilder? And if yes, how can we reduce the heap retained by it?
Yes, always go for StringBuilder when building strings - it's the most efficient, but still convenient, way of concatenating strings.
It sounds like there are lots of StringBuilders hanging around waiting to be garbage collected. However, to reduce heap usage, you can safely reuse your StringBuilders even though they are not threadsafe by using one StringBuilder per thread via Threadlocal:
private static final ThreadLocal<StringBuilder> LOCAL_STRING_BUILDER =
ThreadLocal.withInitial(StringBuilder::new);
Example usage:
public String logMessage() {
StringBuilder sb = LOCAL_STRING_BUILDER.get();
sb.setLength(0); // Only resets the pointer to start. Doesn't affect the backing array
sb.append("foo=").append(myField); //etc
return sb.toString();
}
You will only ever have at most as many StringBuilders as there are threads, which won't be that many (maybe 10's - 100's).
FYI StringBuilder is used when concatenating strings manually anyway; this line of source:
String str3 = str1 + str2;
gets compiled as if it were:
String str3 = new StringBuilder().append(str1).append(str2).toString();
Possibly you can use StringInterner which takes a StringBuilder to avoid creating objects needlessly.You first populate a recycled StringBuilder with the text and if a String matching that text is in the interner, that String is returned (or a toString() of the StringBuilder is.) The benefit is that you only create objects (and no more than needed) when you see a new String (or at least one not in the array) This can get a 80% to 99% hit rate and reduce memory consumption (and garbage) dramatically when loading many strings of data.
Code:
https://github.com/OpenHFT/Java-Lang/blob/master/lang/src/main/java/net/openhft/lang/pool/StringInterner.java
In an application a String is a often used data type. What we know, is that the mutation of a String uses lots of memory. So what we can do is to use a StringBuilder/StringBuffer.
But at what point should we change to StringBuilder?
And what should we do, when we have to split it or to remplace characters in there?
eg:
//original:
String[] split = string.split("?");
//better? :
String[] split = stringBuilder.toString().split("?);
or
//original:
String replacedString = string.replace("l","st");
//better? :
String replacedString = stringBuilder.toString().replace("l","st");
//or
StringBuilder replacedStringBuilder = new StringBuilder(stringBuilder.toString().replace("l","st);
In your examples, there are no benefits in using a StringBuilder, since you use the toString method to create an immutable String out of your StringBuilder.
You should only copy the contents of a StringBuilder into a String after you are done appending it (or modifying it in some other way).
The problem with Java's StringBuilder is that it lacks some methods you get when using a plain string (check this thread, for example: How to implement StringBuilder.replace(String, String)).
What we know, is that a String uses lots of memory.
Actually, to be precise, a String uses less memory than a StringBuilder with equivalent contents. A StringBuilder class has some additional constant overhead, and usually has a preallocated buffer to store more data than needed at any given moment (to reduce allocations). The issue with Strings is that they are immutable, which means Java needs to create a new instance whenever you need to change its contents.
To conclude, StringBuilder is not designed for the operations you mentioned (split and replace), and it won't yield much better performance in any case. A split method cannot benefit from StringBuilder's mutability, since it creates an array of immutable strings as its output anyway. A replace method still needs to iterate through the entire string, and do a lot of copying if replaced string is not the same size as the searched one.
If you need to do a lot of appending, then go for a StringBuilder. Since it uses a "mutable" array of characters under the hood, adding data to the end will be especially efficient.
This article compares the performance of several StringBuilder and String methods (although I would take the Concatenation part with reserve, because it doesn't mention dynamic string appending at all and concentrates on a single Join operation only).
What we know, is that the mutation of a String uses lots of memory.
That is incorrect. Strings cannot be mutated. They are immutable.
What you are actually talking about is building a String from other strings. That can use a lot more memory than is necessary, but it depends how you build the string.
So what we can do is to use a StringBuilder/StringBuffer.
Using a StringBuilder will help in some circumstances:
String res = "";
for (String s : ...) {
res = res + s;
}
(If the loop iterates many times then optimizing the above to use a StringBuilder could be worthwhile.)
But in other circumstances it is a waste of time:
String res = s1 + s2 + s3 + s4 + s5;
(It is a waste of time to optimize the above to use a StringBuilder because the Java compiler will automatically translate the expression into code that creates and uses a StringBuilder.)
You should only ever use a StringBuffer instead of a StringBuilder when the string needs to be accessed and/or updated by more than one thread; i.e. when it needs to be thread-safe.
But at what point should we change to StringBuilder?
The simple answer is to only do it when the profiler tells you that you have a performance problem in your string handling / processing.
Generally speaking, StringBuilders are used for building strings rather as the primary representation of the strings.
And what should we do, when we have to split it or to replace characters in there?
Then you have to review your decision to use a StringBuilder / StringBuffer as your primary representation at that point. And if it is still warranted you have to figure out how to do the operation using the API you have chosen. (This may entail converting to a String, performing the operation and then creating a new StringBuilder from the result.)
If you frequently modify the string, go with StringBuilder. Otherwise, if it's immutable anyway, go with String.
To answer your question on how to replace characters, check this out: http://download.oracle.com/javase/tutorial/java/data/buffers.html. StringBuilder operations is what you want.
Here's another good write-up on StringBuilder: http://www.yoda.arachsys.com/csharp/stringbuilder.html
If you need to lot of alter operations on your String, then you can go for StringBuilder. Go for StringBuffer if you are in multithreaded application.
Both a String and a StringBuilder use about the same amount of memory. Why do you think it is “much”?
If you have measured (for example with jmap -histo:live) that the classes [C and java.lang.String take up most of the memory in the heap, only then should you think further in this direction.
Maybe there are multiple strings with the same value. Then, since Strings are immutable, you could intern the duplicate strings. Don't use String.intern for it, since it has bad performance characteristics, but Google Guava's Interner.
Whenever I try to add the numbers in string like:
String s=new String();
for(int j=0;j<=1000000;j++)
s+=String.valueOf(j);
My program is adding the numbers, but very slowly. But When I altered my program and made it like:
StringBuffer sb=new StringBuffer();
for(int j=0;j<=1000000;j++)
sb.append(String.valueOf(j));
I got the result very quickly. Why is that so?
s+=String.valueOf(j); needs to allocate a new String object every time it is called, and this is expensive. The StringBuffer only needs to grow some internal representation when the contained string is too large, which happens much less often.
It would probably be even faster if you used a StringBuilder, which is a non-synchronized version of a StringBuffer.
One thing to note is that while this does apply to loops and many other cases, it does not necessarily apply to all cases where Strings are concatenated using +:
String helloWorld = getGreeting() + ", " + getUsername() + "!";
Here, the compiler will probably optimize the code in a way that it sees fit, which may or may not be creating a StringBuilder, since that is also an expensive operation.
Because s += "string" creates a new instance. A String is immutable. StringBuffer or StringBuilder adds the String without creating a new instance.
In Java as in .NET Strings are immutable. They cannot be changed after creation. The result is that using the +operator will create a new string and copy the contents of both strings into it.
A StringBuffer will double the allocated space every time it runs out of space to add characters. Thus reducing the amount of memory allocations.
Take a look at this, from the Javaspecialists newsletter by Heinz Kabutz:
http://www.javaspecialists.eu/archive/Issue068.html
and this later article:
http://java.sun.com/developer/technicalArticles/Interviews/community/kabutz_qa.html
I'm reading from a binary file and want to convert the bytes to US ASCII strings. Is there any way to do this without calling new on String to avoid multiple semantically equal String objects being created in the string literal pool? I'm thinking that it is probably not possible since introducing String objects using double quotes is not possible here. Is this correct?
private String nextString(DataInputStream dis, int size)
throws IOException
{
byte[] bytesHolder = new byte[size];
dis.read(bytesHolder);
return new String(bytesHolder, Charset.forName("US-ASCII")).trim();
You'd have to have a cache mapping byte arrays to strings, then search through the cache for any equal values before creating a new string.
You can intern existing strings with intern() as Yishai posted - that won't stop you from creating more strings, but it'll make all but the first one (for any char sequence) very short lived. On the other hand, it'll make all the distinct strings live for a very long time indeed.
You can have "pseudo-interning" by using a Map<String, String>:
String tmp = new String(bytesHolder, Charset.forName("US-ASCII")).trim();
String cached = cache.get(tmp);
if (cached == null)
{
cached = tmp;
cache.put(tmp, tmp);
}
return cached;
You could even put a bit more effort in and end up with an LRU cache so that it'll keep the N most recently fetched strings, discarding others when it needs to.
None of that reduces the number of strings created in the first place, as I say - but is that likely to be a problem in your situation? GCs have been tuned to make it very cheap to create short-lived objects.
You can call the intern() method on the string to ensure one for the whole JVM.
String s = new String(bytes, "US-ASCII").intern();
You won't avoid creating the initial string again, but you will save on the storage.
That being said, interned strings have a limited storage space, so use with caution. A better option may be to implement a HashMap with the string as the key and value and check if the string already exists and get it if it does, insert it if it doesn't. That way you won't have such memory limitations.
You shouldn’t be concerned about it—unless you profiled your application and have determined the String creation to be the exact source of your problem.
If you find out that the String creation is the source of your problem I would recommend what Jon Skeet proposed, i.e. a mapping from byte[] to String. That has about the same effect as interning your Strings while not hogging up valuable memory until you restart the VM.