Slow string concatenation over large input - java

I've written an n-ary tree ADT which works fine. However, I need to store its serialization in a variable in a calling class, e.g.:
DomTree<String> a = Data.createTreeInstance("very_large_file.xml");
String x = a.toString();
I've written a method which serves the purpose exactly how I need it, but on very large inputs it takes forever (20 minutes on a 100MB XML file). I have timed the methods: building the tree from the XML file is quick, but calling toString() as shown above is very slow.
@Override
public String toString(){
return printTree(this);
}
public String printTree(AbstractTree<E> tree){
if (tree.isLeaf()){
return tree.getNodeName();
}else{
String tStr = tree.getNodeName() + "(";
int i = 0;
Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
while (i < tree.getChildren().size() - 1){
tStr += printTree(child.next()) + ", ";
i++;
}
tStr += printTree(child.next()) + ")";
return tStr;
}
}
I'm guessing it is to do with the way the string is built up rather than how the tree is traversed? Is there a better way to do this?
UPDATE: Following Skaffman's example, the following code gives an OutOfMemoryError for very large input.
@Override
public String toString(){
StringBuilder buffer = new StringBuilder();
printTree(this, buffer);
return buffer.toString();
}
public String printTree(AbstractTree<E> tree, StringBuilder buffer){
if (tree.isLeaf()){
return tree.getNodeName();
}else{
buffer.append(tree.getNodeName());
buffer.append("(");
int i = 0;
Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
while (i < tree.getChildren().size() - 1){
buffer.append(printTree(child.next(), buffer));
buffer.append(", ");
i++;
}
buffer.append(printTree(child.next(), buffer));
buffer.append(")");
return buffer.toString();
}
}
UPDATE: Works perfectly now, using Skaffman's example.

String concats like that are punishingly slow. Use a StringBuilder.
@Override
public String toString(){
StringBuilder buffer = new StringBuilder();
printTree(this, buffer);
return buffer.toString();
}
public void printTree(AbstractTree<E> tree, StringBuilder buffer){
if (tree.isLeaf()){
buffer.append(tree.getNodeName());
} else {
buffer.append(tree.getNodeName());
buffer.append("(");
int i = 0;
Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
while (i < tree.getChildren().size() - 1){
printTree(child.next(), buffer);
buffer.append(", ");
i++;
}
printTree(child.next(), buffer);
buffer.append(")");
}
}
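For anyone who wants to run the idea in isolation, here is a minimal self-contained sketch. The Node class is a hypothetical stand-in for the asker's AbstractTree<E> (only a name and a child list are assumed), and it uses hasNext() on the iterator instead of a separate counter:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the asker's AbstractTree<E>; only the name and children matter here.
final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    // Adds a child and returns this node so that calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }

    @Override
    public String toString() {
        StringBuilder buffer = new StringBuilder();
        print(this, buffer);
        return buffer.toString();
    }

    // Appends "name" for a leaf, or "name(child, child, ...)" otherwise.
    // Every recursive call writes into the same buffer; nothing is returned or re-copied.
    private static void print(Node node, StringBuilder buffer) {
        buffer.append(node.name);
        if (node.children.isEmpty()) {
            return;
        }
        buffer.append('(');
        Iterator<Node> it = node.children.iterator();
        while (it.hasNext()) {
            print(it.next(), buffer);
            if (it.hasNext()) {
                buffer.append(", ");
            }
        }
        buffer.append(')');
    }
}
```

The important part is that print returns void: the version in the question's update appended the result of buffer.toString() back into the same buffer, which is what made it run out of memory.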

Don't use string concatenation in loops. It does not scale.
Use StringBuilder; unlike string concatenation, it does not create a new object on every append.
void print() {
StringBuilder sb = new StringBuilder();
sb.append("hello");
sb.append(" World!");
System.out.println(sb.toString());
}

The reason string concatenation is slow is that strings are immutable. Every time you write "+=", a new String is created, so the way you build up your string is, in the worst case, O(n²). If you += one char at a time, the cost of building each new string would be 2 + 3 + 4 + ... + n, which is O(n²).
Use StringBuilder as others suggest (in preference to the slower, but thread-safe, StringBuffer).
I suppose I should add that StringBuilder will give you O(n) amortized time, because it is mutable and works like a vector behind the scenes. So build up your string there, and then call toString().
StringBuilder builder = new StringBuilder();
builder.append("blah"); // append more as needed.
String text = builder.toString();
I would also like to add that this problem is similar in Python. The idiom in Python is to append all the strings you want to concatenate to a list, and then join the list: "".join(the_list).
UPDATE: As Bill points out, concatenation is not the root of all evil. One-off string concatenations are fine, and may even be optimized! (They are also worst-case linear.) But when you are concatenating in a loop, as you are above, the performance will change drastically as the number of iterations goes up. In that case my analysis above still stands, as I specifically stated it is "worst case", which means you assume no optimizations. (The JVM also can't optimize concatenation in loops as well as it can outside them.)
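To put rough numbers on the worst-case argument, here is a small counting sketch. It does not measure real JVM behaviour: it assumes every += copies the whole new string, and it models the builder as a plain doubling array starting at 16 chars, which is a simplification of whatever growth policy a real StringBuilder uses. The class and method names are made up for illustration.

```java
final class ConcatCost {
    // Characters written when a string grows one char at a time via +=,
    // assuming each += produces a complete new copy (no optimizations).
    static long copiesWithConcat(int n) {
        long copied = 0;
        int length = 0;
        for (int i = 0; i < n; i++) {
            length += 1;      // the new string is one char longer
            copied += length; // and all of its chars are written afresh
        }
        return copied;        // 1 + 2 + ... + n = n(n+1)/2
    }

    // Characters written by a doubling buffer: one write per append, plus a
    // full copy of the current contents whenever the capacity is exhausted.
    static long copiesWithDoublingBuffer(int n) {
        long copied = 0;
        int capacity = 16;    // assumed starting capacity for this model
        int length = 0;
        for (int i = 0; i < n; i++) {
            if (length == capacity) {
                copied += length; // move existing contents to the bigger array
                capacity *= 2;
            }
            length += 1;
            copied += 1;
        }
        return copied;
    }
}
```

Under these assumptions, 1000 single-char appends cost 500500 character writes with += and 2008 with the doubling buffer, which is the quadratic versus amortized-linear difference described above.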

Look at StringBuilder, don't use simple concatenation, and pass the StringBuilder through your entire process (or make it a global).

If a profiler confirms that the bottleneck is string concatenation, you have two choices:
StringBuilder/StringBuffer (the latter is better suited for threading)
Ropes for Java:
A rope is a high performance replacement for Strings. The datastructure, described in detail in "Ropes: an Alternative to Strings", provides asymptotically better performance than both String and StringBuffer for common string modifications like prepend, append, delete, and insert. Like Strings, ropes are immutable and therefore well-suited for use in multi-threaded programming.

You might want to look at String.intern() as a way to cut down on memory use. This will use the interned String from the string pool. If you have many duplicated strings, it might be faster. More info on interned strings here
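A tiny sketch of what interning does, in case it is unfamiliar. The class name is made up for illustration; the only behaviour relied on is that String.intern() returns the pooled instance, and that string literals are already in the pool. Note that this helps with many duplicate strings (such as repeated node names), not with building one huge string.

```java
final class InternDemo {
    // Returns true when a string built at run time is a different object from
    // the equal literal, yet intern() hands back the literal's pooled instance.
    static boolean sameObjectAfterIntern() {
        String literal = "node";
        String built = new StringBuilder("no").append("de").toString(); // distinct heap object
        return built != literal && built.intern() == literal;
    }
}
```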

Related

StringBuilder - setLength(0) vs new instantiate, what is better?

I have a loop (run twice per iteration when I receive data from Bluetooth), and I'm using a StringBuilder to append the data piece by piece. At the end of the process I need to clear the StringBuilder object.
I'm currently creating a new instance, but I have also used the setLength method. Which is better: setLength(0) or a new instance?
Example of the code that receives data from the Bluetooth device:
private void receive(byte[] data) {
peso.append(new String(data));
receiveText.setText(peso.toString().replaceAll("(\\r|\\n)", "")+" Kg");
// int i = Integer.parseInt(peso.toString().replaceAll(("\\r|\\n"), ""));
Log.i("Val of append actual", String.valueOf(peso));
if(peso.length() >= 3)
peso = new StringBuilder();
}
Other details:
I know the maximum size of my "peso". Can this help with the choice?
I am not sure about your use case, but creating a new instance is generally a good option compared with setting the length to zero: it shows the intent better, so your code will be more understandable and readable.
The performance difference is really negligible, but your code will be simpler.
Instantiating can be slow, because creating a new object takes time, whereas clearing the content of a StringBuilder requires no memory allocation for its internal array, which makes that approach faster.
I have even read that new StringBuilder was faster in one instance.
So that is more or less an irrelevant micro-optimisation. Profile it in your case.
I would give an initial capacity: new StringBuilder(64).
Remarks:
Currently if (peso.length() >= 3) is probably for testing.
Specify the Charset of the data bytes. Best StandardCharsets.UTF_8.
Logging is slow; remove it.
Regex is slow. Below is the non-regex replace.
\R is for any newline, \n, \r\n, \r, NEL.
So:
private void receive(byte[] data) {
peso.append(new String(data, StandardCharsets.ISO_8859_1));
String pesoS = peso.toString();
//receiveText.setText(pesoS.replaceAll("\\R", "")+" Kg");
receiveText.setText(pesoS.replace("\r", "").replace("\n", "")+" Kg");
Log.i("Val of append actual", pesoS);
if (peso.length() >= 3)
peso = new StringBuilder(16);
}
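For comparison, here is a minimal sketch of the setLength(0) variant. The class is hypothetical and takes String chunks rather than bytes so that it runs without Android or Bluetooth; the field name peso and the threshold of 3 are taken from the question.

```java
final class BuilderReset {
    // Sized once; setLength(0) below keeps this backing array for reuse.
    private final StringBuilder peso = new StringBuilder(16);

    // Appends a chunk. Once at least 3 chars are buffered, returns them and
    // clears the buffer; otherwise returns null to signal "not complete yet".
    String receive(String chunk) {
        peso.append(chunk);
        if (peso.length() < 3) {
            return null;
        }
        String value = peso.toString();
        peso.setLength(0); // logical clear, no new allocation
        return value;
    }
}
```

With setLength(0) the field can be final, which some people find clearer than reassigning it; with such small buffers either choice is unlikely to be measurable.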

Connecting two Strings

I know that StringBuilder is good to use for connecting two String objects. I am wondering what happens when doing this:
"a" + i + "b";
Which option is fastest and safest:
1.
int i = 0;
String a = "a" + i + "b";
2.
int i = 0;
String a = "a".concat(String.valueOf(i)).concat("b");
3.
int i = 0;
String a = String.format("a%db", i);
None of them are really optimal for a string concatenation. Do use a StringBuilder, but use it properly, without discarding it each time:
StringBuilder a = new StringBuilder("a");
for(int i = 0;i<10;i++){
a.append("b");
}
This will perform this without creating new String objects each time.
None of them are really good - you're still creating a StringBuilder under the covers and discarding it right after. Here are two other options:
Create the StringBuilder explicitly and loop.
StringBuilder builder = new StringBuilder("a");
for(int i = 0; i < 10; i++) {
builder.append("b");
}
String a = builder.toString();
Use Guava's Strings.repeat() method, and append the result to a StringBuilder.
String a = new StringBuilder("a").append(Strings.repeat("b", 9)).toString();
In terms of performance, there really isn't too much to improve - at some level you're creating n Strings. The only thing we could realistically improve is the memory overhead.
Strings are immutable in Java, so every string operation results in a new String object. If memory matters, avoid building strings with the "+" sign in a loop, as it creates multiple String objects, and you cannot know when the garbage collector will run to clean them up. Also consider whether the operation needs to be thread-safe when choosing between StringBuffer and StringBuilder.
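For reference, the three options from the question, plus an explicit builder, side by side. This is only a sketch to show that they are interchangeable in result; the class and method names are made up, and nothing here measures speed.

```java
final class ConnectDemo {
    // Option 1: the + operator.
    static String plus(int i) {
        return "a" + i + "b";
    }

    // Option 2: String.concat, which needs the int converted first.
    static String concat(int i) {
        return "a".concat(String.valueOf(i)).concat("b");
    }

    // Option 3: String.format, which has to parse the pattern on each call.
    static String format(int i) {
        return String.format("a%db", i);
    }

    // Explicit builder, roughly what option 1 amounts to for a single expression.
    static String builder(int i) {
        return new StringBuilder().append('a').append(i).append('b').toString();
    }
}
```

For a one-off expression like this, readability is a reasonable tie-breaker; the explicit builder starts to pay off when the appends happen in a loop.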

Optimum time to perform an operation: within, or after loop

I am reading a file to parse later on. The file is not likely to exceed an MB in size, so this is perhaps not a crucial question for me at this stage. But for best practise reasons, I'd like to know when is the optimum time to perform an operation.
Example:
Using a method I've pasted from http://www.dzone.com/snippets/java-read-file-string, I am reading a buffer into a string. I would now like to remove all whitespace. My method is currently this:
private String listRaw;
public boolean readList(String filePath) throws java.io.IOException {
StringBuffer fileData = new StringBuffer(1024);
BufferedReader reader = new BufferedReader(
new FileReader(filePath));
char[] buf = new char[1024];
int numRead=0;
while((numRead=reader.read(buf)) != -1){
String readData = String.valueOf(buf, 0, numRead);
fileData.append(readData);
buf = new char[1024];
}
reader.close();
listRaw = fileData.toString().replaceAll("\\s","");
return true;
}
So, I remove all whitespace from the string at the time I store it, in its entirety, to a class variable.
To me, this means less processing but more memory usage. Would I be better off applying the replaceAll() operation on the readData variable as I append it to fileData for best practise reasons? Using more processing but avoiding passing superfluous whitespace around.
I imagine this has little impact for a small file like the one I am working on, but what if it's a 200MB log file?
Is it entirely case-dependent, or is there a consensus I'd do better to follow?
Thanks for the input everybody. I'm sure you've helped to aim my mindset in the right direction for writing Java.
I've updated my code to take into consideration the points raised. Including the suggestion by Don Roby that at some point, I may want to keep spaces. Hopefully things read better now!
private String listRaw;
public boolean readList(String filePath) throws java.io.IOException {
StringBuilder fileData = new StringBuilder(51200);
BufferedReader reader = new BufferedReader(new FileReader(filePath));
char[] buf = new char[51200];
boolean spaced = false;
int numRead;
while((numRead = reader.read(buf)) != -1){
for(int i=0;i<numRead;i++) {
char c = buf[i];
if (c != '\t' && c != '\r' && c != '\n') {
if (c == ' ') {
if (spaced) {
continue;
}
spaced = true;
} else {
spaced = false;
}
fileData.append(c);
}
}
}
reader.close();
listRaw = fileData.toString().trim();
return true;
}
You'd better create and apply the regexp replacement only once, at the end. But you would gain much more by
initializing the StringBuilder with a reasonable size
avoiding the creation of a String inside the loop, and appending the read characters directly to the StringBuilder
avoiding the instantiation of a new char buffer, for nothing, at each iteration.
To avoid an unnecessary long temporary String creation, you could read char by char, and only append the char to the StringBuilder if it's not a whitespace. In the end, the StringBuilder would contain only the good characters, and you wouldn't need any replaceAll() call.
There are actually several very significant inefficiencies in this code, and you'd have to fix them before worrying about the relatively less important issue you've raised.
First, don't create a new buf object on each iteration of the loop -- use the same one! There's no problem with doing so -- the new data overwrites the old, and you save on object allocation (which is one of the more expensive operations you can do.)
Second, similarly, don't create a String to call append() -- use the form of append that takes a char array and an offset (0, in this case) and length (numRead, in this case.) Again, you create one less object per loop iteration.
Finally, to come to the question you actually asked: doing it in the loop would create a String object per iteration, but with the tuning we've just done, you're creating zero objects per iteration, so removing the whitespace at the end of the loop is the clear winner!
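Put together, the three points above could look something like this. It is a sketch, not a drop-in replacement: the class and method names are made up, and it takes a Reader (with a small String helper) so that it can be tried without a file on disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadAll {
    // Reads everything from the reader, then strips whitespace once at the end.
    static String readWithoutWhitespace(Reader reader) throws IOException {
        StringBuilder fileData = new StringBuilder(1024);
        char[] buf = new char[1024];          // allocated once, reused every iteration
        int numRead;
        while ((numRead = reader.read(buf)) != -1) {
            fileData.append(buf, 0, numRead); // append(char[], offset, len): no temporary String
        }
        return fileData.toString().replaceAll("\\s", "");
    }

    // Convenience wrapper for trying the method on an in-memory string.
    static String fromString(String text) {
        try (Reader reader = new StringReader(text)) {
            return readWithoutWhitespace(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only numRead characters are appended on each pass, which matters on the final read, when the buffer is usually only partly filled.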
Depending somewhat on the parse you're going to do, you may well be better off not removing the spaces in a separate step at all, and just ignore them during the parse.
It's also reasonably rare to want to remove all whitespace. Are you sure you don't want to just replace multiple spaces with single spaces?

java: use StringBuilder to insert at the beginning

I could only do this with String, for example:
String str="";
for(int i=0;i<100;i++){
str=i+str;
}
Is there a way to achieve this with StringBuilder? Thanks.
StringBuilder sb = new StringBuilder();
for(int i=0;i<100;i++){
sb.insert(0, Integer.toString(i));
}
Warning: It defeats the purpose of StringBuilder, but it does what you asked.
Better technique (although still not ideal):
Reverse each string you want to insert.
Append each string to a StringBuilder.
Reverse the entire StringBuilder when you're done.
This will turn an O(n²) solution into O(n).
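A minimal sketch of those three steps, next to the insert(0, ...) version for comparison. The class and method names are made up for illustration. One caveat: StringBuilder.reverse() treats surrogate pairs specially, so this sketch is only meant for simple text such as the digits in the question.

```java
final class PrependByReversing {
    // Reverse each piece, append it, then reverse the whole builder once.
    static String viaReversal(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(new StringBuilder(Integer.toString(i)).reverse());
        }
        return sb.reverse().toString();
    }

    // The straightforward version: every insert shifts the existing contents.
    static String viaInsert(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.insert(0, i);
        }
        return sb.toString();
    }
}
```

Both methods return the same string; the first does a constant amount of work per character, while the second moves the whole buffer on every iteration.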
you can use strbuilder.insert(0,i);
Maybe I'm missing something but you want to wind up with a String that looks like this, "999897969594...543210", correct?
StringBuilder sb = new StringBuilder();
for(int i=99;i>=0;i--){
sb.append(String.valueOf(i));
}
As an alternative solution you can use a LIFO structure (like a stack) to store all the strings and when you are done just take them all out and put them into the StringBuilder. It naturally reverses the order of the items (strings) placed in it.
Stack<String> textStack = new Stack<String>();
// push the strings to the stack
while(!isReadingTextDone()) {
String text = readText();
textStack.push(text);
}
// pop the strings and add to the text builder
StringBuilder builder = new StringBuilder();
while (!textStack.empty()) {
builder.append(textStack.pop());
}
// get the final string
String finalText = builder.toString();
This thread is quite old, but you could also consider a recursive solution that passes along the StringBuilder to fill. This avoids any reverse processing. You just need to express your iteration as a recursion and choose the exit condition carefully.
public class Test {
public static void main(String[] args) {
StringBuilder sb = new StringBuilder();
doRecursive(sb, 100, 0);
System.out.println(sb.toString());
}
public static void doRecursive(StringBuilder sb, int limit, int index) {
if (index < limit) {
doRecursive(sb, limit, index + 1);
sb.append(Integer.toString(index));
}
}
}
I had a similar requirement when I stumbled on this post. I wanted a fast way to build a String that can grow from both sides ie. add new letters on the front as well as back arbitrarily. I know this is an old post, but it inspired me to try out a few ways to create strings and I thought I'd share my findings. I am also using some Java 8 constructs in this, which could have optimised the speed in cases 4 and 5.
https://gist.github.com/SidWagz/e41e836dec65ff24f78afdf8669e6420
The Gist above has the detailed code that anyone can run.
I tried a few ways of growing strings: 1) appending to a StringBuilder, 2) inserting at the front of a StringBuilder as shown by @Mehrdad, 3) inserting partly at the front and partly at the end of the StringBuilder, 4) using a list to append at the end, 5) using a Deque to append at the front.
// Case 2
StringBuilder build3 = new StringBuilder();
IntStream.range(0, MAX_STR)
.sequential()
.forEach(i -> {
if (i%2 == 0) build3.append(Integer.toString(i)); else build3.insert(0, Integer.toString(i));
});
String build3Out = build3.toString();
//Case 5
Deque<String> deque = new ArrayDeque<>();
IntStream.range(0, MAX_STR)
.sequential()
.forEach(i -> {
if (i%2 == 0) deque.addLast(Integer.toString(i)); else deque.addFirst(Integer.toString(i));
});
String dequeOut = deque.stream().collect(Collectors.joining(""));
I'll focus on the front-append-only cases, i.e. case 2 and case 5. The implementation of StringBuilder decides internally how its buffer grows, and this, apart from the shifting of the whole buffer on every front insert, limits the speed. While the time taken when inserting directly at the front of the StringBuilder grows to really high values, as shown by @Mehrdad, if you only need strings of fewer than 90k characters (which is still a lot), the front insert will build a String in the same time as it would take to build a String of the same length by appending at the end. What I am saying is that the time penalty does kick in and is huge, but only when you have to build really huge strings. One could use a deque and join the strings at the end as shown in my example. But StringBuilder is a bit more intuitive to read and code, and the penalty would not matter for smaller strings.
Actually the performance for case 2 is much faster than case 1, which I don't seem to understand. I assume the growth for the internal buffer in StringBuilder would be the same in case of front append and back append. I even set the minimum heap to a very large amount to avoid delay in heap growth, if that would have played a role. Maybe someone who has a better understanding can comment below.
You can use the insert method with an offset.
An offset of 0 means you are inserting at the front of your StringBuilder.
StringBuilder sb = new StringBuilder();
for(int i=0;i<100;i++){
sb.insert(0,i);
}
NOTE:
The insert method is overloaded for primitive types as well as for char[], String and Object, so you can pass an int, long, char[], etc. directly.
Difference Between String, StringBuilder And StringBuffer Classes
String
String is an immutable object (once created, it cannot be changed). String literals are stored in the constant string pool.
Every immutable object in Java is thread-safe, which implies String is also thread-safe: threads can share a String freely, because none of them can modify it.
A String's value, once assigned, cannot be changed.
StringBuffer
StringBuffer is mutable, which means one can change the value of the object. An object created through StringBuffer is stored in the heap. StringBuffer has the same methods as StringBuilder, but each method in StringBuffer is synchronized; that is, StringBuffer is thread-safe.
Because of this, two threads cannot execute a method on the same StringBuffer at the same time; each method can be entered by only one thread at a time.
But thread safety has a cost too: the synchronization hurts StringBuffer's performance. Thus StringBuilder is faster than StringBuffer when calling the same methods of each class.
A StringBuffer can be converted to a String by using the toString() method.
StringBuffer demo1 = new StringBuffer("Hello");
// The object above is stored in the heap and its value can be changed.
// Modifying its value afterwards is valid, because StringBuffer allows it.
StringBuilder
StringBuilder is the same as StringBuffer in that the object is stored in the heap and can be modified. The main difference between StringBuffer and StringBuilder is that StringBuilder is not thread-safe.
StringBuilder is fast because it is not synchronized.
// A StringBuilder object is stored in the heap and its value can be modified.
// Modifying its value is valid, because StringBuilder allows it.
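The difference between the three classes can be seen in a few lines. This is an illustrative sketch with a made-up class name; it only uses concat, append and toString.

```java
final class MutabilityDemo {
    static String demo() {
        String s = "Hello";
        String t = s.concat(" World");   // s is untouched; t is a new String object

        StringBuffer buffer = new StringBuffer("Hello");
        buffer.append(" World");         // same object, new contents (synchronized)

        StringBuilder builder = new StringBuilder("Hello");
        builder.append(" World");        // same object, new contents (unsynchronized)

        return s + "|" + t + "|" + buffer + "|" + builder;
    }
}
```

Calling demo() returns "Hello|Hello World|Hello World|Hello World": the original String still reads "Hello", while both mutable objects were changed in place.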
How about:
StringBuilder builder = new StringBuilder();
for(int i=99;i>=0;i--){
builder.append(Integer.toString(i));
}
builder.toString();
OR
StringBuilder builder = new StringBuilder();
for(int i=0;i<100;i++){
builder.insert(0, Integer.toString(i));
}
builder.toString();
But with this, you are making the operation O(N^2) instead of O(N).
Snippet from java docs:
Inserts the string representation of the Object argument into this
character sequence. The overall effect is exactly as if the second
argument were converted to a string by the method
String.valueOf(Object), and the characters of that string were then
inserted into this character sequence at the indicated offset.

String or StringBuffer

private void doShareEmp(pageBean UTIL, HttpServletRequest request, String page)
throws Exception
{
doAction(request, UTIL, page);
String action = pageBean.getSafeRequestOrNullParameter(request, "DO");
long empRecNum = UTIL.getNumValue("EMPLOYEE", "REC_NUM");
if (action != null)
{
if (action.startsWith("US:"))
unshareEmployee(request, UTIL, action.substring(3));
else if (action.equals("SHARE") && empRecNum != 0)
shareEmployee(request, UTIL, empRecNum);
}
ListBean list = UTIL.getListBean(request, "EMPSHARELIST", true);
if (empRecNum != 0)
{
StringBuffer sql = new StringBuffer();
sql.append("SELECT FLDREC_NUM, FLDCOMPANY, FLDLOCATION, FLDDEPT FROM #SCHEMAEMPLVIEW WHERE FLDEMPLOYEE = ? AND FLDTABLE='SHARED' ORDER BY FLDCOMPANY, FLDLOCATION, FLDDEPT");
ArrayList qryParms = new ArrayList();
qryParms.add(new Long(empRecNum));
list.setQuery(UTIL, sql, qryParms);
}
else
list.init();
}
In this piece of code I am appending a query to a StringBuffer.
Which one will be better?
String
StringBuffer
StringBuilder
StringBuilder has been a replacement for StringBuffer in single-threaded code since Java 1.5, so go with StringBuilder. If you are not going to manipulate the data any further, go with String.
StringBuffer is only needed in a threaded environment where you need synchronization. That doesn't seem to be the case here.
Also, your string seems to be defined once and for all, so a simple String would be enough.
A StringBuilder is interesting when you are modifying your "string" by appending content. If you already have all your content, no need for a StringBuilder.
But you can already read all this information in their Javadoc:
The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization.
Strings are immutable in Java so any time you modify the String object you're using StringBuilder anyway. If the String is immutable then use String, otherwise create a StringBuilder and convert it to a String when you are done modifying it.
Unless you execute this method (to run the query) thousands of times per second, use a plain String. It's the most readable, fast and compact solution.
You might be too verbose. If your code is
String sql = "SELECT COLUMNA,";
if(foo)
sql += "COLUMNB";
else
sql += "COLUMNC";
Then the compiler is actually going to optimize this and use a builder behind the scenes (a StringBuffer originally, a StringBuilder from Java 5).
You should not be wasting your time worrying about how to concatenate 2 strings. That's not the mark of a great programmer, if that's what you thought.
Try this ->
long finalTime1 = 0;
{
long initialTimeTest = System.currentTimeMillis();
for( int index = 0; index < 10000; index++ ){
StringBuilder sb = new StringBuilder("Hello, ").append("World");
System.out.println(sb.toString());
}
finalTime1 = System.currentTimeMillis() - initialTimeTest;
}
long finalTime2 = 0;
{
long initialTimeTest = System.currentTimeMillis();
for( int index = 0; index < 10000; index++ ){
String sb = "Hello, " + "World";
System.out.println( sb );
}
finalTime2 = System.currentTimeMillis() - initialTimeTest;
}
System.out.println( finalTime1 );
System.out.println( finalTime2 );
Results:
...
Hello, World
Hello, World
245
148
Did you think StringBuilder was faster?
We are breaking the mother of all rules: Keep it Simple.
For mundane string handling there is no reason to use StringBuilder. It just adds unnecessary complexity to a mundane task.
Please, we need to think BIG and consider the overall business impact of the module on the project. Discussing whether to assemble two strings with StringBuilder or String is thinking small; don't do that.
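One caveat when reading the timing above: "Hello, " + "World" is a compile-time constant expression, so the compiler folds it into a single literal and the second loop performs no concatenation at run time. The sketch below (class name made up) shows the difference through reference equality, relying only on the rule that constant string expressions are interned.

```java
final class ConstantFolding {
    // Both operands are literals, so the compiler folds them into one constant
    // that is the very same interned object as the literal "Hello, World".
    static boolean foldedAtCompileTime() {
        String joined = "Hello, " + "World";
        return joined == "Hello, World";
    }

    // A non-final local is not a constant, so this concatenation really happens
    // at run time and yields a fresh String object.
    static boolean builtAtRunTime() {
        String left = "Hello, ";
        String joined = left + "World";
        return joined == "Hello, World";
    }
}
```

The first method returns true and the second false, so the two loops in the benchmark are not doing comparable work, and the time spent in System.out.println is likely to dominate both.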
