StringBuilder - setLength(0) vs new instance, which is better? - java

I have a loop (run twice per iteration as I receive data over Bluetooth), and I'm using a StringBuilder to append the incoming data; at the end of processing I need to clear the StringBuilder object.
I'm currently creating a new instance, but I have also used the .setLength method. Which is better: .setLength or a new instance?
Example of the code that receives data from the Bluetooth device:
private void receive(byte[] data) {
    peso.append(new String(data));
    receiveText.setText(peso.toString().replaceAll("(\\r|\\n)", "") + " Kg");
    // int i = Integer.parseInt(peso.toString().replaceAll(("\\r|\\n"), ""));
    Log.i("Val of append actual", String.valueOf(peso));
    if (peso.length() >= 3)
        peso = new StringBuilder();
}
Other details:
I know the maximum size of my "peso"; can that help with the choice?

I am not sure about your use case, but in general creating a new instance is a good option rather than setting the length to zero: it expresses the intent of the code better, so your code will be more understandable and readable.
The performance difference is really negligible, and your code will be simpler.

Instantiating can be slow, because creating a new object takes time, whereas clearing the content of a StringBuilder requires no memory allocation for its internal array, which makes it the faster option.
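To illustrate the reuse this describes, a minimal sketch (the chunks source and the process() handler are hypothetical):

StringBuilder sb = new StringBuilder(64);   // allocated once, sized for the expected data
for (byte[] chunk : chunks) {               // chunks: hypothetical sequence of Bluetooth payloads
    sb.append(new String(chunk, StandardCharsets.UTF_8));
    process(sb.toString());                 // hypothetical handler for the assembled value
    sb.setLength(0);                        // clears the content but keeps the internal char array
}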

I have even read that new StringBuilder was faster in one instance.
So that is more or less an irrelevant micro-optimisation. Profile it in your case.
I would give an initial capacity: new StringBuilder(64).
Remarks:
The current if (peso.length() >= 3) check is probably just for testing.
Specify the Charset of the data bytes; StandardCharsets.UTF_8 is usually best.
Logging is slow; remove it.
Regex is slow; below is the non-regex replace.
\R matches any newline: \n, \r\n, \r, NEL.
So:
private void receive(byte[] data) {
    peso.append(new String(data, StandardCharsets.ISO_8859_1));
    String pesoS = peso.toString();
    //receiveText.setText(pesoS.replaceAll("\\R", "") + " Kg");
    receiveText.setText(pesoS.replace("\r", "").replace("\n", "") + " Kg");
    Log.i("Val of append actual", pesoS);
    if (peso.length() >= 3)
        peso = new StringBuilder(16);
}

Related

Optimum time to perform an operation: within, or after loop

I am reading a file to parse later on. The file is not likely to exceed 1 MB in size, so this is perhaps not a crucial question for me at this stage. But for best-practice reasons, I'd like to know when is the optimum time to perform an operation.
Example:
Using a method I've pasted from http://www.dzone.com/snippets/java-read-file-string, I am reading a buffer into a string. I would now like to remove all whitespace. My method is currently this:
private String listRaw;

public boolean readList(String filePath) throws java.io.IOException {
    StringBuffer fileData = new StringBuffer(1024);
    BufferedReader reader = new BufferedReader(
            new FileReader(filePath));
    char[] buf = new char[1024];
    int numRead = 0;
    while ((numRead = reader.read(buf)) != -1) {
        String readData = String.valueOf(buf, 0, numRead);
        fileData.append(readData);
        buf = new char[1024];
    }
    reader.close();
    listRaw = fileData.toString().replaceAll("\\s", "");
    return true;
}
So, I remove all whitespace from the string at the time I store it - in its entirety - to a class variable.
To me, this means less processing but more memory usage. Would I be better off applying the replaceAll() operation on the readData variable as I append it to fileData, for best-practice reasons? That would use more processing but avoid passing superfluous whitespace around.
I imagine this has little impact for a small file like the one I am working on, but what if it's a 200MB log file?
Is it entirely case-dependant, or is there a consensus I'd do better to follow?
Thanks for the input everybody. I'm sure you've helped to aim my mindset in the right direction for writing Java.
I've updated my code to take into consideration the points raised. Including the suggestion by Don Roby that at some point, I may want to keep spaces. Hopefully things read better now!
private String listRaw;

public boolean readList(String filePath) throws java.io.IOException {
    StringBuilder fileData = new StringBuilder(51200);
    BufferedReader reader = new BufferedReader(new FileReader(filePath));
    char[] buf = new char[51200];
    boolean spaced = false;
    int numRead;
    while ((numRead = reader.read(buf)) != -1) {
        for (int i = 0; i < numRead; i++) { // only scan the chars actually read
            char c = buf[i];
            if (c != '\t' && c != '\r' && c != '\n') {
                if (c == ' ') {
                    if (spaced) {
                        continue;
                    }
                    spaced = true;
                } else {
                    spaced = false;
                }
                fileData.append(c);
            }
        }
    }
    reader.close();
    listRaw = fileData.toString().trim();
    return true;
}
You'd better create and apply the regex replacement only once, at the end. But you would gain much more by:
initializing the StringBuilder with a reasonable size;
avoiding the creation of a String inside the loop, appending the read characters directly to the StringBuilder;
avoiding the instantiation of a new char buffer, for nothing, at each iteration.
To avoid creating an unnecessarily long temporary String, you could read char by char and only append the char to the StringBuilder if it's not whitespace. In the end, the StringBuilder would contain only the good characters, and you wouldn't need any replaceAll() call.
There are actually several very significant inefficiencies in this code, and you'd have to fix them before worrying about the relatively less important issue you've raised.
First, don't create a new buf object on each iteration of the loop -- use the same one! There's no problem with doing so -- the new data overwrites the old, and you save on object allocation (which is one of the more expensive operations you can do.)
Second, similarly, don't create a String to call append() -- use the form of append that takes a char array and an offset (0, in this case) and length (numRead, in this case.) Again, you create one less object per loop iteration.
Finally, to come to the question you actually asked: doing it in the loop would create a String object per iteration, but with the tuning we've just done, you're creating zero objects per iteration -- so removing the whitespace at the end of the loop is the clear winner!
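For illustration, a sketch of the question's loop with both fixes applied (buffer size kept from the question, nothing else changed):

StringBuffer fileData = new StringBuffer(1024);
char[] buf = new char[1024];                  // allocated once, reused on every iteration
int numRead;
while ((numRead = reader.read(buf)) != -1) {
    fileData.append(buf, 0, numRead);         // appends directly, no intermediate String
}
reader.close();
String listRaw = fileData.toString().replaceAll("\\s", ""); // single pass at the end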
Depending somewhat on the parse you're going to do, you may well be better off not removing the spaces in a separate step at all, and just ignore them during the parse.
It's also reasonably rare to want to remove all whitespace. Are you sure you don't want to just replace multiple spaces with single spaces?

What is the quickest / most efficient way to append a char to a file loaded into memory?

read_data = new BufferedReader(new FileReader(args[0]));
data_buffer = new StringBuffer();
int i;
while (read_data.ready()) {
    while ((i = read_data.read()) != -1) {
        data_buffer.append((char) i);
    }
}
data_buffer.append(System.getProperty("line.separator"));
What I'm trying to do is, read an entire .txt file into a string and append a newline to the string. And then be able to process this string later on by creating a new Scanner by passing data_buffer.toString(). Obviously on really large files this process takes up a lot of time, and all I want to do is just append a newline to the .txt file I've read into memory.
I'm aware the whole idea seems a bit hacky or weird, but are there any quicker methods?
Cheers :)
The fastest way to do something is often to not do it at all.
Why don't you modify the parsing code in such a way that the newline at the end is not required? If you are appending it each time, you could just as well change the code to behave as if it were there while it really isn't.
The next thing I would try would be to avoid creating a huge String char by char, as this is indeed rather costly. You can create a Scanner based on an InputStream, and it will probably be much faster than reading the data into a String and parsing that. You can override your FileInputStream to return a virtual newline character at the end of the file, thus avoiding the instantiation of the pasted-together string.
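A one-line sketch of that Scanner-on-stream idea (the file name and charset are assumptions):

Scanner scanner = new Scanner(new FileInputStream("data.txt"), "UTF-8");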
And if you absolutely positively did have to read the data into a buffer, you would probably be better off by reading into a byte array using the array-based read() methods of the stream - much faster than byte by byte. Since you can know the size of the file in advance, you could allocate your buffer with space for the extra end-of-line marker and insert it into the array. In contrast to creating a StringBuffer and making a String out of it, this does not require a full copy of the buffer.
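A rough sketch of that buffer approach, assuming a single-character line separator and a file small enough for an int-sized array:

File f = new File(args[0]);
byte[] buf = new byte[(int) f.length() + 1];  // room for the extra end-of-line marker
int off = 0;
try (FileInputStream in = new FileInputStream(f)) {
    int n;
    while ((n = in.read(buf, off, buf.length - 1 - off)) > 0) {
        off += n;                             // array-based read, no per-byte overhead
    }
}
buf[off] = '\n';                              // append the marker directly into the array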
From what I can tell, what you are actually trying to do is to read a file in such a way that it always appears to have a line separator at the end of the last line.
If that is the case, then you could do this by implementing a subtype of FilterReader, and have it "insert" an extra character or two if required when it reaches the end of the character stream.
The code to do this won't be trivial, but it will avoid the time and space overhead of buffering the entire file in memory.
If all you're doing is passing the resulting file in to a Scanner, you should create a Readable for the file and send that to Scanner.
Here's an example (untested):
public class NLReader implements Readable {
    Reader r;
    boolean atEndOfReader = false;
    boolean atEnd = false;

    public NLReader(Reader r) {
        this.r = r;
    }

    public int read(CharBuffer cb) throws IOException {
        if (!atEndOfReader) {
            int result = r.read(cb);
            if (result == -1) {
                atEndOfReader = true;
            } else {
                return result;
            }
        }
        if (!atEnd) {
            String nl = System.getProperty("line.separator");
            cb.append(nl);
            atEnd = true;
            return nl.length();
        }
        return -1;
    }
}
This only reads the file once, and never copies it (unlike your StringBuffer -- and you should be using StringBuilder instead unless you really need the synchronization of StringBuffer).
This also doesn't load the actual file in to memory, so that can save memory pressure as well.
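For completeness, a possible way to wire NLReader into a Scanner (the file name is a placeholder):

Reader r = new BufferedReader(new FileReader("data.txt"));
Scanner scanner = new Scanner(new NLReader(r)); // Scanner accepts any Readable
while (scanner.hasNextLine()) {
    System.out.println(scanner.nextLine());
}
scanner.close();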

java efficient way to process big text files

I'm doing a frequency dictionary, in which I read 1000 files, each one with about 1000 lines. The approach I'm following is:
BufferedReader to read file by file
read the first file, get the first sentence, split the sentence into a string array, then fill in a hashmap with the values from the string array.
do this for all the sentences in that file
do this for all 1000 files
My problem is, this is not a very efficient way to do it; I'm taking about 4 minutes to do all this. I've increased the heap size and refactored the code to make sure I'm not doing something wrong. For this approach, I'm completely sure there's nothing I can improve in the code.
My bet is, each time a sentence is read, a split is applied, which, multiplied by 1000 sentences in a file and by 1000 files, is a huge amount of splits to process.
My idea is, instead of reading and processing file by file, I could read each file into a char array, and then make the split only once per file. That would ease the amount of processing time consumed by the split. Any suggestions of implementation would be appreciated.
OK, I have just implemented a POC of your dictionary. Quick and dirty. My files contained 868 lines each, but I created 1024 copies of the same file. (It is the table of contents of the Spring Framework documentation.)
I ran my test and it took 14020 ms (14 seconds!). BTW, I ran it from Eclipse, which could decrease the speed a little bit.
So, I do not know where your problem is. Please try my code on your machine, and if it runs faster, try to compare it with your code to understand where the root problem is.
Anyway, my code is not the fastest I can write.
I could create a Pattern before the loop and then use it instead of String.split(); String.split() calls Pattern.compile() every time, and creating a pattern is very expensive.
Here is the code:
public static void main(String[] args) throws IOException {
    Map<String, Integer> words = new HashMap<String, Integer>();
    long before = System.currentTimeMillis();
    File dir = new File("c:/temp/files");
    for (File file : dir.listFiles()) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            String[] lineWords = line.split("\\s+");
            for (String word : lineWords) {
                int count = 1;
                Integer currentCount = words.get(word);
                if (currentCount != null) {
                    count = currentCount + 1;
                }
                words.put(word, count);
            }
        }
        reader.close(); // close each file before moving on to the next
    }
    long after = System.currentTimeMillis();
    System.out.println("run took " + (after - before) + " ms");
    System.out.println(words);
}
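If the split does turn out to be the hot spot, a minimal variant precompiles the pattern once outside the loops, as suggested above (only the changed lines are shown):

private static final Pattern WHITESPACE = Pattern.compile("\\s+"); // compiled once, not per line

// inside the line loop, instead of line.split("\\s+"):
String[] lineWords = WHITESPACE.split(line);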
If you don't care that the contents are in different files, I would take the approach you are recommending: read all files and all lines into memory (a string, or char array, whatever) and then do one split and hash populate based on the one string/dataset.
If I understand what you're doing, I don't think you want to use strings except when you access your map.
You want to:
loop through files
read each file into a buffer of something like 1024
process the buffer looking for word end characters
create a String from the character array
check your map
if found, update your count, if not, create a new entry
when you reach end of buffer, get the next buffer from the file
at end, loop to next file
Split is probably pretty expensive since it has to interpret the expression each time.
Reading the file as one big string and then splitting that sounds like a good idea. String splitting/modifying can be surprisingly 'heavy' when it comes to garbage collection. Multiple lines/sentences means multiple Strings, and with all the splits that means a huge number of Strings (Strings are immutable, so any change to them will actually create a new String or multiple Strings)... this produces a lot of garbage to be collected, and the garbage collection could become a bottleneck (with a smaller heap, the maximum amount of memory is reached all the time, kicking off a garbage collection, which potentially needs to clean up hundreds of thousands or millions of separate String objects).
Of course, without knowing your code this is just a wild guess, but back in the day I got an old command-line Java program's running time (it was a graph algorithm producing a huge SVG file) to drop from about 18 seconds to less than 0.5 seconds just by modifying the string handling to use StringBuffers/Builders.
Another thing that springs to mind is using multiple threads (or a threadpool) to handle different files concurrently, and then combine the results at the end. Once you get the program to run "as fast as possible", the remaining bottleneck will be the disk access, and the only way (afaik) to get past that is faster disks (SSDs etc.).
Since you're using a BufferedReader, why do you need to read in a whole file explicitly? I definitely wouldn't use split if you're after speed; remember, it has to evaluate a regular expression each time you run it.
Try something like this for your inner loop (note, I have not compiled this or tried to run it):
StringBuilder sb = null;
String delimiters = " .,\t"; // build out all your word delimiters in a string here
for (int nextChar = br.read(); nextChar >= 0; nextChar = br.read()) {
    if (delimiters.indexOf(nextChar) < 0) {
        if (sb == null) sb = new StringBuilder();
        sb.append((char) nextChar);
    } else {
        if (sb != null) {
            // Add sb.toString() to your map or increment it
            sb = null;
        }
    }
}
if (sb != null) {
    // Flush the final word here as well, in case the file does not end with a delimiter
}
You could try using different sized buffers explicitly, but you probably won't get a performance improvement over this.
One very simple approach which uses minimal heap space and should be (almost) as fast as anything else would be:
int c;
final String SEPARATORS = " \t,.\n"; // extend as needed
final StringBuilder word = new StringBuilder();
while ((c = fileInputStream.read()) >= 0) {
    final char letter = (char) c;
    if (SEPARATORS.indexOf(letter) < 0) {
        word.append(letter);
    } else if (word.length() > 0) { // guard against empty words from consecutive separators
        processWord(word.toString());
        word.setLength(0);          // clear the builder without reallocating
    }
}
if (word.length() > 0) {
    processWord(word.toString());   // flush the final word at end of stream
}
Extend it for more separator characters as needed, and possibly use multi-threading to process multiple files concurrently until disk IO becomes the bottleneck...
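Since a couple of answers suggest processing files concurrently, here is a hedged sketch of that idea; countWords(File) is a hypothetical helper that runs one of the loops above on a single file and returns its per-file map:

ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
ConcurrentMap<String, Integer> totals = new ConcurrentHashMap<>();
for (File file : new File("c:/temp/files").listFiles()) {
    pool.submit(() -> {
        // merge this file's counts into the shared, thread-safe map
        countWords(file).forEach((word, n) -> totals.merge(word, n, Integer::sum));
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS); // throws InterruptedException; declare or handle it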

Android - OutOfMemory when reading text file

I'm making a dictionary app on Android. During its startup, the app will load the content of a .index file (~2 MB, 100,000+ lines).
However, when I use BufferedReader.readLine() and do something with the returned string, the app causes an OutOfMemory error.
// Read file snippet
Set<String> indexes = new HashSet<String>();
FileInputStream is = new FileInputStream(indexPath);
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
String readLine;
while ((readLine = reader.readLine()) != null) {
    indexes.add(extractHeadWord(readLine));
}

// And the extractHeadWord method
private String extractHeadWord(String string) {
    String[] splitted = string.split("\\t");
    return splitted[0];
}
When reading the log, I found that while executing, the GC explicitly cleans objects many times (GC_EXPLICIT freed xxx objects, where xxx is a big number such as 15000 or 20000).
And I tried another way:
final int BUFFER = 50;
char[] readChar = new char[BUFFER];
// .. construct BufferedReader
while (reader.read(readChar) != -1) {
    indexes.add(new String(readChar));
    readChar = new char[BUFFER];
}
..and it ran very fast. But it was not exactly what I wanted.
Is there any solution that runs as fast as the second snippet and is as easy to use as the first?
Regards.
The extractHeadWord method uses String.split. This method does not create new strings but relies on the underlying string (in your case the line object) and uses indexes to point out the "new" string.
Since you are not interested in the rest of the string, you need to discard it so it gets garbage collected; otherwise the whole string stays in memory while you are only using a part of it.
Calling the constructor String(String) (the "copy constructor") discards the rest of the string:
private String extractHeadWord(String string) {
    String[] splitted = string.split("\\t");
    return new String(splitted[0]);
}
What happens if your extractHeadWord does this: return new String(splitted[0]);?
It will not reduce temporary objects, but it might reduce the footprint of the application. I don't know if split does about the same as substring, but I guess that it does. substring creates a new view over the original data, which means that the full character array is kept in memory. Explicitly invoking new String(string) truncates the data.

java: use StringBuilder to insert at the beginning

I could only do this with String, for example:
String str = "";
for (int i = 0; i < 100; i++) {
    str = i + str;
}
Is there a way to achieve this with StringBuilder? Thanks.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) {
    sb.insert(0, Integer.toString(i));
}
Warning: It defeats the purpose of StringBuilder, but it does what you asked.
Better technique (although still not ideal):
Reverse each string you want to insert.
Append each string to a StringBuilder.
Reverse the entire StringBuilder when you're done.
This will turn an O(n²) solution into O(n).
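A minimal sketch of that reverse-append-reverse technique, applied to the loop from the question:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) {
    // reverse each piece, then append at the (cheap) end
    sb.append(new StringBuilder(Integer.toString(i)).reverse());
}
String result = sb.reverse().toString(); // one final reverse restores the intended order

The final reverse turns the concatenation of reversed pieces into the pieces in reverse order, which is exactly what the str = i + str loop produces.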
You can use strbuilder.insert(0, i);
Maybe I'm missing something but you want to wind up with a String that looks like this, "999897969594...543210", correct?
StringBuilder sb = new StringBuilder();
for (int i = 99; i >= 0; i--) {
    sb.append(String.valueOf(i));
}
As an alternative solution you can use a LIFO structure (like a stack) to store all the strings and when you are done just take them all out and put them into the StringBuilder. It naturally reverses the order of the items (strings) placed in it.
Stack<String> textStack = new Stack<String>();

// push the strings to the stack
while (!isReadingTextDone()) {
    String text = readText();
    textStack.push(text);
}

// pop the strings and add to the text builder
StringBuilder builder = new StringBuilder();
while (!textStack.empty()) {
    builder.append(textStack.pop());
}

// get the final string
String finalText = builder.toString();
This thread is quite old, but you could also think about a recursive solution, passing the StringBuilder to fill. This prevents any reverse processing etc. You just need to design your iteration as a recursion and carefully choose an exit condition.
public class Test {

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        doRecursive(sb, 100, 0);
        System.out.println(sb.toString());
    }

    public static void doRecursive(StringBuilder sb, int limit, int index) {
        if (index < limit) {
            doRecursive(sb, limit, index + 1);
            sb.append(Integer.toString(index));
        }
    }
}
I had a similar requirement when I stumbled on this post: I wanted a fast way to build a String that can grow from both sides, i.e. add new letters at the front as well as the back, arbitrarily. I know this is an old post, but it inspired me to try out a few ways to create strings, and I thought I'd share my findings. I am also using some Java 8 constructs in this, which could have optimised the speed in cases 4 and 5.
https://gist.github.com/SidWagz/e41e836dec65ff24f78afdf8669e6420
The Gist above has the detailed code that anyone can run.
I took a few ways of growing strings in this: 1) Append to StringBuilder, 2) Insert to the front of StringBuilder as shown by @Mehrdad, 3) Partially insert from the front as well as the end of the StringBuilder, 4) Using a list to append from the end, 5) Using a Deque to append from the front.
// Case 2
StringBuilder build3 = new StringBuilder();
IntStream.range(0, MAX_STR)
    .sequential()
    .forEach(i -> {
        if (i % 2 == 0) build3.append(Integer.toString(i));
        else build3.insert(0, Integer.toString(i));
    });
String build3Out = build3.toString();
// Case 5
Deque<String> deque = new ArrayDeque<>();
IntStream.range(0, MAX_STR)
    .sequential()
    .forEach(i -> {
        if (i % 2 == 0) deque.addLast(Integer.toString(i));
        else deque.addFirst(Integer.toString(i));
    });
String dequeOut = deque.stream().collect(Collectors.joining(""));
I'll focus on the front-append-only cases, i.e. case 2 and case 5. The implementation of StringBuilder internally decides how the internal buffer grows, which, apart from shifting the whole buffer right on every front append, limits the speed. While the time taken when inserting directly at the front of the StringBuilder grows to really high values, as shown by @Mehrdad, if you only need strings of fewer than about 90k characters (which is still a lot), the front insert will build a String in the same time it takes to build a String of the same length by appending at the end. What I am saying is that the time penalty does kick in, and it is huge, but only when you have to build really huge strings. One could instead use a deque and join the strings at the end, as in my example. But a StringBuilder is more intuitive to read and code, and the penalty does not matter for smaller strings.
Actually the performance for case 2 is much faster than case 1, which I don't seem to understand. I assume the growth of the internal buffer in StringBuilder would be the same for front appends and back appends. I even set the minimum heap to a very large amount to avoid delays from heap growth, in case that played a role. Maybe someone who has a better understanding can comment below.
You can use the insert method with an offset. An offset of 0 means you are inserting at the front of your StringBuilder.
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) {
    sb.insert(0, i);
}
NOTE:
As the insert method accepts all primitive types, you can use it for int, long, char[], etc.
Difference Between String, StringBuilder And StringBuffer Classes
String
String is an immutable object (once created, it cannot be changed). An object created as a String is stored in the Constant String Pool.
Every immutable object in Java is thread-safe, which implies String is also thread-safe. A String cannot be used by two threads simultaneously.
A String, once assigned, cannot be changed.
StringBuffer
StringBuffer is mutable, meaning one can change the value of the object. The object created through StringBuffer is stored in the heap. StringBuffer has the same methods as StringBuilder, but each method in StringBuffer is synchronized, that is, StringBuffer is thread-safe.
Due to this, it does not allow two threads to simultaneously access the same method. Each method can be accessed by one thread at a time.
But being thread-safe has disadvantages too, as the performance of StringBuffer suffers due to the thread-safety property. Thus StringBuilder is faster than StringBuffer when calling the same methods of each class.
A StringBuffer can be converted to a String by using the toString() method.

    StringBuffer demo1 = new StringBuffer("Hello");
    // The above object is stored in the heap and its value can be changed.
    demo1 = new StringBuffer("Bye");
    // The above statement is valid, as it modifies the value, which is allowed with StringBuffer.

StringBuilder
StringBuilder is the same as StringBuffer, that is, it stores the object in the heap and it can also be modified. The main difference between StringBuffer and StringBuilder is that StringBuilder is not thread-safe.
StringBuilder is fast as it is not thread-safe.

    StringBuilder demo2 = new StringBuilder("Hello");
    // The above object is stored in the heap and its value can be modified.
    demo2 = new StringBuilder("Bye");
    // The above statement is valid, as it modifies the value, which is allowed with StringBuilder.
How about:
StringBuilder builder = new StringBuilder();
for (int i = 99; i >= 0; i--) {
    builder.append(Integer.toString(i));
}
builder.toString();
OR
StringBuilder builder = new StringBuilder();
for (int i = 0; i < 100; i++) {
    builder.insert(0, Integer.toString(i));
}
builder.toString();
But with the insert version, you are making the operation O(N²) instead of O(N).
Snippet from the Java docs:
Inserts the string representation of the Object argument into this character sequence. The overall effect is exactly as if the second argument were converted to a string by the method String.valueOf(Object), and the characters of that string were then inserted into this character sequence at the indicated offset.
