My program executes system commands and returns the output line by line. However, a couple of commands produce a lot of lines, and in those cases the RAM usage rises to ~700 MB, whereas the usual RAM usage for any other command is 50-60 MB.
This is the method that reads the output using a BufferedReader. It is called by another method that creates the process for the command, and it passes the output line by line to the showOutputLine() method, which prints it to the console or to a TextArea.
protected void formatStream(InputStream inputStream, boolean isError) {
    bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
    String tempLine = null;
    // Read output
    try {
        while ((tempLine = bufferedReader.readLine()) != null) {
            showOutputLine(tempLine, isError);
        }
    } catch (IOException e) {
        // just stop
    }
}
One example of a command that causes the issue:
adb logcat
EDIT: it appears BufferedReader is innocent; however, the problem still persists. It is caused by JTextArea.
BufferedReader always uses about 16 KB (8K chars * 2 bytes each) in a fixed-size array. If you are using more than this, it is a side effect of generating so many Strings (especially if you have really long lines of text), not of the BufferedReader itself.
A TextArea can retain much more memory, depending on how much text it holds.
In any case, the memory usage that really matters is the size of the heap after a Full GC; the rest is overhead of various kinds.
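If the JTextArea turns out to be the real consumer, one mitigation (a sketch of an approach, not something from the original answer; the line cap and class name are made up for illustration) is to trim old lines from its Document so the component never holds more than a bounded amount of text:

import javax.swing.JTextArea;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;

public final class TextAreaTrimmer {
    private static final int MAX_LINES = 2000; // arbitrary cap, tune to taste

    // Append a line and discard the oldest lines once the cap is exceeded.
    // Should be called on the Swing event dispatch thread.
    public static void appendLine(JTextArea area, String line) {
        area.append(line + "\n");
        Element root = area.getDocument().getDefaultRootElement();
        int excess = root.getElementCount() - MAX_LINES;
        if (excess > 0) {
            try {
                // Remove everything up to the end of the last excess line.
                area.getDocument().remove(0, root.getElement(excess - 1).getEndOffset());
            } catch (BadLocationException e) {
                // Should not happen with offsets taken from the document itself.
            }
        }
    }
}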
BTW, Mb = megabit, MB = megabyte.
Related
I have a very large file (~6 GB) of fixed-width text with lines separated by \r\n, so I'm using a BufferedReader to read it line by line. The process can be interrupted or stopped, and if it is, it uses a checkpoint "lastProcessedLineNbr" to fast-forward to the correct place when reading resumes. This is how the reader is initialized:
private void initializeBufferedReader(Integer lastProcessedLineNbr) throws IOException {
    reader = new BufferedReader(new InputStreamReader(getInputStream(), "UTF-8"));
    if (lastProcessedLineNbr == null) {
        lastProcessedLineNbr = 0;
    }
    // Skip lines that were already processed before the interruption
    for (int i = 0; i < lastProcessedLineNbr; i++) {
        reader.readLine();
    }
    currentLineNumber = lastProcessedLineNbr;
}
This seems to work fine, and I read and process the data in this method:
public Object readItem() throws Exception {
    if ((currentLine = reader.readLine()) == null) {
        return null;
    }
    currentLineNumber++;
    return parse(currentLine);
}
And again, everything works fine until I reach the last line in the document. readLine() in the latter method throws an error:
17:06:49,980 ERROR [org.jberet] (Batch Thread - 1) JBERET000007: Failed to run job ProdFileRead, parse, org.jberet.job.model.Chunk#3965dcc8: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
at java.lang.StringBuffer.append(StringBuffer.java:369)
at java.io.BufferedReader.readLine(BufferedReader.java:370)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.rational.batch.reader.TextLineReader.readItem(TextLineReader.java:55)
Curiously, it seems to be reading past the end of the file and allocating so much space that it runs out of memory. I tried looking at the contents of the file using Cygwin with "tail file.txt", and in the console it gave me the expected 10 lines. But when I did "tail file.txt > output.txt", output.txt ended up being about 1.8 GB, much larger than the 10 lines I expected. So it seems Cygwin is doing the same thing. As far as I can tell there is no special EOF character; it's just the last byte of data, and it ends abruptly.
Anyone have any idea on how I can get this working? I'm thinking I could resort to counting the number of bytes read until I get the full size of the file, but I was hoping there was a better way.
But when I did tail file.txt > output.txt output.txt ended up being like 1.8GB, much larger than the 10 lines I expected
What this indicates to me is that the file is padded with 1.8 GB of binary zeroes, which Cygwin's tail command ignored when writing to the terminal but which Java is not ignoring. This would also explain your OutOfMemoryError: the BufferedReader kept reading data looking for the next \r\n and never found it before overflowing memory.
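If you know the padding is all zero bytes, one workaround (a sketch under that assumption, not part of the original answer; the class name is made up) is to wrap the underlying stream so that the first NUL byte is treated as end of file; the BufferedReader then stops at the real end of the data:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StopAtNulInputStream extends FilterInputStream {
    private boolean eof = false;

    public StopAtNulInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (eof) return -1;
        int b = in.read();
        if (b == 0) { eof = true; return -1; } // treat NUL padding as end of file
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (eof) return -1;
        int n = in.read(buf, off, len);
        if (n == -1) { eof = true; return -1; }
        for (int i = 0; i < n; i++) {
            if (buf[off + i] == 0) { // truncate at the first NUL and remember EOF
                eof = true;
                return i == 0 ? -1 : i;
            }
        }
        return n;
    }
}

It would then be used in initializeBufferedReader as new BufferedReader(new InputStreamReader(new StopAtNulInputStream(getInputStream()), "UTF-8")).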
We have an issue unzipping bz2 files in Java, whereby the input stream thinks it's finished after reading ~3% of the file.
We would welcome any suggestions for how to decompress and read large bz2 files which have to be processed line by line.
Here are the details of what we have done so far:
For example, a bz2 file is 2.09 GB in size and uncompressed it is 24.9 GB
The code below only reads 343,800 lines of the actual ~10 million lines the file contains.
Modifying the code to decompress the bz2 into a text file (FileInputStream straight into the CompressorInputStream) results in a file of ~190 MB - irrespective of the size of the bz2 file.
I have tried setting a buffer value of 2048 bytes, but this has no effect on the outcome.
We have executed the code on Windows 64 bit and Linux/CentOS both with the same outcome.
Could the buffered reader come to an empty, "null" line and cause the code to exit the while-loop?
import org.apache.commons.compress.compressors.*;
import java.io.*;
...
CompressorInputStream is = new CompressorStreamFactory()
        .createCompressorInputStream(
                new BufferedInputStream(
                        new FileInputStream(filePath)));

lineNumber = 0;
line = "";
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
    this.processLine(line, ++lineNumber);
}
Even this code, which forces an exception when the end of the stream is reached, has exactly the same result:
byte[] buffer = new byte[1024];
int len = 1;
while (len == 1) {
    out.write(buffer, 0, is.read(buffer));
    out.flush();
}
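As an aside, the conventional way to drain a stream without relying on an exception at end of file is a loop like the following (a generic sketch, not a fix for the bz2 issue itself):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class StreamCopy {
    // Conventional copy loop: write only the bytes actually read, stop cleanly at EOF.
    public static void copy(InputStream is, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        int len;
        while ((len = is.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        out.flush();
    }
}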
There is nothing obviously wrong with your code; it should work. This means the problem must be elsewhere.
Try to enable logging (i.e. print the lines as you process them). Make sure there are no gaps in the input (maybe write the lines to a new file and do a diff). Use bzip2 --test to make sure the input file isn't buggy. Check whether it always fails for the same line (maybe the input contains odd characters or binary data?)
The issue lies with the bz2 files: they were created using a version of Hadoop which includes bad block headers inside the files.
Current Java solutions stumble over this, while others ignore it or handle it somehow.
Will look for a solution/workaround.
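For what it's worth, bz2 files produced by Hadoop often consist of many concatenated bz2 streams, and Commons Compress stops after the first one unless told otherwise. A possible workaround (a sketch, not confirmed by the poster to fix this particular case) is to open the file with the decompressConcatenated flag of BZip2CompressorInputStream:

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Bz2LineReader {
    public static void readAll(String filePath) throws IOException {
        try (BZip2CompressorInputStream is = new BZip2CompressorInputStream(
                     new BufferedInputStream(new FileInputStream(filePath)),
                     true); // true = keep going across concatenated bz2 streams
             BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String line;
            long lineNumber = 0;
            while ((line = br.readLine()) != null) {
                lineNumber++; // process the line here
            }
            System.out.println("Read " + lineNumber + " lines");
        }
    }
}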
I have a C++ program (feeder.exe) that prints some data:
printf("%s\n", line);
On average it produces 20-30 lines per second, but not uniformly.
I want to capture this data in a Java program by running the exe from Java code:
package temp_read;

import java.io.*;

public class Main {
    public static void main(String[] args) throws Throwable {
        Process p = Runtime.getRuntime().exec("d:/feeder.exe");
        InputStream is = p.getInputStream();
        BufferedReader in = new BufferedReader(new InputStreamReader(is));
        String line = null;
        while ((line = in.readLine()) != null) {
            System.out.println(System.currentTimeMillis() + "," + line);
        }
    }
}
But when I look at the output, I see that it receives a batch of lines once every 3-5 seconds.
Question: how can I receive the data from feeder.exe immediately, without any delay, when it prints to stdout?
PS (unrelated question): how do I stop feeder.exe if I stop the Java program with Ctrl+C?
If redirected, stdout is probably buffered, meaning that the problem is in the C++ code and not on the Java side. The C++ process will buffer the output and flush several "printf"s at once, as soon as the buffer is full.
If you are able to modify the C++ program, try calling fflush(stdout); after the printf to force the output buffer to be flushed.
The most likely cause is that the feeder.exe is not flushing its output stream regularly, and the data is sitting in its output buffer until the buffer fills and is written as a result.
If that is the cause, there is nothing you can do on the Java side to avoid this. The problem can only be fixed by modifying the feeder application.
Note that if the data was in the "pipe" that connected the two processes, then reading on the Java side would get it. Assuming that the end-of-line had been written to the pipe, the readLine() call would deliver the line without blocking.
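As for the side question about stopping feeder.exe when the Java program is interrupted: a sketch (assuming that simply killing the child process is acceptable) is to register a shutdown hook that destroys it:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class FeederRunner {
    public static void main(String[] args) throws Exception {
        Process p = Runtime.getRuntime().exec("d:/feeder.exe");
        // On Ctrl+C the JVM runs shutdown hooks; use one to terminate the child process.
        Runtime.getRuntime().addShutdownHook(new Thread(p::destroy));
        try (BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(System.currentTimeMillis() + "," + line);
            }
        }
    }
}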
I have a problem that requires me to parse a text file from the local machine. There are a few complications:
The files can be quite large (700 MB+)
The pattern occurs on multiple lines
I need to store the line information that follows the pattern
I've created simple code using BufferedReader, String.indexOf and String.substring (to get item 3).
The file contains a key (pattern) named code= that occurs many times in different blocks. The program reads each line from this file using BufferedReader.readLine. It uses indexOf to check whether the pattern appears, then extracts the text after the pattern and stores it in a common string.
When I ran my program with a 600 MB file, I noticed that performance got worse as it processed the file. I read an article on CodeRanch saying that the Scanner class doesn't perform well for large files.
Are there any techniques or a library that could improve my performance?
Thanks in advance.
Here's my source code:
String codeC = "code=[";
String source = "";
try {
    FileInputStream f1 = new FileInputStream("c:\\Temp\\fo1.txt");
    DataInputStream in = new DataInputStream(f1);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    boolean bPrnt = false;
    int ln = 0;
    // Read File Line By Line
    while ((strLine = br.readLine()) != null) {
        // Print the content on the console
        if (strLine.indexOf(codeC) != -1) {
            ln++;
            System.out.println(strLine + " ---- register : " + ln);
            strLine = strLine.substring(codeC.length(), strLine.length());
            source = source + "\n" + strLine;
        }
    }
    System.out.println("");
    System.out.println("Lines :" + ln);
    f1.close();
} catch ( ... ) {
    ...
}
This code of yours is highly suspicious and may well account for at least a part of your performance issues:
FileInputStream f1 = new FileInputStream("c:\\Temp\\fo1.txt");
DataInputStream in = new DataInputStream(f1);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
You are involving DataInputStream for no good reason, and in fact using it as an input to a Reader can be considered a case of broken code. Write this instead:
InputStream f1 = new FileInputStream("c:\\Temp\\fo1.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(f1));
A huge detriment to performance is the System.out you are using, especially if you measure the performance when running in Eclipse, but even when running from the command line. My guess is that this is the major cause of your bottleneck. By all means ensure you don't print anything in the main loop when you aim for top performance.
In addition to what Marko answered, I suggest closing the br, not the f1:
br.close()
This will not affect performance, but it is cleaner (closing the outermost stream).
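Putting these suggestions together, the reading loop might look something like this (a sketch, not the asker's final code: the DataInputStream is gone, nothing is printed inside the loop, the reader is closed via try-with-resources, and as an additional change the String concatenation is swapped for a StringBuilder):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PatternScan {
    public static void main(String[] args) throws IOException {
        String codeC = "code=[";
        StringBuilder source = new StringBuilder();
        int ln = 0;
        // try-with-resources closes the outermost reader (and the underlying file) automatically
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\Temp\\fo1.txt"))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                int idx = strLine.indexOf(codeC);
                if (idx != -1) {
                    ln++;
                    // No printing inside the loop; collect the text that follows the pattern,
                    // wherever it occurs on the line.
                    source.append('\n').append(strLine.substring(idx + codeC.length()));
                }
            }
        }
        System.out.println("Lines : " + ln);
    }
}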
Have a look at java.util.regex
An excellent tutorial from Oracle.
A copy-paste from the Javadoc:
Classes for matching character sequences against patterns specified by regular expressions.
An instance of the Pattern class represents a regular expression that is specified in string form in a syntax similar to that used by Perl.
Instances of the Matcher class are used to match character sequences against a given pattern. Input is provided to matchers via the CharSequence interface in order to support matching against characters from a wide variety of input sources.
Unless otherwise noted, passing a null argument to a method in any class or interface in this package will cause a NullPointerException to be thrown.
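For example, a minimal sketch of using Pattern and Matcher for this kind of extraction (the closing-bracket assumption and the sample line are made up for illustration):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        // Compile the pattern once and reuse it for every line.
        // Hypothetical pattern: capture what follows "code=[" up to a closing bracket.
        Pattern p = Pattern.compile("code=\\[([^\\]]*)");
        Matcher m = p.matcher("id=7 code=[ABC123] status=ok");
        if (m.find()) {
            System.out.println(m.group(1)); // prints ABC123
        }
    }
}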
It works perfectly !!
I followed OldCurmudgeon's, Marko Topolnik's and AlexWien's advice and my performance improved 1000%. Before, the program took 2 hours to complete the described operation and write the response to a file.
Now it takes 5 minutes !! And the SYSO remains in the source code !!
I think the reason for the great improvement is changing the String "source" to a HashSet "source", as OldCurmudgeon suggested. But I removed the DataInputStream and used "br.close" too.
Thanks guys !!
I'm having a very difficult time debugging a problem in an application I've been building. I cannot seem to reproduce the problem with a representative test program, which makes it difficult to demonstrate. Unfortunately I cannot share my actual source because of security, but the following test represents fairly well what I am doing: the files and data use Unix-style EOL, the output is written to a zip file with a PrintWriter, and StringBuilders are used:
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Tester {
    public static void main(String[] args) {
        // variables
        File target = new File("TESTSAVE.zip");
        PrintWriter printout1;
        ZipOutputStream zipStream;
        ZipEntry ent1;
        StringBuilder testtext1 = new StringBuilder();
        StringBuilder replacetext = new StringBuilder();
        // ensure file replace
        if (target.exists()) {
            target.delete();
        }
        try {
            // open the streams
            zipStream = new ZipOutputStream(new FileOutputStream(target, true));
            printout1 = new PrintWriter(zipStream);
            ent1 = new ZipEntry("testfile.txt");
            zipStream.putNextEntry(ent1);
            // construct the data
            for (int i = 0; i < 30; i++) {
                testtext1.append("Testing 1 2 3 Many! \n");
            }
            replacetext.append("Testing 4 5 6 LOTS! \n");
            replacetext.append("Testing 4 5 6 LOTS! \n");
            // the replace operation
            testtext1.replace(21, 42, replacetext.toString());
            // write it
            printout1 = new PrintWriter(zipStream);
            printout1.println(testtext1);
            // save it
            printout1.flush();
            zipStream.closeEntry();
            printout1.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The heart of the problem is that the file I see on my side is 16.3k characters. My friend, whether he uses the app on his PC or looks at exactly the same file as me, sees a file of 19,999 characters, the extra characters being a CRLF followed by a massive number of null characters. No matter what application, encoding, or view I use, I cannot see these nul characters at all; I only see a single LF on the last line, but I do see a file of 20k. In all cases there is a difference between what is seen with the exact same files on the two machines, even though both are Windows machines and both use the same editing software to view them.
I've not yet been able to reproduce this behaviour with any number of dummy programs. I have, however, been able to trace the final line's stray CRLF to my use of println on the PrintWriter. When I replaced the println(s) with print(s + '\n'), the problem appeared to go away (the file size was 16.3k). However, when I returned the program to println(s), the problem did not appear to return. I'm currently having the files verified by a friend in France to see if the problem really did go away (since I cannot see the nuls but he can), but this behaviour has me thoroughly confused.
I've also noticed that the StringBuilder's replace function states "This sequence will be lengthened to accommodate the specified String if necessary". Given that the StringBuilder's setLength function pads with nul characters, and that the ensureCapacity function sets the capacity to the greater of the requested value or (currentCapacity*2)+2, I suspected a relationship somewhere. However, only once when testing with this idea have I been able to get a result resembling what I've seen, and I have not been able to reproduce it since.
Does anyone have any idea what could be causing this error or at least have a suggestion on what direction to take the testing?
Edit since the comments section is broken for me:
Just to clarify, the output is required to be in Unix format regardless of the OS, hence the use of '\n' directly rather than through a formatter. The original StringBuilder that is inserted into is not in fact generated by me but is the contents of a file read in by the program. I'm happy the reading process works, as the information in it is used heavily throughout the application. I've done a little probing too and found that directly prior to saving, the buffer IS the correct capacity, and the output when toString() is invoked is the correct length (i.e. it contains no null characters and is 16,363 characters long, not 19,999). This would put the cause of the error somewhere between generating the string and saving the zip file.
Finally found the cause. I managed to reproduce the problem a few times and traced it not to the output side of the code but to the input side. My file-reading function was essentially this:
char[] buf;
int charcount = 0;
StringBuilder line = new StringBuilder(2048);
InputStreamReader reader = new InputStreamReader(stream); // provides a line-wise read
BufferedReader file = new BufferedReader(reader);
do { // capture loop
    try {
        buf = new char[2048];
        charcount = file.read(buf, 0, 2048);
    } catch (IOException e) {
        return null; // unknown IO error
    }
    line.append(buf);
} while (charcount != -1);
// close and output
The problem was appending a buffer that wasn't full, so the later values were still at their initial value of null. The reason I couldn't reproduce it was that some data filled the buffers nicely and some didn't.
Why I couldn't seem to see the problem in my text editors I still have no idea, but I should be able to resolve this now. Any suggestions on the best way to do so are welcome; as this is part of one of my long-term utility libraries, I want to keep it as generic and optimised as possible.
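For reference, a minimal sketch of the fix (the class and method names are made up): append only the characters that were actually read, rather than the whole buffer.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public final class ReadAll {
    // Read the whole stream into a String without ever appending unfilled buffer slots.
    public static String readAll(InputStream stream) throws IOException {
        StringBuilder text = new StringBuilder(2048);
        try (BufferedReader file = new BufferedReader(new InputStreamReader(stream))) {
            char[] buf = new char[2048];
            int charcount;
            while ((charcount = file.read(buf, 0, buf.length)) != -1) {
                text.append(buf, 0, charcount); // only the chars actually read
            }
        }
        return text.toString();
    }
}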