Java - GC a large string

I have a method to read and parse an extremely large XML file.
The XML file is read into a string, which is then parsed by a different class. However, this causes Java to use a large amount of memory (~500 MB).
Normally, the program runs at around 30 MB, but when parse() is called, it increases to 500 MB. When parse() is done running, however, the memory usage doesn't go back down to 30 MB; instead it stays at 500 MB.
I've tried setting s = null and calling System.gc() but the memory usage still stays at 500 MB.
public void parse(){
    try {
        System.out.println("parsing data...");
        String path = dir + "/data.xml";
        InputStream i = new FileInputStream(path);
        BufferedReader reader = new BufferedReader(new InputStreamReader(i));
        String line;
        String s = "";
        while ((line = reader.readLine()) != null){
            s += line + "\n";
        }
        ... parse ...
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Any ideas?
Thanks.

Solution for your memory leak
You should close the BufferedReader at the end in order to close the stream and release any system resources associated with it. You can close both the InputStream and the BufferedReader; however, closing the BufferedReader actually closes its underlying stream as well.
Generally it's better to add a finally block and close it there:
finally {
    i.close();
    reader.close();
}
A better approach: the try-with-resources statement
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    return br.readLine();
}
Bonus Note
Use a StringBuilder instead of concatenating strings
String does not allow appending. Each append/concatenation on a String creates a new object and returns it. This is because String is immutable - it cannot change its internal state.
On the other hand, StringBuilder is mutable. When you call append(), it alters the internal char array rather than creating a new String object.
Thus it is more memory efficient to use a StringBuilder when you want to append many strings.
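For illustration, here is a minimal sketch of the question's reading loop rewritten with a StringBuilder (the helper method name readAll and the try-with-resources wrapping are illustrative additions, not part of the question):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Minimal sketch: read a whole file into one String via a StringBuilder.
static String readAll(String path) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n'); // appends into one internal buffer; no new String per line
        }
    }
    return sb.toString(); // a single String is created only here
}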

Just a note: a try-with-resources block will help you a lot with IO objects like those readers.
try (InputStream i = new FileInputStream(path);
     BufferedReader reader = new BufferedReader(new InputStreamReader(i))) {
    // your reading here
}
This will make sure these objects are disposed of by calling close() on them, regardless of how your method block exits (success, exception...). Closing these objects may also help to free up some memory.
The thing that's probably causing a big slowdown, and probably the blow-up in memory usage, is your string concatenation. Calling s += line + "\n" is fine for a single concatenation, but the + operator actually has to create a new String instance each time and copy the characters from the strings being concatenated. The StringBuilder class was designed for exactly this purpose. :)

The 500MB is caused by parsing, so it has nothing to do with the string, or the BufferedReader either. It is the DOM of the parsed XML. Release that and your memory usage will revert.
But why read the entire file into a string? This is a waste of time and space. Just parse the input directly from the file.
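For example, a DOM parser (assuming one is being used; the question doesn't show the parser class) can read straight from the file:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch: let the parser stream the file itself instead of
// building a giant String first. The file path follows the question.
static Document parseXml(File xmlFile) throws Exception {
    return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(xmlFile); // no intermediate String is created
}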

You should keep in mind that calling System.gc() will not necessarily perform a garbage collection; it only suggests one to the GC, which is free to ignore the suggestion. It is also better to use a StringBuilder to reduce the number of Strings you create in memory, because it only creates a String when you call toString() on it.

Related

How to use a regular expression to parse a text file and write the result on another file in Java

I used a regular expression to parse a text file so that I could use the resulting group one and group two as follows:
write group two to another file
make that file's name be group one
Unfortunately, no data is written to the file!
I could not figure out where the problem is; here is my code:
package javaapplication5;

import java.io.*;
import java.util.regex.*;

public class JavaApplication5 {
    public static void main(String[] args) {
        // TODO code application logic here
        try {
            FileInputStream fstream = new FileInputStream("C:/Users/Welcome/Desktop/End-End-Delay.txt");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));

            File newFile1 = new File("C:/Users/Welcome/Desktop/AUV1.txt");
            FileOutputStream fos1 = new FileOutputStream(newFile1);
            BufferedWriter bw1 = new BufferedWriter(new OutputStreamWriter(fos1));

            String strLine;
            while ((strLine = br.readLine()) != null) {
                Pattern p = Pattern.compile("sender\\sid:\\s(\\d+).*?End-End\\sDelay:(\\d+(?:\\.\\d+)?)");
                Matcher m = p.matcher(strLine);
                while (m.find()) {
                    String b = m.group(1);
                    String c = m.group(2);
                    int i = Integer.valueOf(b);
                    if (i == 0) {
                        System.out.println(b);
                        bw1.write(c);
                        bw1.newLine();
                    }
                    System.out.println(b);
                    // System.out.println(c);
                }
            }
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
        }
    }
}
Can anyone here help me solve this problem and identify its cause?
You are using a BufferedWriter and never flush it (flushing a writer pushes its contents to disk) or even close it at the end of your program.
Because of this, the program exits before the content held in the BufferedWriter is written to the actual file on disk, and the content is lost.
To avoid this, you can either call flush just after writing the contents to bw1,

bw1.write(c);
bw1.newLine();
bw1.flush();

or, before your program ends, call

bw1.close(); // this ensures all content in the BufferedWriter is pushed to disk before the JVM exits

Calling flush every time you write data is not really recommended, as it defeats the purpose of buffered writing.
So the best approach is to close the BufferedWriter object. You can do it in two ways (see the sketch below):
Try-with-resources
Manually close the BufferedWriter object at the end, ideally in a finally block, so as to ensure it always gets called.
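For illustration, a minimal try-with-resources sketch (the helper method writeDelays is hypothetical; the path follows the question):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// Sketch: try-with-resources closes (and therefore flushes) the writer automatically,
// whether the loop completes normally or throws.
static void writeDelays(Iterable<String> delays) throws IOException {
    try (BufferedWriter bw1 = new BufferedWriter(
            new FileWriter("C:/Users/Welcome/Desktop/AUV1.txt"))) {
        for (String c : delays) { // 'c' plays the role of m.group(2) in the question
            bw1.write(c);
            bw1.newLine();
        }
    } // bw1.close() runs here, pushing everything to disk
}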
Besides all this, you need to ensure that your regex actually matches and that your condition,
if (i == 0) {
is executed; otherwise the code that writes the data never runs, and of course in that case nothing will be written to the file.
Also, it is strongly recommended to close any resources you open, such as file resources and database resources (Connection, Statement, ResultSet), etc.
Hope that helps.

Converting InputStreamReader into String

Is there a better way to read Strings from an InputStreamReader?
In the profiler I am getting a memory heap there.
public String getClientMessage() throws IOException {
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(tempSocket.getInputStream()));
    char[] buffer = new char[200];
    return new String(buffer, 0, bufferedReader.read(buffer));
}
Thanks in advance.
EDIT:
Messages are sent with this:
public void sendServerMessage(String action) throws IOException {
    PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(tempSocket.getOutputStream()));
    printWriter.print(action);
    printWriter.flush();
}
I would suggest the commons-io library for doing such things in a more convenient and simple way.
Just use:
return IOUtils.toString(tempSocket.getInputStream());
But this is only a code-style note. We don't understand what you mean by the term "getting a memory heap". In any case, if you are having insufficient-memory troubles, you have to increase the memory for your Java application; see Memory Management in the Java HotSpot™ Virtual Machine:
Java heap space: This indicates that an object could not be allocated in the heap. The issue may be just a configuration problem. You could get this error, for example, if the maximum heap size specified by the -Xmx command line option (or selected by default) is insufficient for the application. It could also be an indication that objects that are no longer needed cannot be garbage collected because the application is unintentionally holding references to them. The HAT tool (see Section 7) can be used to view all reachable objects and understand which references are keeping each one alive. One other potential source of this error could be the excessive use of finalizers by the application, such that the thread that invokes the finalizers cannot keep up with the rate of addition of finalizers to the queue. The jconsole management tool can be used to monitor the number of objects that are pending finalization.
You can use IOUtils, but it is easy to write this yourself if you can't use that library.
public String getClientMessage() throws IOException {
    Reader r = new InputStreamReader(tempSocket.getInputStream());
    char[] buffer = new char[4096];
    StringBuilder sb = new StringBuilder();
    for (int len; (len = r.read(buffer)) > 0; )
        sb.append(buffer, 0, len);
    return sb.toString();
}
I suspect the problem is that, given the way you send messages, you have no way of knowing when a message stops. This means you must read until the connection is closed, which you are not doing. If you don't want to wait for the connection to close, you need to add some way of knowing when a message is finished, e.g. a newline.
// create this once per socket.
final PrintWriter out = new PrintWriter(
        new OutputStreamWriter(tempSocket.getOutputStream(), "UTF-8"), true);

public void sendServerMessage(String action) {
    // assuming there are no newlines in the message
    out.println(action); // auto-flushed.
}

// create this once per socket
BufferedReader in = new BufferedReader(
        new InputStreamReader(tempSocket.getInputStream(), "UTF-8"));

public String getClientMessage() throws IOException {
    // read until the end of a line, which is the end of a message.
    return in.readLine();
}

My Java program which reads a large text file is running out of memory, can anyone help explain why?

I have a large text file with 20 million lines of text. When I read the file using the following program, it works just fine, and in fact I can read much larger files with no memory problems.
public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " + e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}
However if I need to append some records to this file before reading it, the BufferedReader consumes a huge amount of memory (I have just used Windows task manager to monitor this, not very scientific I know but it demonstrates the problem). The amended program is below, which is the same as the first one, except I am appending a single record to the file first.
public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    PrintWriter pw = null;
    try {
        pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
        pw.println(" ");
    } catch (Exception e) {
        System.out.println("pw error: " + e.getMessage());
    } finally {
        pw.close();
    }

    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " + e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}
A screenshot of Windows Task Manager: the large bump in the line shows the memory consumption when I run the second version of the program.
So I was able to read this file without running out of memory, but I have much larger files, with more than 50 million records, which hit an out-of-memory exception when I run this program against them. Can someone explain why the first version of the program works fine on files of any size, while the second program behaves so differently and ends in failure? I am running on Windows 7 with:
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) Client VM (build 23.1-b03, mixed mode, sharing)
You can start the Java VM with the VM option
-XX:+HeapDumpOnOutOfMemoryError
This will write a heap dump to a file, which can be analysed to find leak suspects.
Use a '+' to add an option and a '-' to remove an option.
If you are using Eclipse, the Memory Analyzer plugin (MAT) can take heap dumps from running VMs and offers some nice analyses for leak suspects etc.
Each time you execute the following Java statement, you are creating a brand-new object:
tempLine = br.readLine()
I believe each call to readLine() probably creates a new String object, which is left on the heap once the reassignment gives tempLine its next value.
Therefore, since GC isn't being run constantly, thousands of objects can be left on the heap within seconds.
Some people say it's a bad idea to call System.gc() every 1000 lines or so, but I would be curious whether that fixes your issue. Also, you could run this statement after each line to mark the previous object as garbage-collectable:
tempLine = null;
pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
Did you try not using a BufferedWriter? If you're appending a few lines to the end, maybe you don't need a buffer? If you do, consider using a byte array (collections or a StringBuilder). Finally, did you try the same thing in Java 1.6_32? It might be a bug in the new version of one of the Writers.
Can you print the free memory before and after pw.close()?
System.out.println("before wr close :" + Runtime.getRuntime().freeMemory());
and similarly after that close, and after the reader close.
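A minimal sketch of that measurement (where exactly the prints go relative to the existing close() calls is an assumption):

// Sketch: log free heap around the close() calls to see which object is holding on to memory.
Runtime rt = Runtime.getRuntime();
System.out.println("before pw close: " + rt.freeMemory());
pw.close();
System.out.println("after pw close: " + rt.freeMemory());
// ... and later, around the reader ...
System.out.println("before br close: " + rt.freeMemory());
br.close();
System.out.println("after br close: " + rt.freeMemory());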
It could be because you may not have any linefeed/carriage return in your file at all. In that case, readLine() tries to create just one single string out of your whole file, which is probably what runs out of memory.
Java doc of readLine():
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Have you tried:
A) creating a new File instance to use for the reading, but pointing to the same file.
and
B) reading an entirely different file in the second part?
I'm wondering whether the File object is still somehow attached to the PrintWriter, or whether the OS is doing something funny with the file handles. Those tests should show you where to focus.
This doesn't look to be a problem with the code, and your logic for thinking it shouldn't break seems sound, so it has to be some underlying functionality.
You'll need to start Java with a bigger heap. Try -Xmx1024m as a parameter on the java command.
Basically, you're going to need more memory than the size of the file.

Filereader null declarations and appending best practice

I want to optimise my file-reader function but am not sure if it is best practice to declare the nulls outside of the try block. Also, is looping and appending chars to a StringBuffer considered bad practice? I would like to use the exception handling here, but maybe it is better to use another structure? Any advice most welcome, thanks.
public String readFile(){
    File f = null;
    FileReader fr = null;
    StringBuffer content = null;
    try {
        f = new File("c:/test.txt");
        fr = new FileReader(f);
        int c;
        while ((c = fr.read()) != -1) {
            if (content == null) {
                content = new StringBuffer();
            }
            content.append((char) c);
        }
        fr.close();
    } catch (Exception e) {
        throw new RuntimeException("An error occurred reading your file");
    }
    return content.toString();
}
Advice:
1. Indent your code properly. The stuff in your question looks like a dog's breakfast.
2. You don't need to initialize f inside the try / catch block. The constructor can't throw an Exception the way you are using it.
3. In fact, you don't need to declare it at all. Just inline the new File(...).
4. In fact, you don't even need to do that. Use the FileReader(String) constructor.
5. There's no point initializing the StringBuffer inside the loop. The potential performance benefit is small and only applies in the edge case where the file is empty or doesn't exist. In all other cases, this is an anti-optimization.
6. Don't catch Exception. Catch the exceptions that you expect to be thrown and allow all other exceptions to propagate. The unexpected exceptions are going to be due to bugs in your program, and need to be handled differently from others.
7. When you catch an exception, don't throw away the evidence. For an unexpected exception, either print / log the exception, its message and its stacktrace, or pass it as the 'cause' of the exception that you throw.
8. The FileReader should be closed in a finally clause. In your version of the code, the FileReader won't be closed if there is an exception after the object has been created and before the close() call. That will result in a leaked file descriptor and could cause problems later in your application.
9. Better yet, use the new Java 7 "try with resource" syntax, which takes care of closing the "resource" automatically (see below).
10. You are reading from the file one character at a time. This is very inefficient. You need to either wrap the Reader in a BufferedReader, or read a large number of characters at a time using (for example) read(char[], int, int).
11. Use StringBuilder rather than StringBuffer ... unless you need a thread-safe string assembler.
12. Wrapping exceptions in RuntimeException is bad practice. It makes it difficult for the caller to handle specific exceptions ... if it needs to ... and even makes printing of a decent diagnostic more difficult. (And that assumes that you didn't throw away the original exception like your code does.)
Note: if you follow the advice of point 8 and not 9, you will find that the initialization of fr to null is necessary if you open the file in the try block.
Here's how I'd write this:
public String readFile() throws IOException {
    // Using the Java 7 "try with resource" syntax.
    try (FileReader fr = new FileReader("c:/test.txt")) {
        BufferedReader br = new BufferedReader(fr);
        StringBuilder content = new StringBuilder();
        int c;
        while ((c = br.read()) != -1) {
            content.append((char) c);
        }
        return content.toString();
    }
}
A further optimization would be to use File.length() to find out what the file size (in bytes) is and use that as the initial size of the StringBuilder. However, if the files are typically small this is likely to make the application slower.
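A sketch of that combination, presizing the builder from the file length and reading in bulk as point 10 suggests (this is an assumption about how the two ideas would be combined, not code from the answer):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Sketch: presize the StringBuilder from File.length() and read in bulk.
// File.length() is in bytes, so for multi-byte encodings this is only an estimate.
public String readFile() throws IOException {
    File f = new File("c:/test.txt");
    StringBuilder content = new StringBuilder((int) f.length());
    try (BufferedReader br = new BufferedReader(new FileReader(f))) {
        char[] buf = new char[8192];
        int n;
        while ((n = br.read(buf, 0, buf.length)) != -1) {
            content.append(buf, 0, n);
        }
    }
    return content.toString();
}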
public String readFile() {
    File f = new File("/Users/Guest/Documents/workspace/Project/src/test.txt");
    FileReader fr = null;
    BufferedReader br = null;
    StringBuilder content = new StringBuilder();
    try {
        fr = new FileReader(f);
        br = new BufferedReader(fr);
        //int c;
        //while ((c = fr.read()) != -1) {
        //    content.append((char) c);
        //}
        String line = null;
        while ((line = br.readLine()) != null) {
            content.append(line);
        }
        fr.close();
        br.close();
    } catch (Exception e) {
        // do something
    }
    return content.toString();
}
Use a buffered reader and you'll get a 70%+ improvement; use StringBuilder instead of StringBuffer unless you need synchronization.
I ran it on a 10 MB file 50 times and averaged the results.
There is no need to put anything that does not need exception handling inside the try.
There is no need for that if clause, because it will be true only once, so you're wasting time checking it for every character.
There are no runtime exceptions to throw.
Results, fastest combination to slowest:
StringBuilder and buffered reader, line by line: 211 ms
StringBuffer and buffered reader, line by line: 213 ms
StringBuilder and buffered reader, char by char: 348 ms
StringBuffer and buffered reader, char by char: 372 ms
StringBuilder and file reader, char by char: 878 ms
StringBuffer and file reader, char by char: 935 ms
String: extremely slow
So use StringBuilder + buffered reader and make it read line by line for best results.

Read multiple lines from InputStreamReader (JAVA)

I have an InputStreamReader object. I want to read multiple lines into a buffer/array using one function call (without creating a mass of string objects). Is there a simple way to do so?
First of all, mind that InputStreamReader on its own is not very efficient; you should wrap it in a BufferedReader for maximum performance.
Taking this into account, you can do something like this:
public String readLines(InputStreamReader in) {
    BufferedReader br = new BufferedReader(in);
    // you should estimate buffer size
    StringBuffer sb = new StringBuffer(5000);
    try {
        int linesPerRead = 100;
        for (int i = 0; i < linesPerRead; ++i) {
            sb.append(br.readLine());
            // placing newlines back because readLine() removes them
            sb.append('\n');
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return sb.toString();
}
Mind that readLine() returns null if EOF is reached, so you should check for that and take care of it.
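A minimal sketch of that check (an adjusted version of the method above, not part of the original answer):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Sketch: same idea as above, but stopping at EOF so "null" is never appended.
public String readLines(InputStreamReader in, int linesPerRead) throws IOException {
    BufferedReader br = new BufferedReader(in);
    StringBuilder sb = new StringBuilder(5000);
    String line;
    for (int i = 0; i < linesPerRead && (line = br.readLine()) != null; ++i) {
        sb.append(line).append('\n'); // readLine() strips the newline, so add it back
    }
    return sb.toString();
}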
If you have some delimiter for multiple lines, you can read that many characters using the read method with an offset and length. Otherwise, using a StringBuilder to append each line read by the BufferedReader should work well for you without eating up too much temporary memory.
