Converting InputStreamReader into String - java

Is there a better way to read Strings from an InputStreamReader?
In the profiler I am getting a memory heap there.
public String getClientMessage() throws IOException {
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(tempSocket.getInputStream()));
    char[] buffer = new char[200];
    return new String(buffer, 0, bufferedReader.read(buffer));
}
Thanks in advance.
EDIT:
Messages are sent with this:
public void sendServerMessage(String action) throws IOException {
    PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(tempSocket.getOutputStream()));
    printWriter.print(action);
    printWriter.flush();
}

I would suggest the commons-io library for doing such things in a more convenient and simple way.
Just use:
return IOUtils.toString(tempSocket.getInputStream());
But this is only a code-style note. We don't understand what you mean by the term getting a memory heap. In any case, if you have insufficient memory, you may have to increase the memory available to your Java application; see Memory Management in the Java HotSpot™ Virtual Machine:
Java heap space This indicates that an object could not be allocated
in the heap. The issue may be just a configuration problem. You could
get this error, for example, if the maximum heap size specified by the
-Xmx command line option (or selected by default) is insufficient for
the application. It could also be an indication that objects that are
no longer needed cannot be garbage collected because the application
is unintentionally holding references to them. The HAT tool (see
Section 7) can be used to view all reachable objects and understand
which references are keeping each one alive. One other potential
source of this error could be the excessive use of finalizers by the
application such that the thread to invoke the finalizers cannot keep
up with the rate of addition of finalizers to the queue. The jconsole
management tool can be used to monitor the number of objects that are
pending finalization.
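If the heap really is just too small, it can be raised on the java command line; for example (MyApp is a placeholder for your main class):
java -Xmx512m MyApp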

You can use IOUtils, but it is easy to write this yourself if you can't use that library.
public String getClientMessage() throws IOException {
    Reader r = new InputStreamReader(tempSocket.getInputStream());
    char[] buffer = new char[4096];
    StringBuilder sb = new StringBuilder();
    for (int len; (len = r.read(buffer)) > 0; )
        sb.append(buffer, 0, len);
    return sb.toString();
}
I suspect the problem is that you have no way of knowing, from the way you send messages, when a message ends. This means you must read until you close the connection, which you are not doing. If you don't want to wait until the connection closes, you need to add some way of marking the end of a message, e.g. a newline.
// create this once per socket.
final PrintWriter out = new PrintWriter(
        new OutputStreamWriter(tempSocket.getOutputStream(), "UTF-8"), true);

public void sendServerMessage(String action) {
    // assuming there are no newlines in the message
    out.println(action); // auto-flushed.
}

// create this once per socket
final BufferedReader in = new BufferedReader(
        new InputStreamReader(tempSocket.getInputStream(), "UTF-8"));

public String getClientMessage() throws IOException {
    // read until the end of a line, which is the end of a message.
    return in.readLine();
}

Related

Java - GC a large string

I have a method to read and parse an extremely long xml file.
The xml file is read into a string, which is then parsed by a different class. However, this causes Java to use a large amount of memory (~500 MB).
Normally, the program runs at around 30 MB, but when parse() is called, it increases to 500 MB. When parse() is done running, however, the memory usage doesn't go back down to 30 MB; instead it stays at 500 MB.
I've tried setting s = null and calling System.gc() but the memory usage still stays at 500 MB.
public void parse() {
    try {
        System.out.println("parsing data...");
        String path = dir + "/data.xml";
        InputStream i = new FileInputStream(path);
        BufferedReader reader = new BufferedReader(new InputStreamReader(i));
        String line;
        String s = "";
        while ((line = reader.readLine()) != null) {
            s += line + "\n";
        }
        ... parse ...
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Any ideas?
Thanks.
Solution for your memory leak
You should close the BufferedReader at the end in order to close the stream and release any system resources associated with it. You can close both the InputStream and the BufferedReader; however, closing the BufferedReader actually closes its underlying stream as well.
Generally it's better to close it in a finally block.
finally
{
    i.close();
    reader.close();
}
Better approach: the try-with-resources statement
try (BufferedReader br = new BufferedReader(new FileReader(path)))
{
    return br.readLine();
}
Bonus Note
Use a StringBuilder instead of concatenating strings
String does not allow appending. Each append/concatenate on a String creates a new object and returns it. This is because String is immutable - it cannot change its internal state.
On the other hand, StringBuilder is mutable. When you call append(), it alters the internal char array rather than creating a new string object.
Thus it is more memory efficient to use a StringBuilder when you want to append many strings.
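A minimal illustration of the difference (the loop count is arbitrary):
// Creates a new String object on every iteration:
String s = "";
for (int i = 0; i < 10000; i++) s += i;

// Reuses one internal char array, only building a String at the end:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) sb.append(i);
String result = sb.toString();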
Just a note: a try-with-resources block will help you a lot with IO objects like those readers.
try (InputStream i = new FileInputStream(path);
     BufferedReader reader = new BufferedReader(new InputStreamReader(i))) {
    // your reading here
}
This will make sure these objects are disposed of by calling close() on them, regardless of how your method block exits (success, exception...). Closing these objects may also help to free up some memory.
The thing that's probably causing a big slowdown, and probably the blowup in memory usage, is your string concatenation. Calling s += line + "\n" is fine for a single concatenation, but the + operator actually has to create a new String instance each time and copy the characters from the ones being concatenated. The StringBuilder class was designed just for this purpose. :)
The 500MB is caused by parsing, so it has nothing to do with the string, or the BufferedReader either. It is the DOM of the parsed XML. Release that and your memory usage will revert.
But why read the entire file into a string at all? This is a waste of time and space. Just parse the input directly from the file.
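For example, with the standard javax.xml.parsers API the parser can read the file itself; a minimal sketch (error handling omitted), assuming the same path variable as in the question:
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(new File(path));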
You should keep in mind that calling System.gc() will not necessarily run garbage collection; it only suggests that the GC run, and the JVM is free to ignore the suggestion. It is better to use a StringBuilder to reduce the number of Strings you create in memory, because it only creates a String when you call toString() on it.

Storing file contents to StringBuilder apparently takes too much memory

I have this code:
public static void main(String[] args) throws IOException {
    System.out.println("Reading file...");
    String content = readFile(args[0]);
    System.out.println("Done reading file.");
}

private static String readFile(String file) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    String line = null;
    StringBuilder stringBuilder = new StringBuilder();
    while ((line = reader.readLine()) != null) {
        stringBuilder.append(line);
    }
    return stringBuilder.toString();
}
The readFile method works fine, well, for small files.
The thing I noticed is that it takes too much memory.
If I open the Task Manager on Windows (CTRL-SHIFT-ESC), I see the java process taking up to 1.8 GB of RAM, while the size of my file is just 550 MB.
Yes, I know, loading a file entirely into memory isn't a good idea, I'm doing this just for curiosity.
The program gets stuck at Reading file...; when the newly created java process starts, it takes a bunch of MB of RAM and goes up to 1.8 GB.
I also tried using String concatenation instead of using StringBuilder, but I have the exact same result.
Why does it take so much memory? Is the final stringBuilder.toString() call causing this?
You have to remember how these libraries work.
One byte on disk can turn into a 2-byte char. The StringBuilder grows by doubling in capacity, so it can be up to twice as large as you really need, and you need both the StringBuilder and the String in memory at the same time.
So take your example: 550 MB can turn into 1100 MB as chars alone. However, the capacity doubles as it grows, so it will be approximately the next power of two, i.e. it could be 2 GB, and this is on top of the String, which would be 550 MB.
Note: the reason it is not using this much memory is that you have a bug: you are discarding all the new lines (\r\n), which means you have fewer characters.
When processing a large file where you don't have enough memory to load it into memory at once, you are better off processing the data as you read it.
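A minimal sketch of that approach, where process(line) stands in for whatever per-line work you need (a hypothetical method, not from the question):
try (BufferedReader r = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = r.readLine()) != null) {
        process(line); // handle one line at a time; nothing accumulates
    }
}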
BTW, if you have plenty of memory, you can read the file faster, and with less memory, this way.
static String readFile(String file) throws IOException {
    try (FileInputStream fis = new FileInputStream(file)) {
        // Note: available() is an estimate, and read() is not guaranteed
        // to fill the whole array in one call; fine for a quick test.
        byte[] bytes = new byte[fis.available()];
        fis.read(bytes);
        return new String(bytes);
    }
}

My Java program which reads a large text file is running out of memory, can anyone help explain why?

I have a large text file with 20 million lines of text. When I read the file using the following program, it works just fine, and in fact I can read much larger files with no memory problems.
public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " + e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}
However if I need to append some records to this file before reading it, the BufferedReader consumes a huge amount of memory (I have just used Windows task manager to monitor this, not very scientific I know but it demonstrates the problem). The amended program is below, which is the same as the first one, except I am appending a single record to the file first.
public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    PrintWriter pw = null;
    try {
        pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
        pw.println(" ");
    } catch (Exception e) {
        System.out.println("pw error: " + e.getMessage());
    } finally {
        pw.close();
    }
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " + e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}
A screenshot of Windows Task Manager, where the large bump in the line shows the memory consumption when I run the second version of the program.
So I was able to read this file without running out of memory. But I have much larger files, with more than 50 million records, which hit an out-of-memory exception when I run this program against them. Can someone explain why the first version of the program works fine on files of any size, but the second program behaves so differently and ends in failure? I am running on Windows 7 with:
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) Client VM (build 23.1-b03, mixed mode, sharing)
You can start a Java VM with the VM option
-XX:+HeapDumpOnOutOfMemoryError
This will write a heap dump to a file, which can be analysed to find leak suspects.
Use a '+' to add an option and a '-' to remove an option.
If you are using Eclipse, the Memory Analyzer plugin (MAT) can open heap dumps from running VMs, with some nice analyses for leak suspects etc.
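If the process is already running, you can also take a heap dump by hand with the standard jmap tool that ships with the JDK (the pid is whatever jps reports for your program):
jmap -dump:format=b,file=heap.hprof <pid>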
Each time you execute the following Java statement, you are creating a brand new object:
tempLine = br.readLine()
I believe each time you call readLine() it is probably creating a new String object which is left on the heap each time the reassignment gives the value to tempLine.
Therefore, since GC isn't constantly being called, thousands of objects can be left on the heap within seconds.
Some people say it's a bad idea to call System.gc() every 1000 lines or so, but I would be curious whether that fixes your issue. Also, you could run this statement after each line to mark each object as garbage-collectable:
tempLine = null;
pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
Did you try not using a BufferedWriter? If you're appending a few lines to the end, maybe you don't need a buffer. If you do, consider using a byte array (collections or a StringBuilder). Finally, did you try the same in Java 1.6? It might be a bug in the new version of one of the Writers.
Can you print the free memory before and after pw.close()?
System.out.println("before pw close: " + Runtime.getRuntime().freeMemory());
and similarly after the close, and after the reader close.
It could be because you may not have a linefeed/carriage return in your file at all. In this case, readLine() tries to create just one single string out of your file, which is probably running out of memory.
Java doc of readLine():
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Have you tried:
A) creating a new File instance to use for the reading, but pointing to the same file, and
B) reading an entirely different file in the second part?
I'm wondering whether the File object is still somehow attached to the PrintWriter, or whether the OS is doing something funny with the file handles. Those tests should show you where to focus.
This doesn't look to be a problem with the code, and your logic for thinking it shouldn't break seems sound, so it's got to be some underlying functionality.
You'll need to start Java with a bigger heap. Try -Xmx1024m as a parameter on the java command.
Basically, you're going to need more memory than the size of the file.

Java: reading strings from a random access file with buffered input

I've never had close experience with the Java IO API before, and I'm really frustrated now. I find it hard to believe how strange and complex it is, and how hard it can be to do a simple task.
My task: I have 2 positions (starting byte, ending byte), pos1 and pos2. I need to read lines between these two bytes (including the starting one, not including the ending one) and use them as UTF8 String objects.
For example, in most scripting languages it would be a very simple 1-2-3-liner like this (in Ruby, but it will be essentially the same for Python, Perl, etc.):
f = File.open("file.txt")
f.seek(pos1)
while f.pos < pos2
  s = f.readline
  # do something with "s" here
end
It quickly becomes hell with the Java IO APIs ;) In fact, I see two ways to read lines (ending with \n) from regular local files:
RandomAccessFile has getFilePointer() and seek(long pos), but its readLine() reads non-UTF8 strings (and not even byte arrays, but very strange strings with broken encoding), and it has no buffering (which probably means that every read*() call would be translated into a single underlying OS read() => fairly slow).
BufferedReader has a great readLine() method, and it can even do some seeking with skip(long n), but it has no way to determine even the number of bytes that have already been read, never mind the current position in the file.
I've tried to use something like:
FileInputStream fis = new FileInputStream(fileName);
FileChannel fc = fis.getChannel();
BufferedReader br = new BufferedReader(
        new InputStreamReader(
                fis,
                CHARSET_UTF8
        )
);
... and then using fc.position() to get the current file reading position and fc.position(newPosition) to set one, but it doesn't seem to work in my case: it looks like it returns the position of the buffer pre-filling done by BufferedReader, or something like that - these counters seem to be rounded up in 16K increments.
Do I really have to implement it all by myself, i.e. a file reading interface which would:
allow me to get/set position in a file
buffer file reading operations
allow reading UTF8 strings (or at least allow operations like "read everything till the next \n")
Is there a quicker way than implementing it all myself? Am I overlooking something?
import org.apache.commons.io.input.BoundedInputStream;

FileInputStream file = new FileInputStream(filename);
file.skip(pos1);
BufferedReader br = new BufferedReader(
        new InputStreamReader(new BoundedInputStream(file, pos2 - pos1))
);
If you didn't care about pos2, then you wouldn't need Apache Commons IO.
I wrote this code to read UTF-8 using RandomAccessFile:
// File: CyclicBuffer.java
public class CyclicBuffer {
    private static final int size = 3;
    private FileChannel channel;
    private ByteBuffer buffer = ByteBuffer.allocate(size);

    public CyclicBuffer(FileChannel channel) {
        this.channel = channel;
    }

    private int read() throws IOException {
        return channel.read(buffer);
    }

    /**
     * Returns the byte read.
     *
     * @return byte read, -1 - end of file reached
     * @throws IOException
     */
    public byte get() throws IOException {
        if (buffer.hasRemaining()) {
            return buffer.get();
        } else {
            buffer.clear();
            int eof = read();
            if (eof == -1) {
                return (byte) eof;
            }
            buffer.flip();
            return buffer.get();
        }
    }
}
// File: UTFRandomFileLineReader.java
public class UTFRandomFileLineReader {
    private final Charset charset = Charset.forName("utf-8");
    private CyclicBuffer buffer;
    private ByteBuffer temp = ByteBuffer.allocate(4096);
    private boolean eof = false;

    public UTFRandomFileLineReader(FileChannel channel) {
        this.buffer = new CyclicBuffer(channel);
    }

    public String readLine() throws IOException {
        if (eof) {
            return null;
        }
        byte x = 0;
        temp.clear();
        while ((byte) -1 != (x = (buffer.get())) && x != '\n') {
            if (temp.position() == temp.capacity()) {
                temp = addCapacity(temp);
            }
            temp.put(x);
        }
        if (x == -1) {
            eof = true;
        }
        temp.flip();
        if (temp.hasRemaining()) {
            return charset.decode(temp).toString();
        } else {
            return null;
        }
    }

    private ByteBuffer addCapacity(ByteBuffer temp) {
        ByteBuffer t = ByteBuffer.allocate(temp.capacity() + 1024);
        temp.flip();
        t.put(temp);
        return t;
    }

    public static void main(String[] args) throws IOException {
        RandomAccessFile file = new RandomAccessFile("/Users/sachins/utf8.txt", "r");
        UTFRandomFileLineReader reader = new UTFRandomFileLineReader(file.getChannel());
        int i = 1;
        while (true) {
            String s = reader.readLine();
            if (s == null)
                break;
            System.out.println("\n line " + i++);
            s = s + "\n";
            for (byte b : s.getBytes(Charset.forName("utf-8"))) {
                System.out.printf("%x", b);
            }
            System.out.printf("\n");
        }
    }
}
For @Ken Bloom: a very quick go at a Java 7 version. Note: I don't think this is the most efficient way; I'm still getting my head around NIO.2. Oracle has started their tutorial here.
Also note that this isn't using Java 7's new ARM syntax (which takes care of the exception handling for file-based resources); it wasn't working in the latest OpenJDK build that I have. But if people want to see the syntax, let me know.
/*
 * Paths uses the default file system; note no exception is thrown at this stage if
 * the file is missing.
 */
Path file = Paths.get("C:/Projects/timesheet.txt");
ByteBuffer readBuffer = ByteBuffer.allocate(readBufferSize);
FileChannel fc = null;
try
{
    /*
     * newByteChannel returns a SeekableByteChannel - this is the fun new construct that
     * supports asynch file based I/O, e.g. if you declared an AsynchronousFileChannel
     * you could read and write to that channel simultaneously with multiple threads.
     * (In the released Java 7 API this call is Files.newByteChannel(file, ...).)
     */
    fc = (FileChannel) Files.newByteChannel(file, StandardOpenOption.READ);
    fc.position(startPosition);
    while (fc.read(readBuffer) != -1)
    {
        readBuffer.flip(); // flip so only the bytes just read are decoded
        System.out.println(Charset.forName(encoding).decode(readBuffer));
        readBuffer.clear(); // reset for the next read
    }
}
Start with a RandomAccessFile and use read or readFully to get a byte array between pos1 and pos2. Let's say that we've stored the data read in a variable named rawBytes.
Then create your BufferedReader using
new BufferedReader(new InputStreamReader(new ByteArrayInputStream(rawBytes)))
Then you can call readLine on the BufferedReader.
Caveat: this probably uses more memory than if you could make the BufferedReader seek to the right location itself, because it preloads everything into memory.
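A minimal sketch of that approach, assuming pos1/pos2 from the question, a UTF-8 file, and that pos1 does not land in the middle of a multi-byte sequence:
RandomAccessFile raf = new RandomAccessFile("file.txt", "r");
raf.seek(pos1);
byte[] rawBytes = new byte[(int) (pos2 - pos1)];
raf.readFully(rawBytes);
raf.close();
BufferedReader br = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(rawBytes), "UTF-8"));
String line;
while ((line = br.readLine()) != null) {
    // do something with "line" here
}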
I think the confusion is caused by the UTF-8 encoding and the possibility of multi-byte characters.
UTF-8 doesn't specify how many bytes are in a single character. I'm assuming from your post that you are using single-byte characters. For example, 412 bytes would mean 412 characters. But if the string were using double-byte characters, you would get 206 characters.
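A quick way to see the variable width (the characters are just examples):
import java.nio.charset.StandardCharsets;

System.out.println("A".getBytes(StandardCharsets.UTF_8).length); // 1
System.out.println("é".getBytes(StandardCharsets.UTF_8).length); // 2
System.out.println("€".getBytes(StandardCharsets.UTF_8).length); // 3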
The original java.io package didn't deal well with this multi-byte confusion, so more classes were added to deal specifically with strings. The package mixes two different types of file handlers (and they can be confusing until the nomenclature is sorted out). The stream classes provide for direct data I/O without any conversion. The reader classes convert files to strings with full support for multi-byte characters. That might help clarify part of the problem.
Since you state you are using UTF-8 characters, you want the reader classes. In this case, I suggest FileReader. The skip() method in FileReader allows you to pass by X characters and then start reading text. Alternatively, I prefer the overloaded read() method since it allows you to grab all the text at one time.
If you assume your "bytes" are individual characters, try something like this:
FileReader fr = new FileReader(new File("x.txt"));
char[] buffer = new char[pos2 - pos1];
fr.skip(pos1);                     // skip to the starting position
fr.read(buffer, 0, buffer.length); // the offset argument is into the buffer, not the file
...
I'm late to the party here, but I ran across this problem in my own project.
After much traversal of Javadocs and Stack Overflow, I think I found a simple solution.
After seeking to the appropriate place in your RandomAccessFile, which I am here calling raFile, do the following:
FileDescriptor fd = raFile.getFD();
FileReader fr = new FileReader(fd);
BufferedReader br = new BufferedReader(fr);
Then you should be able to call br.readLine() to your heart's content, which will be much faster than calling raFile.readLine().
The one thing I'm not sure about is whether UTF8 strings are handled correctly.
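If UTF-8 matters, note that FileReader always uses the platform default charset. A sketch of the same trick with the charset forced explicitly (the constructors exist, though I have not tested this against the FileDescriptor approach above):
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(fd), "UTF-8"));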
The Java IO API is very flexible. Unfortunately, sometimes the flexibility makes it verbose. The main idea here is that there are many streams, writers and readers that implement the wrapper pattern. For example, BufferedInputStream wraps any other InputStream. The same goes for output streams.
The difference between streams and readers/writers is that streams work with bytes while readers/writers work with characters.
Fortunately some streams, writers and readers have convenient constructors that simplify coding. If you want to read a file you just have to say
InputStream in = new FileInputStream("/usr/home/me/myfile.txt");
if (in.markSupported()) {
    in.skip(1024);
    in.read();
}
It is not as complicated as you fear.
Channels are something different. They are part of the so-called "new IO", or NIO. New IO is non-blocking - that is its main advantage. You can search the internet for any "nio java tutorial" and read about it. But it is more complicated than regular IO and is not needed for most applications.

A simple program to demonstrate memory leak in Java

I am looking for a simple program that can demonstrate memory leak in Java.
Thanks.
http://www.codeproject.com/KB/books/EffectiveJava.aspx
See Item 6.
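Item 6 is about obsolete object references; a minimal sketch of the kind of leak it describes (a stack that never nulls out popped slots, so its array keeps dead objects reachable):
public class LeakyStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object e) {
        if (size == elements.length)
            elements = java.util.Arrays.copyOf(elements, 2 * size);
        elements[size++] = e;
    }

    public Object pop() {
        return elements[--size]; // the slot still references the popped object
    }
}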
A memory leak is, for example, when you have references that are not necessary any more but cannot be collected by the garbage collector.
There are simple examples, e.g. from IBM, that show the principle:
http://www.ibm.com/developerworks/rational/library/05/0816_GuptaPalanki/
Vector v = new Vector();
while (true)
{
    byte b[] = new byte[1048576];
    v.add(b);
}
This will continuously add a 1 MB byte array to a vector until it runs out of memory.
A great example from a great book: http://www.informit.com/articles/article.aspx?p=1216151&seqNum=6
Let's first define what a memory leak is in a Java context - it's a situation where a program mistakenly holds a reference to an object that is never used again during the rest of the program's run.
An example would be forgetting to close an opened stream:
class MemoryLeak {
    private void startLeaking() throws IOException {
        StringBuilder input = new StringBuilder();
        URLConnection conn = new URL("http://www.example.com/file.txt").openConnection();
        BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
        String line;
        while ((line = br.readLine()) != null) {
            input.append(line);
        }
        // br is never closed, so the underlying stream stays reachable
    }

    public static void main(String[] args) throws IOException {
        MemoryLeak ml = new MemoryLeak();
        ml.startLeaking();
    }
}
