Buffer and File in Java


I'm new to Java and I want to ask what the difference is between using FileReader/FileWriter and using BufferedReader/BufferedWriter. Apart from speed, is there any other reason to use the buffered versions?
In code that copies a file's content into another file, is it better to use BufferedReader and BufferedWriter?

The short version is: an unbuffered FileWriter/FileReader hands every call straight to the file, so each call takes effect immediately but many small calls are inefficient. A buffered writer/reader saves up writes/reads and does them in chunks (based on the buffer size), which is far more efficient, but output can reach the file later (it waits for the buffer to fill up or be flushed).
So to answer your question, a buffered writer/reader is generally best, especially if you are not sure which one to use.
Take a look at the JavaDoc for the BufferedWriter, it does a great job of explaining how it works:
In general, a Writer sends its output immediately to the underlying
character or byte stream. Unless prompt output is required, it is
advisable to wrap a BufferedWriter around any Writer whose write()
operations may be costly, such as FileWriters and OutputStreamWriters.
For example,
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("foo.out")));
will buffer the PrintWriter's output to the file. Without buffering,
each invocation of a print() method would cause characters to be
converted into bytes that would then be written immediately to the
file, which can be very inefficient.
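For the file-copy case in the question, a minimal sketch might look like the following. The class name BufferedTextCopy and its copy method are made up for illustration, and the sketch assumes both files are text in the platform default charset (which is what FileReader and FileWriter use).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedTextCopy {

    // Copies a text file, wrapping both ends in a buffer.
    // try-with-resources closes (and therefore flushes) the writer.
    static void copy(String source, String target) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(source));
             BufferedWriter out = new BufferedWriter(new FileWriter(target))) {
            char[] chunk = new char[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on temporary files so the sketch runs without arguments.
        Path src = Files.createTempFile("copy-src", ".txt");
        Path dst = Files.createTempFile("copy-dst", ".txt");
        Files.write(src, "hello\nworld\n".getBytes());
        copy(src.toString(), dst.toString());
        System.out.println(new String(Files.readAllBytes(dst)));
    }
}
```

If the goal is an exact byte-for-byte copy rather than text processing, java.nio.file.Files.copy avoids the character decoding step altogether.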

Related

BufferedReader with FileReader working

I have a doubt about how BufferedReader works with FileReader. I have read most of the posts on Stack Overflow and searched Google as well, but my doubt is still not cleared. This is my third day trying to understand it! :)
Here it is:
My understanding is that when we use the code snippet below
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
FileReader reads the data byte by byte and puts it into a buffer. The buffer is created by BufferedReader, and the BufferedReader instance reads from that buffer.
This made me think, because the post Understanding how BufferedReader works in Java says BufferedReader doesn't store anything itself. If that were the case, then I thought BufferedReader would be doing two things: first creating a buffer, and second creating an instance of BufferedReader that reads from that buffer. Does that make sense?
My second doubt: BufferedReader can be used to avoid I/O operations, that is, to avoid the time-consuming work of reading bytes from disk, converting them to chars, and handing them out. To overcome this, BufferedReader reads a big chunk of data at once. But when BufferedReader is wrapped around FileReader, the FileReader stream reads first and the data is then passed to BufferedReader. So how does it take a big chunk?
My understanding is that BufferedReader is helpful because it reads data from a buffer, which is in memory. Rather than reading bytes from disk and converting them at the same time, it first puts all the bytes into the buffer and then reads from there, which is fast, and they can be converted to chars as well. I concluded this from reading online, but I don't agree 100%, because no step is skipped even after putting the data into the buffer, so how does it reduce the time taken? :(
I'm really confused by this. Can anyone help me understand it more precisely?

"FileReader reads the data byte by byte"
No. FileReader is an InputStreamReader over a FileInputStream, and it reads characters, not bytes.
"and puts it into a buffer"
It puts the characters into the caller's buffer.
"The buffer is created by BufferedReader, and the BufferedReader instance reads from that buffer."
Correct.
"This made me think, because the post Understanding how BufferedReader works in Java says BufferedReader doesn't store anything itself."
That statement in that post is complete and utter nonsense, and so is any other source that says so. Of course it stores data. It is a buffer. See the Javadoc, and specifically the following statement: 'reads text from a character-input stream, buffering characters [my emphasis] so as to provide for the efficient reading of characters, arrays, and lines.'
"If that were the case, then I thought BufferedReader would be doing two things: first creating a buffer, and second creating an instance of BufferedReader that reads from that buffer. Does that make sense?"
No, but neither did your source. Your first intuition above was correct.
"But when BufferedReader is wrapped around FileReader, the FileReader stream reads first and the data is then passed to BufferedReader. So how does it take a big chunk?"
By supplying a big buffer to FileReader.read().
"I don't agree 100%, because no step is skipped even after putting the data into the buffer, so how does it reduce the time taken?"
The step of reading character by character from the disk is skipped. It is more or less just as efficient to read a chunk from a disk file as it is to read one byte, and system calls are themselves expensive.
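One way to see the skipped step is to count how many read requests actually reach the underlying reader. The sketch below is only an illustration: CountingReader and countCalls are hypothetical helpers, and a StringReader stands in for the FileReader so that it runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadCallCounter {

    // Hypothetical helper: wraps a Reader and counts how often it is asked for data.
    static class CountingReader extends Reader {
        private final Reader delegate;
        int calls = 0;

        CountingReader(Reader delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            calls++;
            return delegate.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    // Consumes the text one character at a time and returns the number of
    // read requests that reached the underlying reader.
    static int countCalls(String text, boolean buffered) throws IOException {
        CountingReader source = new CountingReader(new StringReader(text));
        Reader in = buffered ? new BufferedReader(source) : source;
        while (in.read() != -1) {
            // discard the character; only the call count matters here
        }
        in.close();
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append('x');
        }
        String text = sb.toString();
        System.out.println("unbuffered calls: " + countCalls(text, false));
        System.out.println("buffered calls:   " + countCalls(text, true));
    }
}
```

Without the buffer, every single-character read is forwarded to the source. With the buffer, the source is asked only a handful of times for large chunks. With a real file, each forwarded request can mean a system call, which is where the time goes.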

Advantages of Double BufferedWriter or BufferedReader

I know that a BufferedWriter or BufferedReader cannot directly communicate with a file. It needs to wrap another Writer object to do it. Like,
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter("abc.txt"));
Here we are simply wrapping a FileWriter object using a BufferedWriter for IO performance advantages.
But I could also do this,
BufferedWriter bufferedWriter = new BufferedWriter(new BufferedWriter(new FileWriter("abc.txt")));
Here the FileWriter object is wrapped using a BufferedWriter which in turn is wrapped using another BufferedWriter. Or a more evil idea would be to chain it even further.
Is there any real advantage of double BufferedWriter? Or chaining it even further? The same is applicable for BufferedReader too.

There's no benefit, no.
First, you have to understand what the buffering is for. When you write to disk, the hard drive needs to physically move the disk head to the right place, then wait for the disk to spin such that it's in the right place, and then start writing bytes as the disk spins under the head. Those first two steps are much slower than the rest of the operation, relatively speaking. This means that there's a lot of fixed overhead: writing 1000 bytes is much faster than writing 1 byte 1000 times.
So, buffering is just a way of letting the application write bytes in whatever way suits its logic (one byte at a time, three bytes, 1000 bytes, whatever) while still getting good disk performance. Most write operations to the buffer don't actually cause any bytes to go to the underlying output stream; only once you hit a certain limit (say, every 1000 bytes) is everything written, all at once.
And it's the same idea on input.
So, chaining these wouldn't help. With the chain, assuming they had equal buffer sizes, you would write to the "outer" buffer, and it wouldn't write to the "inner" buffer at all... and then when it hits its limit, it would flush all of those bytes to the inner buffer. That inner buffer instantly hits its buffer limit (since they're the same limit) and flushes those bytes right to the output. You haven't gained anything, and at worst you have paid for an extra copy of the data in memory.
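A rough way to check this is to count the write calls that reach the final destination with zero, one, and two layers of BufferedWriter. CountingWriter and countWrites are hypothetical names for this sketch, and the sink simply discards what it receives.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

class ChainedBufferDemo {

    // Hypothetical sink that only counts the write calls it receives.
    static class CountingWriter extends Writer {
        int calls = 0;

        @Override
        public void write(char[] buf, int off, int len) {
            calls++;
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    // Writes `chars` single characters through `layers` nested BufferedWriters
    // and returns how many write calls reached the sink.
    static int countWrites(int layers, int chars) throws IOException {
        CountingWriter sink = new CountingWriter();
        Writer out = sink;
        for (int i = 0; i < layers; i++) {
            out = new BufferedWriter(out);
        }
        for (int i = 0; i < chars; i++) {
            out.write('x');
        }
        out.close(); // flushes every layer in turn
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("no buffer:     " + countWrites(0, 20000));
        System.out.println("one buffer:    " + countWrites(1, 20000));
        System.out.println("double buffer: " + countWrites(2, 20000));
    }
}
```

The first layer of buffering collapses thousands of tiny writes into a few large ones. The second layer has nothing left to combine, so the count at the sink should stay essentially the same.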

"Buffered" here is primarily reflecting the semantics of the interface (API). Noting this, composing IO pipelines via chaining of BufferedReader is a possibility. In general, consider that consumption of a single byte at the end of the chain may involve multiple reads at the head and could, in theory and per API, simply be a computation based on data read at the head.
For the general case of block-device buffering (e.g. reading from an I/O device that transfers data in blocks, such as a file system or network endpoint), chaining buffers (effectively queues) will certainly increase memory consumption and immediately add latency to processing (because the total buffer size grows). It will typically increase throughput significantly (with the noted negative impact on latency).

Where is the data queued with a BufferedReader?

I am reading a large CSV from a web service like this:
br = new BufferedReader(new InputStreamReader(website.openStream(), "UTF-16"));
I read line by line and write into a database. The writing into a database is the bottleneck of this operation and I am wondering if it is possible that I will "timeout" with the webservice so I get the condition where the webservice just cuts the connection because I am not reading anything from it...
Or does the BufferedReader just buffer the stream into memory until I read from it?

Yes, it is possible that the web service stream will time out while you are writing to the DB. If the DB really is slow enough for that to happen, you may need to copy the file locally before pushing it into the DB.
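A minimal sketch of the copy-locally-first idea, under a few assumptions: SpoolThenProcess, spool and forEachLine are invented names, the rowHandler callback stands in for the database insert, and a ByteArrayInputStream plays the role of website.openStream() in the demo.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;

class SpoolThenProcess {

    // Drains the remote stream into a temporary file without waiting for the consumer.
    static Path spool(InputStream remote) throws IOException {
        Path tmp = Files.createTempFile("download", ".csv");
        Files.copy(remote, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Reads the local copy line by line; a slow handler now only delays disk reads.
    static void forEachLine(Path file, Consumer<String> rowHandler) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String line;
            while ((line = br.readLine()) != null) {
                rowHandler.accept(line); // placeholder for the database insert
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the web service response.
        byte[] fake = "a,b\n1,2\n".getBytes(StandardCharsets.UTF_16);
        Path tmp = spool(new ByteArrayInputStream(fake));
        forEachLine(tmp, System.out::println);
        Files.delete(tmp);
    }
}
```

The download then runs at network speed regardless of how slow the database is, at the cost of temporary disk space.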

+1 for Brian's answer.
Furthermore, I would recommend you have a look at my csv-db-tools on GitHub. The csv-db-importer module illustrates how to import large CSV files into the database. The code is optimized to insert one row at a time and keep the memory free from data buffered from large CSV files.

BufferedReader will, as you have speculated, read the contents of the stream into memory. Any calls to read or readLine will read data from the buffer, not from the original stream, assuming the data is already available in the buffer. The advantage here is that data is read in larger batches, rather than requested from the stream at each invocation of read or readLine.
You will likely only experience a timeout like you describe if you are reading large amounts of data. I had some trouble finding a credible reference, but I have seen several mentions of the default buffer size of BufferedReader being 8192 characters. This means that if your stream delivers more than 8192 characters of data, the buffer could potentially fill and cause your process to wait on the DB bottleneck before reading more data from the stream.
If you think you need to reserve a larger buffer than this, the BufferedReader constructor is overloaded with a second parameter allowing you to specify the size of the buffer in characters. Keep in mind, though, that unless you are moving small enough pieces of data to buffer the entire stream, you could run into the same problem even with a larger buffer.
br = new BufferedReader(new InputStreamReader(website.openStream(), "UTF-16"), size);
will initialize your BufferedReader with a buffer of size characters.
EDIT:
After reading @Keith's comment, I think he's got the right of it here. If you experience timeouts, a smaller buffer will cause you to read from the socket more frequently, hopefully eliminating that issue. If he posts an answer with that, you should accept his.
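To illustrate what the size argument controls, the sketch below records the largest request a BufferedReader makes to its source. RecordingReader and largestRequest are hypothetical helpers, and a StringReader stands in for the network stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BufferSizeDemo {

    // Hypothetical wrapper that remembers the largest request it has seen.
    static class RecordingReader extends Reader {
        private final Reader delegate;
        int largestRequest = 0;

        RecordingReader(Reader delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            largestRequest = Math.max(largestRequest, len);
            return delegate.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    // Reads all lines through a BufferedReader of the given size and returns
    // the largest number of chars it ever asked the source for in one call.
    static int largestRequest(String text, int bufferSize) throws IOException {
        RecordingReader source = new RecordingReader(new StringReader(text));
        try (BufferedReader br = new BufferedReader(source, bufferSize)) {
            while (br.readLine() != null) {
                // a slow consumer (such as a database insert) would sit here
            }
        }
        return source.largestRequest;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            csv.append("a,b,c\n");
        }
        System.out.println("size 64:   " + largestRequest(csv.toString(), 64));
        System.out.println("size 1024: " + largestRequest(csv.toString(), 1024));
    }
}
```

The reader never asks for more than the configured number of chars at a time, so a smaller size means smaller but more frequent reads from the source.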

BufferedReader just reads in chunks into an internal buffer, whose default size is unspecified in the Javadoc but has been 8192 chars for many years. It doesn't do anything while you're not calling it.
But I don't think your perceived problem even exists. I don't see how the web service will even know. Write timeouts in TCP are quite difficult to implement. Some platforms have APIs for that, but they aren't supported by Java.
Most likely the web service is just using a blocking mode socket and it will just block in its write if you aren't reading fast enough.

Is it overkill to use BufferedWriter and BufferedOutputStream together?

I want to write to a socket. From reading about network IO, it seems to me that the optimal way to write to it is to do something like this:
OutputStream outs = null;
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new BufferedOutputStream(outs), "UTF-8"));
The BufferedWriter would buffer the input to the OutputStreamWriter which is recommended, because it prevents the writer from starting up the encoder for each character.
The BufferedOutputStream would then buffer the bytes from the Writer to avoid putting one byte at a time potentially onto the network.
It looks a bit like overkill, but it all seems like it helps?
Grateful for any help..
EDIT: From the javadoc on OutputStreamWriter:
Each invocation of a write() method causes the encoding converter to be invoked on the given character(s). The resulting bytes are accumulated in a buffer before being written to the underlying output stream. The size of this buffer may be specified, but by default it is large enough for most purposes. Note that the characters passed to the write() methods are not buffered.
For top efficiency, consider wrapping an OutputStreamWriter within a BufferedWriter so as to avoid frequent converter invocations. For example:
Writer out = new BufferedWriter(new OutputStreamWriter(System.out));

The purpose of the Buffered* classes is to coalesce small write operations into a larger one, thereby reducing the number of system calls, and increasing throughput.
Since the BufferedWriter already collects writes in a buffer, and the OutputStreamWriter then converts those characters into its own byte buffer and writes that buffer to the underlying OutputStream in a single operation, the OutputStream is already invoked with large write operations. Therefore, a BufferedOutputStream finds nothing to combine, and is simply redundant.
As an aside, the same can apply to the BufferedWriter: buffering will only help if the writer is passed only a few characters at a time. If you know the caller only writes huge strings, the BufferedWriter will find nothing to combine and is redundant, too.
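This can be checked roughly by counting the calls that reach the final stream with and without the extra BufferedOutputStream. In the sketch below, CountingOutputStream and run are made-up names, and the counting stream stands in for the socket's output stream.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class SocketWriterLayers {

    // Hypothetical stand-in for the socket stream: counts calls and bytes.
    static class CountingOutputStream extends OutputStream {
        int calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    // Writes 20000 single characters through BufferedWriter + OutputStreamWriter,
    // optionally with a BufferedOutputStream underneath.
    static CountingOutputStream run(boolean withBufferedOutputStream) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream bytesOut = withBufferedOutputStream ? new BufferedOutputStream(sink) : sink;
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(bytesOut, StandardCharsets.UTF_8))) {
            for (int i = 0; i < 20000; i++) {
                out.write('x');
            }
        }
        return sink;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without BufferedOutputStream: " + run(false).calls + " calls");
        System.out.println("with BufferedOutputStream:    " + run(true).calls + " calls");
    }
}
```

In both configurations the sink sees only a small number of large writes, which is the sense in which the extra BufferedOutputStream adds nothing here. As the other answers note, real measurements on the actual socket are still the better guide.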

"The BufferedWriter would buffer the input to the OutputStreamWriter, which is recommended because it prevents the writer from starting up the encoder for each character."
Recommended by whom, and in what context? What do you mean by "starting up the encoder"? Are you writing a single character at a time to the writer anyway? (We don't know much about how you're using the writer... that could be important.)
"The BufferedOutputStream would then buffer the bytes from the Writer to avoid putting one byte at a time potentially onto the network."
What makes you think it would write one byte at a time? I think it very unlikely that OutputStreamWriter will write a byte at a time to the underlying stream, unless you really write a character at a time to it.
Additionally, I'd expect the network output stream to use something like Nagle's algorithm to avoid sending single-byte packets.
As ever with optimization, you should do it based on evidence... have you done any testing with and without these layers of buffering?
EDIT: Just to clarify, I'm not saying the buffering classes are useless. In some cases they're absolutely the right way to go. I'm just saying that as with all optimization, they shouldn't be used blindly. You should consider what you're trying to optimize for (processor usage, memory usage, network usage etc.) and measure. There are many factors which matter here, not least of which is the write pattern. If you're already writing "chunkily" (writing large blocks of character data) then the buffers will have relatively little impact. If you're actually writing a single character at a time to the writer, then they would be more significant.

Yes, it is overkill. From the Javadoc for OutputStreamWriter: "Each invocation of a write() method causes the encoding converter to be invoked on the given character(s). The resulting bytes are accumulated in a buffer before being written to the underlying output stream."

How to buffer OutputStream without any buffer limits in Java?

This case is a bit complex, I hope I will simplify it well.
My task starts when I receive a PrintStream to which I am supposed to output some data. However, the entire task is calculating plus printing, and I can only print once I am done with the calculation. So this could be a two-pass task, but I am hoping for one pass.
To achieve this, I would like to create an output buffer, do the calculation and printing (to the buffer), and then print from the buffer to the real output stream.
So far so good. The problem is that I cannot find an appropriate class for the buffering: BufferedOutputStream, if I understand correctly, starts writing from the buffer when the buffer is full. I need much stricter control over it: nothing should be written to the real output until I explicitly say so.
Question -- is there any class appropriate for this task?

You could use ByteArrayOutputStream as your buffer. The byte array where this stream writes is enlarged automatically to hold everything you write.
When you are done generating output, just call the writeTo method to write the contents of the buffer to an output stream that writes to some actual device.
For further details see http://docs.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html
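A small sketch of that approach follows. DeferredOutput and computeAndPrint are invented names, and the loop is a stand-in for the real calculation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;

class DeferredOutput {

    // Prints into memory while calculating, then copies everything to the
    // real stream in one step once the calculation has finished.
    static void computeAndPrint(PrintStream realOut) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream staged = new PrintStream(buffer, true, "UTF-8");

        int sum = 0;
        for (int i = 1; i <= 3; i++) {
            sum += i; // stand-in for the real calculation
            staged.println("partial sum: " + sum);
        }

        staged.flush();           // make sure everything is in the byte array
        buffer.writeTo(realOut);  // nothing reaches realOut before this line
        realOut.flush();
    }

    public static void main(String[] args) throws IOException {
        computeAndPrint(System.out);
    }
}
```

Because the ByteArrayOutputStream grows as needed, nothing is written to the real stream until writeTo is called. The trade-off is that the whole output is held in memory.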

From the javadoc of the flush method :
Flushes this buffered output stream. This forces any buffered output
bytes to be written out to the underlying output stream.
