Behind the scenes of Java's BufferedInputStream - java

To start with, I understand the concept of buffering as a wrapper around, for instance, FileInuptStream to act as a temporary container for contents read(lets take read scenario) from an underlying stream, in this case - FileInputStream.
Say, there are 100 bytes to read from a stream(file as a source).
Without buffering, code(read method of BufferedInputStream) has to make 100 reads(one byte at a time).
With buffering, depending on buffer size, code makes <= 100 reads.
Lets assume buffer size to be 50.
So, the code reads the buffer(as a source) only twice to read the contents of a file.
Now, as the FileInuptStream is the ultimate source(though wrapped by BufferedInputStream) of data(file which contains 100 bytes), wouldn't it has to read 100 times to read 100 bytes? Though, the code calls read method of BufferedInputStream but, the call is passed to read method of FileInuptStream which needs to make 100 read calls. This is the point which I'm unable to comprehend.
IOW, though wrapped by a BufferedInputStream, the underlying streams(such as FileInputStream) still have to read one byte at a time. So, where is the benefit(not for the code which requires only two read calls to buffer but, to the application's performance) of buffering?
Thanks.
EDIT:
I'm making this as a follow-up 'edit' rather than 'comment' as I think its contextually better suits here and as a TL;DR for readers of chat between #Kayaman and me.
The read method of BufferedInputStream says(excerpt):
As an additional convenience, it
attempts to read as many bytes as possible by repeatedly invoking the
read method of the underlying stream. This iterated read continues
until one of the following conditions becomes true:
The specified number of bytes have been read,
The read method of the underlying stream returns -1, indicating end-of-file, or
The available method of the underlying stream returns zero, indicating that further input requests would block.
I digged into the code and found method call trace as under:
BufferedInputStream -> read(byte b[]) As a I want to see buffering in action.
BufferedInputStream -> read(byte b[], int off, int len)
BufferedInputStream -> read1(byte[] b, int off, int len) - private
FileInputStream -
read(byte b[], int off, int len)
FileInputStream -> readBytes(byte b[], int off, int len) - private and native. Method description from source code -
Reads a subarray as a sequence of bytes.
Call to read1(#4, above mentioned) in BufferedInputStream is in an infinite for loop. It returns on conditions mentioned in above excerpt of read method description.
As I had mentioned in OP(#6), the call does seem to handle by an underlying stream which matches API method description and method call trace.
The question still remains, if native API call - readBytes of FileInputStream reads one byte at a time and create an array of those bytes to return?

The underlying streams(such as FileInputStream) still have to read
one byte at a time
Luckily no, that would be hugely inefficient. It allows the BufferedInputStream to make read(byte[8192] buffer) calls to the FileInputStream which will return a chunk of data.
If you then want to read a single byte (or not), it will efficiently be returned from BufferedInputStream's internal buffer instead of having to go down to the file level. So the BI is there to reduce the times we do actual reads from the filesystem, and when those are done, they're done in an efficient fashion even if the end user wanted to read just a few bytes.
It's quite clear from the code that BufferedInputStream.read() does not delegate directly to UnderlyingStream.read(), as that would bypass all the buffering.
public synchronized int read() throws IOException {
if (pos >= count) {
fill();
if (pos >= count)
return -1;
}
return getBufIfOpen()[pos++] & 0xff;
}

Related

Reading n bytes atomically without blocking

I just asked a question about why my thread shut down wasn't working. It ended up being due to readLine() blocking my thread before the shutdown flag could be recognised. This was easy to fix by checking ready() before calling readLine().
However, I'm now using a DataInputStream to do the following in series:
int x = reader.readInt();
int y = reader.readInt();
byte[] z = new byte[y]
reader.readFully(z);
I know I could implement my own buffering which would check the running file flag while loading up the buffer. But I know this would be tedious. Instead, I could let the data be buffered within the InputStream class, and wait until I have my n bytes read, before executing a non-blocking read - as I know how much I need to read.
4 bytes for the first integer
4 bytes for the second integer y
and y bytes for the z byte array.
Instead of using ready() to check if there is a line in the buffer, is there some equivalent ready(int bytesNeeded)?
The available() method returns the amount of bytes in the InputStreams internal buffer.
So, one can do something like:
while (reader.available() < 4) checkIfShutdown();
reader.readInt();
You can use InputStream.available() to get an estimate of the amount of bytes that can be read. Quoting the Javadoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking, which may be 0, or 0 when end of stream is detected. The read might be on the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In other words, if available() returns n, you know you can safely call read(n) without blocking. Note that, as the Javadoc states, the value returned is an estimate. For example, InflaterInputStream.available() will always return 1 if EOF isn't reached. Check the documentation of the InputStream subclass you will be using to ensure it meets your needs.
You are going to need to implement your own equivalent of BufferedInputStream. Either as a sole owner of an InputStream and a thread (possibly borrowed from a pool) to block in. Alternatively, implement with NIO.

Conceptual confusion about Java IO read() from InputStream and write() function to OutputStream

I'm currently picking up on Java IO functions and coding but am a little confused while reading through online tutorials. This is in reference to the question posted here: InputStream/OutputStream read()/write() function relevance and usage
This seemed to hint that the difference between a basic write() function and the
write(byte[] bytes, int offset, int length) function is in its time efficiency, but i didn't quite get the meaning of that.
In the tutorial, it was stated:
public int read(byte[] bytes, int offset, int length) throws IOException
// Read "length" number of bytes, store in bytes array starting from offset
of index.
public int read(byte[] bytes) throws IOException
// Same as read(bytes, 0, bytes.length)
What exactly do these two lines of code do to illustrate what read() does in java IO? So does the first line read the length of the file's info OR the file's actual info itself.
To pile on more confusion, the Write() function to OutputStream was explained as follows:
"Similar to the input counterpart, the abstract superclass OutputStream declares an abstract method write() to write a data-byte to the output sink. write() takes an int. The least-significant byte of the int argument is written out; the upper 3 bytes are discarded. It throws an IOException if I/O error occurs (e.g., output stream has been closed)."
Does this mean the actual info is written in or the argument? kinda confused what the paragraph was trying to say.
public void abstract void write(int unsignedByte) throws IOException\
public void write(byte[] bytes, int offset, int length) throws IOException
// Write "length" number of bytes, from the bytes array starting from offset
of index.
public void write(byte[] bytes) throws IOException
// Same as write(bytes, 0, bytes.length)
Thanks in advance for any explanation on this.
This seemed to hint that the difference between a basic write() function and the write(byte[] bytes, int offset, int length) function is in its time efficiency, but i didn't quite get the meaning of that.
The "basic" write(b) method writes just a single byte of data.
If you wanted to use it to write large amounts of data, you would be making a lot of method calls.
That is why for efficiency's sake, there are also versions of write that can take many bytes in a single call. This works by passing in those bytes in an array (the "buffer").
In addition to reducing number of calls, efficient use (and re-use) of this buffer also saves a lot of work. But this makes the code a bit more complicated to use: Instead of just receiving results for method calls, you have to manage these byte arrays.
The same applies for the read methods.
public int read(byte[] bytes, int offset, int length) throws IOException
// Read "length" number of bytes, store in bytes array starting from offset of index.
So does the first line read the length of the file's info OR the file's actual info itself ?
The method reads up to length bytes from the file (it may read less, for example when the file is too short)
The data that it reads is not returned from the method, it is placed into the array bytes. This is too avoid having to copy it around too much (if it wanted to return the data, it would need to make a new byte[]).
What the method returns is the number of bytes that have been read (you need to know that, because it may be fewer than what you asked for, see above).
There is also a parameter offset that tells the method to not start writing data into the buffer (bytes) at the beginning, but somewhere else. This allows you to reuse the buffer for other things (again to avoid having to copy the data around too much).

InputDataStream.available() is always 0 Java Socket Client

Why does this part of my client code is always zero ?
InputStream inputStream = clientSocket.getInputStream();
int readCount = inputStream.available(); // >> IS ALWAYS ZERO
byte[] recvBytes = new byte[readCount];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int n = inputStream.read(recvBytes);
...
Presumably it's because no data has been received yet. available() tries to return the amount of data available right now without blocking, so if you call available() straight after making the connection, I'd expect to receive 0 most of the time. If you wait a while, you may well find available() returns a different value.
However, personally I don't typically use available() anyway. I create a buffer of some appropriate size for the situation, and just read into that:
byte[] data = new byte[16 * 1024];
int bytesRead = stream.read(data);
That will block until some data is available, but it may well return read than 16K of data. If you want to keep reading until you reach the end of the stream, you need to loop round.
Basically it depends on what you're trying to do, but available() is rarely useful in my experience.
From the java docs
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
A subclass' implementation of this method may choose to throw an IOException if this input stream has been closed by invoking the close() method.
The available method for class InputStream always returns 0.
This method should be overridden by subclasses.
Here is a note to understand why it returns 0
In InputStreams, read() calls are said to be "blocking" method calls. That means that if no data is available at the time of the method call, the method will wait for data to be made available.
The available() method tells you how many bytes can be read until the read() call will block the execution flow of your program. On most of the input streams, all call to read() are blocking, that's why available returns 0 by default.
However, on some streams (such as BufferedInputStream, that have an internal buffer), some bytes are read and kept in memory, so you can read them without blocking the program flow. In this case, the available() method tells you how many bytes are kept in the buffer.
According to the documentation, available() only returns the number of bytes that can be read from the stream without blocking. It doesn't mean that a read operation won't return anything.
You should check this value after a delay, to see it increasing.
There are very few correct uses of available(), and this isn't one of them. In this case no data had arrived so it returned zero, which is what it's supposed to do.
Just read until you have what you need. It will block until data is available.

Can calling available() for a BufferedInputStream lead me astray in this case?

I am reading in arbitrary size file in blocks of 1021 bytes, with a block size of <= 1021 bytes for the final block of the file. At the moment, I am doing this using a BufferedInputStream which is wrapped around a FileInputStream and code that looks (roughly) like the following (where reader is the BufferedInputStream and this is operating in a loop):
int availableData = reader.available();
int datalen = (availableData >= 1021)
? 1021
: availableData;
reader.read(bufferArray, 0, datalen);
However, from reading the API docs, I note that available() only gives an "estimate" of the available size, before the call would 'block'. Printing out the value of availableData each iteration seems to give the expected values - starting with the file size and slowly getting less until it is <= 1021. Given that this is a local file, am I wrong to expect this to be a correct value - is there a situation where available() would give an incorrect answer?
EDIT: Sorry, additional information. The BufferedInputStream is wrapped around a FileInputStream. From the source code for a FIS, I think I'm safe to rely on available() as a measure of how much data is left in the case of a local file. Am I right?
The question is pointless. Those four lines of code are entirely equivalent to this:
reader.read(buffer, 0, 1021);
without the timing-window problem you have introduced between the available() call and the read. Note that this code is still incorrect as you are ignoring the return value, which can be -1 at EOS, or else anything between 1 and 1021 inclusive.
It doesn't give the estimated size, it gives the remaining bytes that can be read. It's not an estimate with BufferedInputStream.
Returns the number of bytes that can
be read from this input stream without
blocking.
You should pass available() directly into the read() call if you want to avoid blocking, but remember to return if the return value is 0 or -1. available() might throw an exception on buffer types that don't support the operation.

What 0 returned by InputStream.read(byte[]) means? How to handle this?

What 0 (number of bytes read) returned by InputStream.read(byte[]) and InputStream.read(byte[], int, int) means? How to handle this situation?
To be clear, I mean read(byte[] b) or read(byte[] b, int off, int len) methods which return number of bytes read.
The only situation in which a InputStream may return 0 from a call to read(byte[]) is when the byte[] passed in has a length of 0:
byte[] buf = new byte[0];
int read = in.read(buf); // read will contain 0
As specified by this part of the JavaDoc:
If the length of b is zero, then no bytes are read and 0 is returned
My guess: you used available() to see how big the buffer should be and it returned 0. Note that this is a misuse of available(). The JavaDoc explicitly states that:
It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
Take a look at the implementation of javax.sound.AudioInputStream#read(byte[] b, int off, int len) ... yuck. They completely violated the standard java.io.InputStream semantics and return a read size of 0 if you request fewer than a whole frame of data.
So unfortunately; the common advice (and api spec) should preclude having to deal with return of zero when len > 0 but even for JDK provided classes you can't universally rely on this to be true for InputStreams of arbitrary types.
Again, yuck.
According to Java API Doc:
http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStream.html#read(byte[])
It only can happen if the byte[] you passed has zero items (new byte[0]).
In other situations it must return at least one byte. Or -1 if EOF reached. Or an exception.
Of course: it depends of the actual implementation of the InputStream you are using!!! (it could be a wrong one)
I observed the same behavior (reading 0 bytes) when I build a swing console output window and made a reader-thread for stdout and stderr via the following code:
this.pi = new PipedInputStream();
po = new PipedOutputStream((PipedInputStream)pi);
System.setOut(new PrintStream(po, true));
When the 'main' swing application exits, and my console window is still open I read 0 from this.pi.read().
The read data was put on the console window resulting in a race condition some how, just ignoring the result and not updating the console window solved the issue.

Categories

Resources