Exactly what read/block guarantees does DataInputStream provide following available() - java

I've read the java docs and a number of related questions but am unsure if the following is guaranteed to work:
I have a DataInputStream on a dedicated thread that continually reads small amounts of data, of known byte-size, from a very active connection. I'd like to alert the user when the stream becomes inactive (i.e. network goes down) so I've implemented the following:
...
streamState = waitOnStreamForState(stream, 4);
int i = stream.readInt();
...
private static int
waitOnStreamForState(DataInputStream stream, int nBytes) throws IOException {
    return waitOnStream(stream, nBytes, STREAM_ACTIVITY_THRESHOLD, STREAM_POLL_INTERVAL)
        ? STREAM_STATE_ACTIVE
        : STREAM_STATE_INACTIVE;
}
private static boolean
waitOnStream(DataInputStream stream, int nBytes, long timeout, long pollInterval) throws IOException {
    long timeWaitingForAvailable = 0;
    while( stream.available() < nBytes ){
        if( timeWaitingForAvailable >= timeout && timeout > 0 ){
            return false;
        }
        try{
            Thread.sleep(pollInterval);
        }catch( InterruptedException e ){
            // restore the interrupt flag and report whatever is available right now
            Thread.currentThread().interrupt();
            return (stream.available() >= nBytes);
        }
        timeWaitingForAvailable += pollInterval;
    }
    return true;
}
The docs for available() explain:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Does this mean it's possible the next read (inside readInt()) might only, for instance, read 2 bytes, and the subsequent read to finish retrieving the Integer could block? I realize readInt() is a method of the stream 'called next' but I presume it has to loop on a read call until it gets 4 bytes and the docs don't mention subsequent calls. In the above example is it possible that the readInt() call could still block even if waitOnStreamForState(stream, 4) returns STREAM_STATE_ACTIVE?
(and yes, I realize my timeout mechanism is not exact)

Does this mean it's possible the next read (inside readInt()) might only, for instance, read 2 bytes, and the subsequent read to finish retrieving the Integer could block?
That's what it says. However at least the next read() won't block.
I realize readInt() is a method of the stream 'called next' but I presume it has to loop on a read call until it gets 4 bytes and the docs don't mention subsequent calls. In the above example is it possible that the readInt() call could still block even if waitOnStreamForState(stream, 4) returns STREAM_STATE_ACTIVE?
That's what it says.
For example, consider SSL. You can tell that there is data available, but you can't tell how much without actually decrypting it, so a JSSE implementation is free to:
always return 0 from available() (this is what it used to do)
always return 1 if the underlying socket's input stream has available() > 0, otherwise zero
return the underlying socket input stream's available() value and rely on this wording to get it out of trouble if the actual plaintext data is less. (However the correct value might still be zero, if the cipher data consisted entirely of handshake messages or alerts.)
However you don't need any of this. All you need is a read timeout, set via Socket.setSoTimeout(), and a catch for SocketTimeoutException. There are few if any correct uses of available(): fewer and fewer as time goes on, it seems to me. You should certainly not waste time calling sleep().
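A minimal sketch of the read-timeout approach, assuming a plain TCP socket. The class name, the helper readIntOrTimeout and the 200 ms value are hypothetical and only stand in for the question's threshold; the loopback server exists purely so the sketch can run on its own. Note that a timeout firing partway through a value loses the bytes already consumed, so the exception is best treated as an inactivity signal rather than something to resume from blindly.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {
    // Hypothetical value; stands in for STREAM_ACTIVITY_THRESHOLD from the question.
    static final int INACTIVITY_TIMEOUT_MS = 200;

    /** Returns the next int, or null if nothing arrived within the socket's read timeout. */
    static Integer readIntOrTimeout(DataInputStream in) throws IOException {
        try {
            return in.readInt();
        } catch (SocketTimeoutException e) {
            return null; // stream inactive: this is where the user would be alerted
        }
    }

    /** Loopback demo: the peer sends one int and then goes quiet. */
    static boolean demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoTimeout(INACTIVITY_TIMEOUT_MS); // blocking reads now give up after 200 ms
            peer.getOutputStream().write(new byte[] {0, 0, 0, 42});
            peer.getOutputStream().flush();
            DataInputStream in = new DataInputStream(client.getInputStream());
            Integer first = readIntOrTimeout(in);  // data arrives, so this returns a value
            Integer second = readIntOrTimeout(in); // peer is silent, so this times out
            return first != null && first == 42 && second == null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() ? "inactivity detected via timeout" : "unexpected result");
    }
}
```

The socket remains usable after a SocketTimeoutException, so the reading thread can report the inactivity and then simply try the read again. No available() polling or sleep() is involved.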

Related

Reading n bytes atomically without blocking

I just asked a question about why my thread shut down wasn't working. It ended up being due to readLine() blocking my thread before the shutdown flag could be recognised. This was easy to fix by checking ready() before calling readLine().
However, I'm now using a DataInputStream to do the following in series:
int x = reader.readInt();
int y = reader.readInt();
byte[] z = new byte[y];
reader.readFully(z);
I know I could implement my own buffering which would check the running file flag while loading up the buffer. But I know this would be tedious. Instead, I could let the data be buffered within the InputStream class, and wait until I have my n bytes read, before executing a non-blocking read - as I know how much I need to read.
4 bytes for the first integer
4 bytes for the second integer y
and y bytes for the z byte array.
Instead of using ready() to check if there is a line in the buffer, is there some equivalent ready(int bytesNeeded)?
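The available() method returns an estimate of the number of bytes that can be read from the InputStream without blocking (for a buffered stream, this includes the bytes already in its internal buffer).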
So, one can do something like:
while (reader.available() < 4) checkIfShutdown();
reader.readInt();
You can use InputStream.available() to get an estimate of the amount of bytes that can be read. Quoting the Javadoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking, which may be 0, or 0 when end of stream is detected. The read might be on the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In other words, if available() returns n, a single read of up to n bytes will not block, although it may return fewer than n bytes. Note that, as the Javadoc states, the value returned is an estimate. For example, InflaterInputStream.available() always returns 1 until EOF is reached, and 0 afterwards. Check the documentation of the InputStream subclass you will be using to ensure it meets your needs.
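To make the "estimate" point concrete, here is a small sketch comparing two stream types over the same payload. The class and method names are hypothetical. ByteArrayInputStream reports its exact remaining byte count, while InflaterInputStream only signals "not at EOF yet".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class AvailableEstimateSketch {
    /** Returns {available() on the plain payload, available() on the inflating wrapper}. */
    static int[] compare(int payloadSize) {
        byte[] payload = new byte[payloadSize];
        try {
            // Compress the payload so there is something to inflate.
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(compressed)) {
                out.write(payload);
            }
            int plain = new ByteArrayInputStream(payload).available();
            try (InflaterInputStream in = new InflaterInputStream(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                return new int[] {plain, in.available()};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = compare(1000);
        System.out.println("plain=" + result[0] + ", inflater=" + result[1]);
    }
}
```

A loop such as while (reader.available() < 4) would never finish on the inflating stream, even though all 1000 bytes can be read from it.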
You are going to need to implement your own equivalent of BufferedInputStream, either as the sole owner of an InputStream together with a thread (possibly borrowed from a pool) to block in, or on top of NIO.
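One way to read the "sole owner plus a thread to block in" suggestion is sketched below. It assumes the question's message layout (int x, int y, then y bytes). The names Frame, startReader, nextFrame and encode are hypothetical. A dedicated thread does all the blocking reads and hands complete frames to a queue, and the consumer polls the queue in short slices so it can check its shutdown flag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class FrameReaderSketch {
    /** One complete message: x plus the y-byte array z. */
    static final class Frame {
        final int x;
        final byte[] z;
        Frame(int x, byte[] z) { this.x = x; this.z = z; }
    }

    /** Starts a daemon thread that owns the stream and performs every blocking read. */
    static BlockingQueue<Frame> startReader(InputStream source) {
        BlockingQueue<Frame> frames = new LinkedBlockingQueue<>();
        Thread reader = new Thread(() -> {
            try (DataInputStream in = new DataInputStream(source)) {
                while (true) {
                    int x = in.readInt();
                    int y = in.readInt();
                    byte[] z = new byte[y];
                    in.readFully(z);
                    frames.put(new Frame(x, z));
                }
            } catch (IOException | InterruptedException e) {
                // EOFException (an IOException) ends the loop when the stream runs out
            }
        });
        reader.setDaemon(true);
        reader.start();
        return frames;
    }

    /** Waits for a frame in 50 ms slices; returns null once shutdown is requested. */
    static Frame nextFrame(BlockingQueue<Frame> frames, BooleanSupplier shutdown) {
        try {
            while (!shutdown.getAsBoolean()) {
                Frame frame = frames.poll(50, TimeUnit.MILLISECONDS);
                if (frame != null) {
                    return frame;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null;
    }

    /** Demo helper: encodes one frame in the layout the reader expects. */
    static byte[] encode(int x, byte[] z) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(x);
            out.writeInt(z.length);
            out.write(z);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Frame> frames =
                startReader(new ByteArrayInputStream(encode(7, new byte[] {1, 2, 3})));
        Frame frame = nextFrame(frames, () -> false);
        System.out.println("x=" + frame.x + ", z.length=" + frame.z.length);
    }
}
```

With a real socket, shutting down would also involve closing the socket so the reader thread's blocked read fails and the thread exits.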

Behind the scenes of Java's BufferedInputStream

To start with, I understand the concept of buffering as a wrapper around, for instance, FileInputStream that acts as a temporary container for contents read (let's take the read scenario) from an underlying stream, in this case FileInputStream.
Say there are 100 bytes to read from a stream (with a file as the source).
Without buffering, the code (the read method of FileInputStream) has to make 100 reads (one byte at a time).
With buffering, depending on the buffer size, the code makes <= 100 reads.
Let's assume the buffer size is 50.
So, the code reads the buffer (as a source) only twice to read the contents of the file.
Now, as the FileInputStream is the ultimate source (though wrapped by BufferedInputStream) of the data (a file which contains 100 bytes), wouldn't it have to read 100 times to read 100 bytes? The code calls the read method of BufferedInputStream, but the call is passed to the read method of FileInputStream, which needs to make 100 read calls. This is the point I'm unable to comprehend.
IOW, though wrapped by a BufferedInputStream, the underlying streams (such as FileInputStream) still have to read one byte at a time. So, where is the benefit of buffering (not for the code, which requires only two read calls to the buffer, but for the application's performance)?
Thanks.
EDIT:
I'm adding this as a follow-up edit rather than a comment, as I think it fits better here in context and serves as a TL;DR for readers of the chat between @Kayaman and me.
The read method of BufferedInputStream says(excerpt):
As an additional convenience, it
attempts to read as many bytes as possible by repeatedly invoking the
read method of the underlying stream. This iterated read continues
until one of the following conditions becomes true:
The specified number of bytes have been read,
The read method of the underlying stream returns -1, indicating end-of-file, or
The available method of the underlying stream returns zero, indicating that further input requests would block.
I dug into the code and found the following method call trace:
BufferedInputStream -> read(byte b[]) (as I want to see buffering in action)
BufferedInputStream -> read(byte b[], int off, int len)
BufferedInputStream -> read1(byte[] b, int off, int len) - private
FileInputStream -> read(byte b[], int off, int len)
FileInputStream -> readBytes(byte b[], int off, int len) - private and native. Method description from source code -
Reads a subarray as a sequence of bytes.
The call to read1 (#4, mentioned above) in BufferedInputStream is in an infinite for loop. It returns on the conditions mentioned in the above excerpt of the read method description.
As I had mentioned in the OP (#6), the call does seem to be handled by the underlying stream, which matches the API method description and the method call trace.
The question still remains: does the native API call, readBytes of FileInputStream, read one byte at a time and build an array of those bytes to return?
The underlying streams(such as FileInputStream) still have to read
one byte at a time
Luckily no, that would be hugely inefficient. It allows the BufferedInputStream to make read(byte[8192] buffer) calls to the FileInputStream which will return a chunk of data.
If you then want to read a single byte (or not), it will efficiently be returned from BufferedInputStream's internal buffer instead of having to go down to the file level. So the BufferedInputStream is there to reduce the number of actual reads from the filesystem, and when those are done, they're done in an efficient fashion even if the end user wanted to read just a few bytes.
It's quite clear from the code that BufferedInputStream.read() does not delegate directly to UnderlyingStream.read(), as that would bypass all the buffering.
public synchronized int read() throws IOException {
    if (pos >= count) {
        fill();
        if (pos >= count)
            return -1;
    }
    return getBufIfOpen()[pos++] & 0xff;
}
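The effect can be observed with a small counting wrapper. This is only a sketch: CountingInputStream is a hypothetical helper, and a ByteArrayInputStream stands in for the FileInputStream so the example runs without a file. It uses the question's numbers, 100 bytes and a buffer size of 50.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferingSketch {
    /** Hypothetical helper: counts how often the wrapped stream is actually asked for data. */
    static final class CountingInputStream extends FilterInputStream {
        int singleByteReads = 0;
        int bulkReads = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            singleByteReads++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkReads++;
            return super.read(b, off, len);
        }
    }

    /** Reads totalBytes one at a time through a buffer; returns {bulkReads, singleByteReads}. */
    static int[] countUnderlyingReads(int totalBytes, int bufferSize) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = new BufferedInputStream(source, bufferSize)) {
            for (int i = 0; i < totalBytes; i++) {
                in.read(); // served from the buffer unless the buffer needs refilling
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {source.bulkReads, source.singleByteReads};
    }

    public static void main(String[] args) {
        int[] counts = countUnderlyingReads(100, 50);
        System.out.println("bulk reads: " + counts[0] + ", single-byte reads: " + counts[1]);
    }
}
```

The 100 calls to read() on the BufferedInputStream turn into two bulk reads of 50 bytes on the underlying stream, and the underlying single-byte read() is never called.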

Detect DataInputStream end of stream

I'm working on the server for a game right now. The server's packet reading loop is blocking, and typically waits until a packet is received to continue through the loop. However, if the client disconnects, the DataInputStream returns a single byte (-1) and the loop is executed in rapid succession, as is expected. However, I don't use the DataInputStream's read() method to read one byte at a time, I use the read(byte[]) method to read them all at once into a byte array. As such, I can't easily detect if the stream is returning a single byte valued at -1.
Possible Solution: I could check if the first byte of the array is -1, and if so loop through the array to see if the rest of the array is nothing but zeroes. Doing this seems extremely inefficient however, and I feel that it would affect performance as client count increases.
Here's a simplified version of my packet-reading loop:
while (!thread.isInterrupted() && !isDisconnected())
{
    try
    {
        byte[] data = new byte[26];
        input.read(data);
        //Need to check if end of stream here somehow
        Packet rawPacket = Packet.extractPacketFromData(data); //Constructs packet from the received data
        if(rawPacket instanceof SomePacket)
        {
            //Do stuff with packet
        }
    }
    catch(IOException e)
    {
        disconnectClient(); //Toggles flag showing client has disconnected
    }
}
Your understanding of read(byte[]) is incorrect. It doesn't set a value in your array to -1.
The Javadoc says:
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
You need to check the return value:
int bytesRead = input.read(data);
if (bytesRead == -1)
{
// it's end of stream
}
As a side note, even when just reading data normally, you do need to check the number of bytes read was the number of bytes you requested. The call to read is not guaranteed to actually fill your array.
You should take a look at readFully() which does read fully, and throws an EOFException for end of stream:
Reads some bytes from an input stream and stores them into the buffer array b. The number of bytes read is equal to the length of b.
This method blocks until one of the following conditions occurs:
b.length bytes of input data are available, in which case a normal return is made.
End of file is detected, in which case an EOFException is thrown.
An I/O error occurs, in which case an IOException other than EOFException is thrown.
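A minimal sketch of the packet loop rewritten around readFully(), assuming the 26-byte packet size from the question. The class and method names are hypothetical, and the Packet parsing is left out because its API is not shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class PacketReadSketch {
    static final int PACKET_SIZE = 26; // packet size taken from the question's loop

    /** Reads fixed-size packets until end of stream; returns how many complete packets arrived. */
    static int readPackets(InputStream raw) {
        int packets = 0;
        try (DataInputStream input = new DataInputStream(raw)) {
            while (true) {
                byte[] data = new byte[PACKET_SIZE];
                input.readFully(data); // returns only once all 26 bytes are in, otherwise throws
                packets++;             // a real server would parse and dispatch the packet here
            }
        } catch (EOFException e) {
            // the peer closed the connection, possibly mid-packet: treat as a disconnect
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return packets;
    }

    public static void main(String[] args) {
        // 60 bytes = two complete packets plus 8 stray bytes that never form a third one
        System.out.println(readPackets(new ByteArrayInputStream(new byte[60])));
    }
}
```

With readFully() there is no partially filled array to inspect: either the whole packet is present, or an exception signals the end of the stream.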

InputDataStream.available() is always 0 Java Socket Client

Why is readCount always zero in this part of my client code?
InputStream inputStream = clientSocket.getInputStream();
int readCount = inputStream.available(); // >> IS ALWAYS ZERO
byte[] recvBytes = new byte[readCount];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int n = inputStream.read(recvBytes);
...
Presumably it's because no data has been received yet. available() tries to return the amount of data available right now without blocking, so if you call available() straight after making the connection, I'd expect to receive 0 most of the time. If you wait a while, you may well find available() returns a different value.
However, personally I don't typically use available() anyway. I create a buffer of some appropriate size for the situation, and just read into that:
byte[] data = new byte[16 * 1024];
int bytesRead = stream.read(data);
That will block until some data is available, but it may well read less than 16K of data. If you want to keep reading until you reach the end of the stream, you need to loop round.
Basically it depends on what you're trying to do, but available() is rarely useful in my experience.
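"Looping round" can be sketched as follows, using the same 16K buffer and collecting into a ByteArrayOutputStream as the question's code starts to do. The class and method names are hypothetical, and the IOException is wrapped only to keep the sketch compact.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadAllSketch {
    /** Reads until end of stream, however many bytes each individual read() returns. */
    static byte[] readAll(InputStream stream) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] data = new byte[16 * 1024];
        try {
            int bytesRead;
            while ((bytesRead = stream.read(data)) != -1) {
                baos.write(data, 0, bytesRead); // copy only the bytes this read delivered
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] all = readAll(new ByteArrayInputStream(new byte[40000]));
        System.out.println(all.length);
    }
}
```

On a socket this loop ends only when the peer closes its side of the connection, so it suits protocols where end of stream marks the end of the message.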
From the java docs
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
A subclass' implementation of this method may choose to throw an IOException if this input stream has been closed by invoking the close() method.
The available method for class InputStream always returns 0.
This method should be overridden by subclasses.
Here is a note to understand why it returns 0
In InputStreams, read() calls are said to be "blocking" method calls. That means that if no data is available at the time of the method call, the method will wait for data to be made available.
The available() method tells you how many bytes can be read before a read() call would block the execution flow of your program. On most input streams, all calls to read() are blocking, which is why available() returns 0 by default.
However, on some streams (such as BufferedInputStream, that have an internal buffer), some bytes are read and kept in memory, so you can read them without blocking the program flow. In this case, the available() method tells you how many bytes are kept in the buffer.
According to the documentation, available() only returns the number of bytes that can be read from the stream without blocking. It doesn't mean that a read operation won't return anything.
You should check this value after a delay, to see it increasing.
There are very few correct uses of available(), and this isn't one of them. In this case no data had arrived so it returned zero, which is what it's supposed to do.
Just read until you have what you need. It will block until data is available.

java how does readObject in objectinputstream knows how many bytes to read?

In socket I/O, how does ObjectInputStream.readObject() know how many bytes to read? Is the content length encapsulated in the bytes themselves, or does it simply read all the available bytes in the buffer?
I am asking this because I was referring to the Python socket how-to and it says
Now if you think about that a bit, you’ll come to realize a
fundamental truth of sockets: messages must either be fixed length
(yuck), or be delimited (shrug), or indicate how long they are (much
better), or end by shutting down the connection. The choice is
entirely yours, (but some ways are righter than others).
However, in another SO answer, @DavidCrawshaw mentioned that:
So readObject() does not know how much data it will read, so it does
not know how many objects are available.
I am interested to know how it works...
You're over-interpreting the answer you cited. readObject() doesn't know how many bytes it will read, ahead of time, but once it starts reading it is just parsing an input stream according to a protocol, that consists of tags, primitive values, and objects, which in turn consist of tags, primitive values, and other objects. It doesn't have to know ahead of time. Consider the similar-ish case of XML. You don't know how long the document will be ahead of time, or each element, but you know when you've read it all, because the protocol tells you.
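A short sketch of what that means in practice, with hypothetical class and method names: two objects are written back to back with no explicit length prefix supplied by the caller, and readObject() still stops at the right place each time because the serialization protocol marks where each object ends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SelfDelimitingSketch {
    /** Writes two objects back to back, then reads them without any length information. */
    static Object[] roundTrip(Serializable first, Serializable second) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(first);
                out.writeObject(second);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                // Each call consumes exactly one object's worth of the stream.
                return new Object[] {in.readObject(), in.readObject()};
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] objects = roundTrip("hello", Integer.valueOf(42));
        System.out.println(objects[0] + " " + objects[1]);
    }
}
```

The same holds over a socket: readObject() blocks until the bytes for one complete object have arrived and leaves the rest of the stream for the next call.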
The readObject() method uses BlockDataInputStream to read the bytes. If you check readObject() in ObjectInputStream, it calls readObject0(false).
private Object readObject0(boolean unshared) throws IOException {
boolean oldMode = bin.getBlockDataMode();
if (oldMode) {
int remain = bin.currentBlockRemaining();
if (remain > 0) {
throw new OptionalDataException(remain);
} else if (defaultDataEnd) {
/*
* Fix for 4360508: stream is currently at the end of a field
* value block written via default serialization; since there
* is no terminating TC_ENDBLOCKDATA tag, simulate
* end-of-custom-data behavior explicitly.
*/
throw new OptionalDataException(true);
}
bin.setBlockDataMode(false);
}
byte tc;
while ((tc = bin.peekByte()) == TC_RESET) {
bin.readByte();
handleReset();
}
This reads from the stream using bin.readByte(). bin is a BlockDataInputStream, which in turn uses PeekInputStream to read. That class ultimately uses InputStream.read().
From the description of the read method:
/**
* Reads the next byte of data from the input stream. The value byte is
* returned as an <code>int</code> in the range <code>0</code> to
* <code>255</code>. If no byte is available because the end of the stream
* has been reached, the value <code>-1</code> is returned. This method
* blocks until input data is available, the end of the stream is detected,
* or an exception is thrown.
So basically it reads byte after byte as the stream protocol requires, and read() returns -1 if the stream ends. As EJP mentioned, it never knows ahead of time how many bytes there are to be read. Hope this helps you in understanding it.
