readFully(byte[] b, int off, int len) and EOFException - java

I'm encountering this exception a lot in a piece of code I have, and it's happening when I call readFully. I don't understand how it can happen though, because readFully is supposed to block until len bytes are available. If it knows that that many bytes are available, how can it then later meet an EOF?
And how can I get around this issue? (I'm reading the first 3 bytes to get the length (TL part of TLV) and then encountering the issue sporadically when reading the V).
Thanks

The documentation says:
throws: EOFException - if this input stream reaches the end before reading all the bytes.
So this behavior is expected if the length you send is not correct (i.e. is larger than the actual length), or if the sender closes the stream before having written all the bytes.
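A minimal sketch of how the failure can be made easier to diagnose. The layout is an assumption, since the question only says the first 3 bytes carry the TL part: here it is taken to be a 1-byte tag followed by a 2-byte big-endian length. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class TlvReader {
    /** Reads one record: 1 tag byte, 2 length bytes (assumed layout), then the value. */
    static byte[] readValue(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int tag = in.readUnsignedByte();
        int len = in.readUnsignedShort();
        byte[] value = new byte[len];
        try {
            // Blocks until len bytes have arrived, or throws if the stream ends first.
            in.readFully(value, 0, len);
        } catch (EOFException e) {
            throw new EOFException("tag " + tag + " announced " + len
                    + " bytes, but the stream ended before they all arrived");
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        byte[] complete = {0x01, 0x00, 0x02, 'h', 'i'};
        System.out.println("value length: " + readValue(new ByteArrayInputStream(complete)).length);

        byte[] truncated = {0x01, 0x00, 0x05, 'h', 'i'}; // announces 5 bytes, carries 2
        try {
            readValue(new ByteArrayInputStream(truncated));
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage());
        }
    }
}
```

Logging the announced length alongside the exception helps tell a wrong length field apart from a sender that closed the connection early.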

Related

Behind the scenes of Java's BufferedInputStream

To start with, I understand the concept of buffering as a wrapper around, for instance, a FileInputStream that acts as a temporary container for contents read (let's take the read scenario) from an underlying stream, in this case the FileInputStream.
Say there are 100 bytes to read from a stream (a file as the source).
Without buffering, the code (the read method of FileInputStream) has to make 100 reads (one byte at a time).
With buffering, depending on the buffer size, the code makes <= 100 reads.
Let's assume the buffer size is 50.
So the code reads the buffer (as a source) only twice to read the contents of the file.
Now, as the FileInputStream is the ultimate source of the data (the file containing 100 bytes), even though it is wrapped by a BufferedInputStream, wouldn't it have to read 100 times to read 100 bytes? The code calls the read method of BufferedInputStream, but the call is passed on to the read method of FileInputStream, which needs to make 100 read calls. This is the point I'm unable to comprehend.
IOW, though wrapped by a BufferedInputStream, the underlying streams (such as FileInputStream) still have to read one byte at a time. So where is the benefit of buffering? I don't mean for the code, which needs only two read calls to the buffer, but for the application's performance.
Thanks.
EDIT:
I'm posting this as a follow-up edit rather than a comment, as I think it fits better here in context and serves as a TL;DR for readers of the chat between @Kayaman and me.
The read method of BufferedInputStream says(excerpt):
As an additional convenience, it
attempts to read as many bytes as possible by repeatedly invoking the
read method of the underlying stream. This iterated read continues
until one of the following conditions becomes true:
The specified number of bytes have been read,
The read method of the underlying stream returns -1, indicating end-of-file, or
The available method of the underlying stream returns zero, indicating that further input requests would block.
I dug into the code and found the following method call trace:
BufferedInputStream -> read(byte b[]), as I want to see buffering in action.
BufferedInputStream -> read(byte b[], int off, int len)
BufferedInputStream -> read1(byte[] b, int off, int len) - private
FileInputStream -> read(byte b[], int off, int len)
FileInputStream -> readBytes(byte b[], int off, int len) - private and native. Method description from the source code: "Reads a subarray as a sequence of bytes."
The call to read1 (listed above) in BufferedInputStream sits in an infinite for loop. It returns under the conditions listed in the excerpt of the read method description above.
As I mentioned in the OP (#6), the call does seem to be handled by the underlying stream, which matches the API method description and the method call trace.
The question still remains: does the native API call, readBytes of FileInputStream, read one byte at a time and build an array of those bytes to return?
"The underlying streams (such as FileInputStream) still have to read one byte at a time"
Luckily no, that would be hugely inefficient. It allows the BufferedInputStream to make read(byte[8192] buffer) calls to the FileInputStream which will return a chunk of data.
If you then want to read a single byte (or not), it will efficiently be returned from BufferedInputStream's internal buffer instead of having to go down to the file level. So the BufferedInputStream is there to reduce the number of actual reads from the filesystem, and when those reads happen, they are done efficiently even if the end user wanted to read just a few bytes.
It's quite clear from the code that BufferedInputStream.read() does not delegate directly to UnderlyingStream.read(), as that would bypass all the buffering.
public synchronized int read() throws IOException {
    if (pos >= count) {
        fill();
        if (pos >= count)
            return -1;
    }
    return getBufIfOpen()[pos++] & 0xff;
}
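To observe this rather than take it on trust, here is a small sketch that counts how often the underlying stream is actually asked for data. CountingInputStream is a hypothetical helper written for this illustration, and a ByteArrayInputStream stands in for the file, so it shows the call pattern only, not real disk behaviour.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {
    /** Hypothetical wrapper that counts how often each read variant is invoked. */
    static class CountingInputStream extends FilterInputStream {
        int singleByteCalls = 0;
        int bulkCalls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            singleByteCalls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            bulkCalls++;
            return super.read(b, off, len);
        }
    }

    /** Reads 100 bytes one at a time through a 50-byte buffer and returns the counters. */
    static CountingInputStream readHundredBytes() throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[100]));
        BufferedInputStream buffered = new BufferedInputStream(source, 50);
        for (int i = 0; i < 100; i++) {
            buffered.read(); // served from the internal buffer whenever possible
        }
        return source;
    }

    public static void main(String[] args) throws IOException {
        CountingInputStream source = readHundredBytes();
        System.out.println("single-byte calls on the source: " + source.singleByteCalls);
        System.out.println("bulk calls on the source: " + source.bulkCalls);
    }
}
```

The 100 single-byte reads issued by the caller never reach the source as single-byte reads; the source only sees a handful of bulk requests made by fill().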

Reading first four bytes from ByteBuffer, then writing them back?

I have a ByteBuffer object called msg with the intended message length in the first four bytes, which I read as follows:
int msgLen = msg.getInt();
LOG.debug("Message size: " + msgLen);
If the msgLen is less than some threshold value, I have a partial message and need to cache. In this case, I'd like to put those first four bytes back into the beginning of the message; that is, put the message back together to be identical to pre-reading. For example:
if (msgLen < threshold) {
    msg.rewind();
    msg.put(msgLen);
}
Unfortunately, this does not seem to be the correct way to do this. I've tried many combinations of flip, put, and rewind, but I must be misunderstanding something.
How would I put the bytes back into the write buffer in their original order?
The answer was posted by Andremoniy in the comments section. Read operations do not consume bytes in the buffer, so msg.rewind() was adequate. This didn't work in my case because of some other logic in the program, and I incorrectly attributed that to a problem at the buffer level.
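A short sketch of the point about reads not consuming data: a relative getInt() only advances the position, and rewind() sets it back to zero. As an alternative, an absolute getInt(index) leaves the position untouched. The class name, the helper peekLength, and the sample values are made up for illustration.

```java
import java.nio.ByteBuffer;

public class PeekLength {
    /** Reads the length prefix at the current position without moving the position. */
    static int peekLength(ByteBuffer msg) {
        return msg.getInt(msg.position()); // absolute get: position is unchanged
    }

    public static void main(String[] args) {
        ByteBuffer msg = ByteBuffer.allocate(8);
        msg.putInt(1234).putInt(99);
        msg.flip(); // switch from writing to reading

        int msgLen = msg.getInt(); // relative get: position moves from 0 to 4
        System.out.println("length " + msgLen + ", position " + msg.position());

        msg.rewind(); // position back to 0; the bytes themselves were never removed
        System.out.println("after rewind, position " + msg.position());

        System.out.println("peeked " + peekLength(msg) + ", position " + msg.position());
    }
}
```

Note that rewind() also discards any mark; if the message does not start at position 0, mark() before the read and reset() afterwards is the safer pairing.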

Debugging if UTF-8 decoding is done correctly?

We have Java code talking to an external system over TCP connections, with XML messages encoded in UTF-8.
The messages received begin with '?'. So the XML received is
?<begin>message</begin>
There is real doubt about whether the first character is indeed '?'. At the moment, we cannot ask the external system whether it sends one, or what it sends instead.
The code snippet for reading the stream is as below.
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, Charset.forName("UTF-8")));
int readByte = reader.read();
if (readByte <= 0) {
inputStream.close();
}
builder.append((char) readByte);
We are currently trying to log the raw bytes using int readByte = inputStream.read(). It will take a few days before we receive the logs.
In the meantime, I was wondering how we could ascertain at our end whether it was truly a '?' and not a decoding issue.
I strongly suspect you have a byte-order mark at the beginning of your document. That won't render as a valid character, and consequently could appear as a question mark. Can you dump the raw bytes and check for that sequence?
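For reference, the UTF-8 byte-order mark is the three bytes EF BB BF, which decode to the single character U+FEFF. The sketch below checks for it and skips it before the stream reaches the Reader. The class and method names are hypothetical, and readNBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomCheck {
    /** True if the given bytes start with the UTF-8 byte-order mark EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    /** Returns a stream positioned after a leading UTF-8 BOM, if there is one. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3);
        boolean bom = n == 3 && startsWithUtf8Bom(head);
        if (!bom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println("first byte after BOM handling: " + (char) in.read());
    }
}
```

If the logged bytes turn out to be 3F rather than EF BB BF, the sender really is transmitting a question mark and the BOM theory can be dropped.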
Your question seems to boil down to this:
Can we ascertain the real value of the first byte of the message without actually looking at it.
The answer is "No, you can't". (Obviously!)
...
However, if you could intercept the TCP/IP traffic from the external system with a packet sniffer (aka traffic monitoring tool), then dumping the first byte or bytes of the message would be simple ... requiring no code changes.
"Is logging the int returned by inputStream.read() the correct way to analyse the bytes received? Or do the word length of the OS or other environment variables come into the picture?"
The InputStream.read() method returns either a single (unsigned) byte of data (in the range 0 to 255 inclusive) or -1 to indicate "end of stream". It is not sensitive to the "word length" or anything else.
In short, provided you treat the results appropriately, calling read() should give you the data you need to see what the bytes in the stream really are.
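A minimal sketch of logging the raw bytes in a form that is easy to compare against an encoding table. The class and method names are hypothetical. Note that this consumes the bytes it reads, so in real code it should run on a copy of the data or behind a mark/reset-capable wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HexPeek {
    /** Reads up to max bytes and formats them as two-digit hex values separated by spaces. */
    static String hexOfFirstBytes(InputStream in, int max) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < max; i++) {
            int b = in.read(); // 0..255, or -1 at end of stream
            if (b == -1) {
                break;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "?<begin>".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(hexOfFirstBytes(new ByteArrayInputStream(sample), 4));
    }
}
```

A genuine ASCII question mark shows up as 3F; anything else in the first position points to a decoding or rendering issue further down the line.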

Read all bytes from socket Stops at 52964 bytes

I'm writing a server that receives packets of 64 KB each.
int length = 65536;
byte[] bytes = new byte[length];
int pos = 0;
while (pos < length - 1)
{
    System.out.println("Before read");
    pos += dis.read(bytes, pos, length - pos);
    System.out.println("" + pos + " >> " + length);
}
This is the code I use to read all the bytes from the socket. dis is an InputStream.
When I run the code, one run out of n goes wrong: the code only receives 52964 bytes instead of 65536 bytes.
I also checked the C code, and it says it sends 65536 bytes.
Does someone know what I'm doing wrong?
This is yet another case where IOUtils from Apache Commons IO (formerly Jakarta Commons) is a better choice than writing it yourself. It's one line of code, and it's fully tested. I recommend IOUtils.readFully() in this case.
If it does not read the entire buffer, then you know that you're not sending all the content. Perhaps you're missing a flush on the server side.
InputStream.read() returns the number of bytes read or -1 if the end of the stream has been reached. You need to check for that error condition. Also, I suspect your while(..) loop is the problem. Why are you calling it pos as in position? You may be terminating prematurely. Also, ensure that your C code, whatever it is doing, is sending properly. You can examine the network traffic with a tool like Wireshark to be sure.
What do you mean it "goes wrong"? What is the output? It can't be exiting the loop before reading the full 64 KB, so what really happens?
Also, it's better to save the return value of the I/O call separately and inspect it before assuming the I/O was successful. If that's DataInputStream.read(), it returns -1 at end of stream.
Your code is incorrect as it doesn't check for -1.
This is a case for using DataInputStream.readFully() rather than coding it yourself and getting it wrong.
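Putting the advice from these answers together, here is a sketch of a read loop that stores the return value and checks for -1 before advancing. The class and method names are hypothetical. With a DataInputStream, the whole method body can be replaced by a single readFully(buf) call, which behaves the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadExactly {
    /** Fills buf completely, or throws EOFException if the stream ends first. */
    static void readExactly(InputStream in, byte[] buf) throws IOException {
        int pos = 0;
        while (pos < buf.length) {
            int n = in.read(buf, pos, buf.length - pos);
            if (n == -1) {
                // Without this check, pos would be decremented by 1 and the loop would keep going.
                throw new EOFException(
                        "stream ended after " + pos + " of " + buf.length + " bytes");
            }
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] packet = new byte[65536];
        readExactly(new ByteArrayInputStream(new byte[65536]), packet);
        System.out.println("read " + packet.length + " bytes");

        try {
            readExactly(new ByteArrayInputStream(new byte[52964]), packet);
        } catch (EOFException e) {
            System.out.println("short stream: " + e.getMessage());
        }
    }
}
```

A socket read commonly returns fewer bytes than requested, so the loop is required; the -1 check is what turns a silently wrong position into a clear error.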

Can calling available() for a BufferedInputStream lead me astray in this case?

I am reading in a file of arbitrary size in blocks of 1021 bytes, with the final block of the file being <= 1021 bytes. At the moment, I am doing this using a BufferedInputStream wrapped around a FileInputStream, with code that looks (roughly) like the following (where reader is the BufferedInputStream and this runs in a loop):
int availableData = reader.available();
int datalen = (availableData >= 1021)
        ? 1021
        : availableData;
reader.read(bufferArray, 0, datalen);
However, from reading the API docs, I note that available() only gives an "estimate" of the available size, before the call would 'block'. Printing out the value of availableData each iteration seems to give the expected values - starting with the file size and slowly getting less until it is <= 1021. Given that this is a local file, am I wrong to expect this to be a correct value - is there a situation where available() would give an incorrect answer?
EDIT: Sorry, additional information. The BufferedInputStream is wrapped around a FileInputStream. From the source code for a FIS, I think I'm safe to rely on available() as a measure of how much data is left in the case of a local file. Am I right?
The question is pointless. Those four lines of code are entirely equivalent to this:
reader.read(buffer, 0, 1021);
without the timing-window problem you have introduced between the available() call and the read. Note that this code is still incorrect as you are ignoring the return value, which can be -1 at EOS, or else anything between 1 and 1021 inclusive.
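A sketch of the block loop driven by the return value instead of available(). It uses InputStream.readNBytes (Java 9 or later), which keeps reading until the requested count is reached or the stream ends, and returns 0 once nothing is left. The class name and the blockSizes helper are hypothetical; a real program would process each block where this one only records its size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class BlockReader {
    static final int BLOCK_SIZE = 1021;

    /** Reads the stream in blocks of up to BLOCK_SIZE bytes and returns each block's size. */
    static List<Integer> blockSizes(InputStream in) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        List<Integer> sizes = new ArrayList<>();
        int n;
        // readNBytes returns fewer than BLOCK_SIZE bytes only at end of stream.
        while ((n = in.readNBytes(block, 0, BLOCK_SIZE)) > 0) {
            sizes.add(n); // the first n bytes of block hold this block's data
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(blockSizes(new ByteArrayInputStream(new byte[2500])));
    }
}
```

For a 2500-byte input this yields two full blocks and a final block of 458 bytes, with no call to available() at all.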
It doesn't give the estimated size, it gives the remaining bytes that can be read. It's not an estimate with BufferedInputStream.
"Returns the number of bytes that can be read from this input stream without blocking."
You should pass available() directly into the read() call if you want to avoid blocking, but remember to return if the return value is 0 or -1. available() might throw an exception on buffer types that don't support the operation.
