I am using DataInputStream to read some bytes from a socket. I have an expected number of bytes to read from the stream (after decoding a header, I know how many bytes are in the message). It works 99% of the time, but occasionally the number of bytes read is less than len:
int numRead = dis.read(buffer, 0, len);
What could cause numRead to be less than len? It's not -1. I would expect read to block until the stream is closed or EOF is reached, but if a socket underlies the streams, that shouldn't happen unless the socket closes, right?
Is there a way of reading bytes from a socket that will always ensure that you read len bytes?
Thanks
EDIT: For a general stream, you just keep reading until you've read everything you want to, basically. For an implementation of DataInput (such as DataInputStream) you should use readFully, as suggested by Peter Lawrey. Consider the rest of this answer to be relevant for the general case where you just have an InputStream.
It's entirely reasonable for an InputStream of any type to give you less data than you asked for, even if more is on its way. You should always code for this possibility - with the possible exception of ByteArrayInputStream. (That can still return less data than was requested, of course, if there's less data left than you asked for.)
Here's the sort of loop I'm talking about:
byte[] data = new byte[messageSize];
int totalRead = 0;
while (totalRead < messageSize) {
int bytesRead = stream.read(data, totalRead, messageSize - totalRead);
if (bytesRead < 0) {
// Change behaviour if this isn't an error condition
throw new IOException("Data stream ended prematurely");
}
totalRead += bytesRead;
}
You can use DataInputStream this way:
byte[] bytes = new byte[len];
dis.readFully(bytes);
This will either return with all the data read or throw an IOException.
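For the length-prefixed framing described in the question, a minimal sketch might look like this (assuming the header encodes the body length as a 4-byte int and that your stream wraps the socket's input stream; adjust to whatever your protocol actually uses):
DataInputStream dis = new DataInputStream(socket.getInputStream());
int len = dis.readInt();   // header: body length in bytes (assumed format)
byte[] body = new byte[len];
dis.readFully(body);       // blocks until len bytes are read, or throws EOFException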
read() returns each time with whatever bytes were available at that time, and -1 when the stream ends. You are typically supposed to do something like:
ByteArrayOutputStream total = new ByteArrayOutputStream();
while (true) {
    int numRead = dis.read(buffer, 0, len);
    if (numRead == -1) break;
    total.write(buffer, 0, numRead);
}
I would expect the behavior of read to block until the stream is closed or EOF is reached.
Then you need to check the Javadocs. The contract of read() is that it will read at least one byte, blocking if necessary until it has done so, or until EOS or an exception occurs. There is nothing in the specification that says it will read the entire length you requested. That's why it returns a length.
Is there a way of reading bytes from a socket that will always ensure that you read len bytes?
You can use Apache Commons IO's IOUtils; it has a method that does exactly what you need:
byte[] buf = new byte[BUFFER_SIZE];
int length;
do {
length = IOUtils.read(inputStream, buf);
if (length > 0) {
//do something with buf
}
} while (length == BUFFER_SIZE);
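If you want an exact-length read that fails loudly rather than a loop, Commons IO also provides IOUtils.readFully, which throws an EOFException if the stream ends before the buffer is filled:
byte[] buf = new byte[len];
IOUtils.readFully(inputStream, buf); // throws EOFException on a short read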
I am trying to transfer a file larger than 4 GB using the Java socket API. I am already reading it via InputStreams and writing it via OutputStreams. However, analyzing the transmitted packets in Wireshark, I realise that the sequence number of the TCP packets is incremented by the byte length of each packet, which seems to be 1440 bytes.
This leads to the behavior that when I try to send a file greater than 4 GB, the 32-bit TCP sequence number field wraps around, producing lots of error packets in the capture, but no error in Java.
My code for transmission currently looks like this:
DataOutputStream fileTransmissionStream = new DataOutputStream(transmissionSocket.getOutputStream());
FileInputStream fis = new FileInputStream(toBeSent);
byte[] sendBytes;
int totalFileSize = fis.available();
fileTransmissionStream.writeInt(totalFileSize);
while (totalFileSize >0){
if(totalFileSize >= FileTransmissionManagementService.splittedTransmissionSize){
sendBytes = new byte[FileTransmissionManagementService.splittedTransmissionSize];
fis.read(sendBytes);
totalFileSize -= FileTransmissionManagementService.splittedTransmissionSize;
} else {
sendBytes = new byte[totalFileSize];
fis.read(sendBytes);
totalFileSize = 0;
}
byte[] encryptedBytes = DataEncryptor.encrypt(sendBytes);
/*byte[] bytesx = ByteBuffer.allocate(4).putInt(encryptedBytes.length).array();
fileTransmissionStream.write(bytesx,0,4);*/
fileTransmissionStream.writeInt(encryptedBytes.length);
fileTransmissionStream.write(encryptedBytes, 0, encryptedBytes.length);
}
What exactly have I done wrong here, or is it just not possible to transmit files larger than 4 GB through one socket?
TCP can handle arbitrarily long data streams. There is no problem with the sequence number wrapping around: as it starts at a random value, that can happen almost immediately, regardless of the length of the stream. The problems are in your code:
DataOutputStream fileTransmissionStream = new DataOutputStream(transmissionSocket.getOutputStream());
FileInputStream fis = new FileInputStream(toBeSent);
int totalFileSize = fis.available();
Classic misuse of available(). Have a look at the Javadoc and see what it's really for. This is also where your basic problem lies: values > 2G don't fit into an int, so there is truncation. You should be using File.length(), and storing it in a long.
fileTransmissionStream.writeInt(totalFileSize);
while (totalFileSize >0){
if(totalFileSize >= FileTransmissionManagementService.splittedTransmissionSize){
sendBytes = new byte[FileTransmissionManagementService.splittedTransmissionSize];
fis.read(sendBytes);
Here you are ignoring the result of read(). It isn't guaranteed to fill the buffer: that's why it returns a value. See, again, the Javadoc.
totalFileSize -= FileTransmissionManagementService.splittedTransmissionSize;
} else {
sendBytes = new byte[totalFileSize];
Here you are assuming the file size fits into an int, and assuming the bytes fit into memory.
fis.read(sendBytes);
See above re read().
totalFileSize = 0;
}
byte[] encryptedBytes = DataEncryptor.encrypt(sendBytes);
/*byte[] bytesx = ByteBuffer.allocate(4).putInt(encryptedBytes.length).array();
fileTransmissionStream.write(bytesx,0,4);*/
We're not interested in your commented-out code.
fileTransmissionStream.writeInt(encryptedBytes.length);
fileTransmissionStream.write(encryptedBytes, 0, encryptedBytes.length);
You don't need all this crud. Use a CipherOutputStream to take care of the encryption, or better still SSL, and use the following copy loop:
byte[] buffer = new byte[8192]; // or much more if you like, but there are diminishing returns
int count;
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
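Putting that advice together, a corrected sender might look roughly like this. This is a sketch only: the framing is reduced to a long length prefix followed by the raw stream, and 'cipher' is assumed to be an already initialized javax.crypto.Cipher.
// Sketch: long length prefix, then a plain copy loop through a CipherOutputStream.
DataOutputStream out = new DataOutputStream(
        new CipherOutputStream(transmissionSocket.getOutputStream(), cipher));
out.writeLong(toBeSent.length()); // File.length() returns a long
try (FileInputStream in = new FileInputStream(toBeSent)) {
    byte[] buffer = new byte[8192];
    int count;
    while ((count = in.read(buffer)) > 0) {
        out.write(buffer, 0, count);
    }
}
out.close(); // finalizes the cipher padding; also closes the socket's output stream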
It seems that your protocol for the transmission is:
Send total file length in an int.
For each bunch of bytes read,
Send the number of encrypted bytes ahead in an int,
Send the encrypted bytes themselves.
The basic problem, beyond the misinterpretations of the documentation pointed out in @EJP's answer, is with this very protocol.
You assume that the file length can be sent over in an int. This means the length it sends cannot be more than Integer.MAX_VALUE, which limits you to files of at most 2 GB (remember that Java integers are signed).
If you take a look at the Files.size() method, which gets the actual file size in bytes, you'll see that it returns a long. A long will accommodate files larger than 2 GB, and larger than 4 GB. So your protocol should at the very least be defined to start with a long field rather than an int.
The size problem really has nothing at all to do with the TCP packets.
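On the receiving side the same change applies: read a long first, then loop until that many bytes have arrived. A minimal sketch (the socket variable and output file name are hypothetical):
DataInputStream in = new DataInputStream(socket.getInputStream());
long remaining = in.readLong(); // matches writeLong on the sender
byte[] buffer = new byte[8192];
try (FileOutputStream out = new FileOutputStream("received.bin")) {
    while (remaining > 0) {
        int count = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
        if (count < 0) {
            throw new EOFException("stream ended with " + remaining + " bytes left");
        }
        out.write(buffer, 0, count);
        remaining -= count;
    }
}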
I want to read from a network stream and write the bytes directly to a file.
But every time I run the program, very few bytes are actually written to the file.
Java:
InputStream in = uc.getInputStream();
int clength=uc.getContentLength();
byte[] barr = new byte[clength];
int offset=0;
int totalwritten=0;
int i;
int wrote=0;
OutputStream out = new FileOutputStream("file.xlsx");
while(in.available()!=0) {
wrote=in.read(barr, offset, clength-offset);
out.write(barr, offset, wrote);
offset+=wrote;
totalwritten+=wrote;
}
System.out.println("Written: "+totalwritten+" of "+clength);
out.flush();
That's because available() doesn't do what you think it does. Read its API documentation. You should simply read until the number of bytes read, returned by read(), is -1. Or even simpler, use Files.copy():
Files.copy(in, new File("file.xlsx").toPath());
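Note that Files.copy(InputStream, Path) throws a FileAlreadyExistsException if the target file already exists; pass StandardCopyOption.REPLACE_EXISTING to overwrite it:
Files.copy(in, Paths.get("file.xlsx"), StandardCopyOption.REPLACE_EXISTING);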
Using a buffer that has the size of the input stream also pretty much defeats the purpose of using a buffer, which is to only have a few bytes in memory.
If you want to reimplement copy(), the general pattern is the following:
byte[] buffer = new byte[4096]; // number of bytes in memory
int numberOfBytesRead;
while ((numberOfBytesRead = in.read(buffer)) >= 0) {
out.write(buffer, 0, numberOfBytesRead);
}
You're using .available() wrong. From the Java documentation:
available() returns an estimate of the number of bytes that can be read
(or skipped over) from this input stream without blocking by the next
invocation of a method for this input stream
That means that the first time your stream is slower than your file-writing speed (very soon, in all probability), the while loop ends.
You should either prepare a thread that waits for the input until it has read all the expected content length (with a sizable timeout, of course), or just block your program in the wait if user interaction is not a big deal; see the sketch below.
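A blocking loop that copies exactly clength bytes from the stream to the file (a sketch; timeout and error handling omitted):
byte[] buffer = new byte[8192];
int remaining = clength;
while (remaining > 0) {
    int count = in.read(buffer, 0, Math.min(buffer.length, remaining));
    if (count < 0) {
        throw new EOFException("server closed the connection early");
    }
    out.write(buffer, 0, count);
    remaining -= count;
}
out.flush();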
Dude, I'm using the following code to read a large file (2 MB or more) and do some business with the data.
I have to read 128 bytes on each data read call.
At first I used this code (no problem, it works fine):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
for(int idx=0;idx<128;idx++){
read=is.read(); if(read==-1){return;}//end of stream
buff[idx]=(byte)read;
}
process_data(buff);
}
Then I tried this code, where the problems appeared (errors! weird responses sometimes):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
//ERROR! java doesn't read 128 bytes while it's available
if((read=is.read(buff,0,128))==128){process_data(buff);}else{return;}
}
The above code doesn't work all the time. I'm sure the data is available, but read sometimes returns 127, or 125, or 123. What is the problem?
I also found a solution using DataInputStream#readFully(byte[]), which works too, but I just wondered why the second solution doesn't fill the array while the data is available.
Thanks buddy.
Consulting the Javadoc for FileInputStream (I'm assuming you're reading from a file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value tells you how many bytes were actually read. The reason you may read fewer than 128 bytes could be a slow drive or implementation-defined behavior.
For a proper read sequence, you should check that read() does not return -1 (end of stream) and keep reading into the buffer until the required amount of data has arrived.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Infinite loop
while(true){
read_total = 0;
// Repeatedly read until break or end of stream, offsetting each read by the amount already in the array
while((read = is.read(buf, read_total, buf.length - read_total)) != -1){
// Add the amount read to the running total
read_total = read_total + read;
// Break if read_total has reached the buffer length (128)
if(read_total == buf.length){
break;
}
}
if(read_total != buf.length){
// Incomplete read before 128 bytes
}else{
process_data(buf);
}
}
Edit:
Don't try to use available() as an indicator of data availability (sounds weird, I know); again, the Javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key word there is estimate: don't work with estimates.
Since the accepted answer was provided, a new option has become available. Starting with Java 9, the InputStream class has a readNBytes(byte[], int, int) method that eliminates the need for the programmer to write a read loop (Java 11 added a readNBytes(int) overload that allocates the array for you). For example, your method could look like:
public static void some_method( ) throws IOException {
InputStream is = new FileInputStream(args[1]);
byte[] buff = new byte[128];
while (true) {
int numRead = is.readNBytes(buff, 0, buff.length);
if (numRead == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff, numRead);
}
}
or the slightly simpler (Java 11 and later):
public static void some_method( ) throws IOException {
InputStream is = new FileInputStream(args[1]);
while (true) {
byte[] buff = is.readNBytes(128);
if (buff.length == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff);
}
}
On the client side, the read code is:
byte[] bytes = new byte[50]; //TODO should reuse buffer, for test only
ByteBuffer dst = ByteBuffer.wrap(bytes);
int ret = 0;
int readBytes = 0;
boolean fail = false;
try {
while ((ret = socketChannel.read(dst)) > 0) {
readBytes += ret;
System.out.println("read " + ret + " bytes from socket " + dst);
if (!dst.hasRemaining()) {
break;
}
}
int pos = dst.position();
byte[] data = new byte[pos];
dst.flip();
dst.get(data);
System.out.println("read data: " + StringUtil.toHexString(data));
} catch (Exception e) {
fail = true;
handler.onException(e);
}
The problem is that socketChannel.read() always returns a positive value. I checked the returned buffer, and the data is duplicated N times; it looks as if the low-level socket buffer's position never moves forward. Any idea?
If the server only returned 48 bytes, your code must have blocked in the read() method trying to get the 49th and 50th bytes. So either your '50' is wrong, or you will have to restructure your code to read and process whatever you get as you get it, rather than trying to fill buffers first. And this can't possibly be the code where you think you always got the same data: the explanation for that would be failure to compact the buffer after the get when reusing the same buffer for the next read, which you should do, but your posted code doesn't.
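For reference, the usual pattern when reusing a buffer across reads is flip, drain, compact; a minimal sketch, where process() stands in for whatever you do with each byte:
while (socketChannel.read(dst) > 0) {
    dst.flip();               // switch to draining mode
    while (dst.hasRemaining()) {
        process(dst.get());   // placeholder for your handling
    }
    dst.compact();            // keep any unread bytes and make room for the next read
}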
1: This might not be a bug!
[assuming that there is readable data in the buffer]...
You would expect a -1 at the end of the stream... See http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/channels/SocketChannel.html#read%28java.nio.ByteBuffer%29
If you are continually receiving a positive value from the read() call, then you will need to determine why data is being read continually.
Of course, the mystery herein ultimately lies in the source data (i.e. the SocketChannel which you are read data from).
2: Explanation of your possible problems
If your socket channel is coming from a REAL file, which is finite, then your file is really big, and the read() operation will eventually return -1 at end-of-stream... eventually...
If, on the other hand, your socket channel is listening to a source of data which you EXPECT to be finite (i.e. a serialized object stream, for example), I would double check the source --- maybe your finite stream is simply producing more and more data... and you are correctly consuming it.
3: Finally some advice
A trick for debugging this type of error is playing with the ByteBuffer input to your read method: the nice thing about java.nio's ByteBuffers is that, since they are more object-oriented than the old byte[] APIs, you can get very fine-grained debugging of their operations.
Why does the following method hang?
public void pipe(Reader in, Writer out) throws IOException {
CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
while( in.read(buf) >= 0 ) {
out.append(buf.flip());
}
}
Answering my own question: you have to call buf.clear() between reads. Presumably, read is hanging because the buffer is full. The correct code is:
public void pipe(Reader in, Writer out) throws IOException {
CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
while( in.read(buf) >= 0 ) {
out.append(buf.flip());
buf.clear();
}
}
I would assume that it is a deadlock. The in.read(buf) locks the CharBuffer and prevents the out.append(buf) call.
That is assuming that CharBuffer uses locks (of some kind) in the implementation. What does the API say about the CharBuffer class?
Edit: Sorry, some kind of short circuit in my brain... I confused it with something else.
CharBuffers don't work with Readers and Writers as cleanly as you might expect. In particular, there is no Writer.append(CharBuffer buf) method. The method called by the question snippet is Writer.append(CharSequence seq), which just calls seq.toString(). The CharBuffer.toString() method does return the string value of the buffer, but it doesn't drain the buffer. The subsequent call to Reader.read(CharBuffer buf) gets an already full buffer and therefore returns 0, forcing the loop to continue indefinitely.
Though this feels like a hang, it is in fact appending the first read's buffer contents to the writer every pass through the loop. So you'll either start to see a lot of output in your destination or the writer's internal buffer will grow, depending on how the writer is implemented.
As annoying as it is, I'd recommend a char[] implementation if only because the CharBuffer solution winds up building at least two new char[] every pass through the loop.
public void pipe(Reader in, Writer out) throws IOException {
char[] buf = new char[DEFAULT_BUFFER_SIZE];
int count = in.read(buf);
while( count >= 0 ) {
out.write(buf, 0, count);
count = in.read(buf);
}
}
I'd recommend only using this if you need to support converting between two character encodings; otherwise a ByteBuffer/Channel or byte[]/IOStream implementation would be preferable, even if you're piping characters.
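For comparison, a byte-oriented version of the same pipe using NIO channels and the flip/compact pattern might look like this (a sketch, assuming you have ReadableByteChannel and WritableByteChannel ends):
public void pipe(ReadableByteChannel in, WritableByteChannel out) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(DEFAULT_BUFFER_SIZE);
    while (in.read(buf) >= 0 || buf.position() > 0) {
        buf.flip();     // drain what was just read
        out.write(buf); // the writer may not take everything
        buf.compact();  // preserve any bytes the writer didn't take
    }
}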