Java - download a file through network with a buffer

I want to read from a network stream and write the bytes directly to a file.
But every time I run the program, only a few bytes are actually written to the file.
Java:
InputStream in = uc.getInputStream();
int clength=uc.getContentLength();
byte[] barr = new byte[clength];
int offset=0;
int totalwritten=0;
int i;
int wrote=0;
OutputStream out = new FileOutputStream("file.xlsx");
while(in.available()!=0) {
wrote=in.read(barr, offset, clength-offset);
out.write(barr, offset, wrote);
offset+=wrote;
totalwritten+=wrote;
}
System.out.println("Written: "+totalwritten+" of "+clength);
out.flush();

That's because available() doesn't do what you think it does. Read its API documentation. You should simply read until the number of bytes read, returned by read(), is -1. Or even simpler, use Files.copy():
Files.copy(in, new File("file.xlsx").toPath());
Using a buffer that has the size of the input stream also pretty much defeats the purpose of using a buffer, which is to only have a few bytes in memory.
If you want to reimplement copy(), the general pattern is the following:
byte[] buffer = new byte[4096]; // number of bytes in memory
int numberOfBytesRead;
while ((numberOfBytesRead = in.read(buffer)) >= 0) {
out.write(buffer, 0, numberOfBytesRead);
}
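One detail worth knowing about the Files.copy() suggestion: without a copy option it throws FileAlreadyExistsException when the target file already exists, which is likely on a second run. A minimal sketch, assuming the stream comes from the connection in the question (the class and method names here are made up for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DownloadSketch {
    /** Streams everything from in to target and returns the number of bytes written. */
    static long saveTo(InputStream in, Path target) throws IOException {
        // Files.copy reads until end of stream; REPLACE_EXISTING overwrites
        // a file left over from an earlier run instead of throwing.
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With the variables from the question, this would be called as saveTo(uc.getInputStream(), Paths.get("file.xlsx")), and the returned count can be compared with uc.getContentLength().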

You're using .available() wrong. From Java documentation:
available() returns an estimate of the number of bytes that can be read
(or skipped over) from this input stream without blocking by the next
invocation of a method for this input stream
That means that the first time your stream is slower than your file writing speed (very soon in all probability), available() returns 0 and the while loop ends.
You should either prepare a thread that waits for the input until it has read all the expected content length (with a sizable timeout, of course) or just block your program in the wait, if user interaction is not a big deal.
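The "just block" option can be sketched roughly as follows. This is an illustration, not the only way to do it: the helper name and the expectedLength parameter are assumptions, and the check is skipped when the length is unknown, because getContentLength() returns -1 in that case.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CopyUntilEof {
    /** Copies in to out until end of stream, then checks the total against expectedLength. */
    static long copy(InputStream in, OutputStream out, long expectedLength) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() blocks until at least one byte arrives and returns -1 only at end of stream,
        // so a slow network just makes this loop wait instead of ending it early.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        // A negative expectedLength means "unknown", so there is nothing to verify.
        if (expectedLength >= 0 && total != expectedLength) {
            throw new EOFException("Expected " + expectedLength + " bytes but got " + total);
        }
        return total;
    }
}
```

To bound the wait, a read timeout can be set on the connection before opening the stream, for example with URLConnection.setReadTimeout.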

Related

How Buffer Streams works internally in Java

I'm reading about buffered streams. I searched and found many answers that clarified the concepts, but I still have a few questions.
After searching, I learned that a buffer is temporary memory (RAM) that lets a program read data faster than going to the hard disk each time, and that the native input API is called when the buffer is empty.
After reading a little more, I found this answer:
Reading data from disk byte-by-byte is very inefficient. One way to
speed it up is to use a buffer: instead of reading one byte at a time,
you read a few thousand bytes at once, and put them in a buffer, in
memory. Then you can look at the bytes in the buffer one by one.
I have two points of confusion:
1: How, and by whom, is the buffer filled (how does the native API do it)? As the quote above says, someone reads a few thousand bytes at once, but surely that takes the same time. Suppose I have 5 MB of data and it is loaded into the buffer in 5 seconds, and then the program consumes it from the buffer in another 5 seconds: 10 seconds in total. If I skip buffering, the program reads directly from the hard disk at 1 MB per 2 seconds, which is also 10 seconds in total. Please clear up this confusion.
2: How does this line work?
BufferedReader inputStream = new BufferedReader(new FileReader("xanadu.txt"));
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
Thanks.
As for the performance of using buffering during read/write, it's probably minimal in impact since the OS will cache too, however buffering will reduce the number of calls to the OS, which will have an impact.
When you add other operations on top, such as character encoding/decoding or compression/decompression, the impact is greater as those operations are more efficient when done in blocks.
Your second question said:
As I'm thinking FileReader write data to buffer, then BufferedReader read data from buffer memory? Also explain this.
I believe your thinking is wrong. Yes, technically the FileReader will write data to a buffer, but the buffer is not defined by the FileReader, it's defined by the caller of the FileReader.read(buffer) method.
The operation is initiated from outside, when some code calls BufferedReader.read() (any of the overloads). BufferedReader will then check its buffer, and if enough data is available in the buffer, it will return the data without involving the FileReader. If more data is needed, the BufferedReader will call the FileReader.read(buffer) method to get the next chunk of data.
It's a pull operation, not a push, meaning the data is pulled out of the readers by the caller.
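The pull behaviour can be observed with a small experiment. The sketch below is only an illustration: CountingReader is a made-up wrapper, and a StringReader stands in for the FileReader so the example needs no file. It reads character by character through a BufferedReader and counts how often the underlying reader is asked for a chunk.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class PullDemo {
    /** Hypothetical wrapper that counts bulk read calls made on the wrapped reader. */
    static final class CountingReader extends FilterReader {
        int bulkReads = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            bulkReads++;
            return super.read(cbuf, off, len);
        }
    }

    /** Reads text one char at a time; returns {chars read, bulk reads on the source}. */
    static int[] readCharByChar(String text) throws IOException {
        CountingReader source = new CountingReader(new StringReader(text));
        int chars = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            // Each read() is served from BufferedReader's internal array;
            // the source is only touched when that array runs empty.
            while (reader.read() != -1) {
                chars++;
            }
        }
        return new int[] { chars, source.bulkReads };
    }
}
```

For a text of 20,000 characters, the caller makes 20,000 read() calls, but the source should see only a handful of bulk reads, because each one refills the whole internal buffer.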
All of the work is done by a private method named fill(), reproduced here for educational purposes; any Java IDE lets you view the source code yourself:
private void fill() throws IOException {
int dst;
if (markedChar <= UNMARKED) {
/* No mark */
dst = 0;
} else {
/* Marked */
int delta = nextChar - markedChar;
if (delta >= readAheadLimit) {
/* Gone past read-ahead limit: Invalidate mark */
markedChar = INVALIDATED;
readAheadLimit = 0;
dst = 0;
} else {
if (readAheadLimit <= cb.length) {
/* Shuffle in the current buffer */
// here copy the read chars in a memory buffer named cb
System.arraycopy(cb, markedChar, cb, 0, delta);
markedChar = 0;
dst = delta;
} else {
/* Reallocate buffer to accommodate read-ahead limit */
char ncb[] = new char[readAheadLimit];
System.arraycopy(cb, markedChar, ncb, 0, delta);
cb = ncb;
markedChar = 0;
dst = delta;
}
nextChar = nChars = delta;
}
}
int n;
do {
n = in.read(cb, dst, cb.length - dst);
} while (n == 0);
if (n > 0) {
nChars = dst + n;
nextChar = dst;
}
}

Java TCP-Sockets transmit files larger than 4gb [duplicate]

I am trying to transfer a file that is larger than 4 GB using the Java Sockets API. I am already reading it via InputStreams and writing it via OutputStreams. However, analyzing the transmitted packets in Wireshark, I realise that the sequence number of the TCP packets is incremented by the byte length of the packet, which seems to be 1440 bytes.
This leads to the behavior that when I try to send a file larger than 4 GB, the total size of the sequence number field of TCP is exceeded, leading to lots of error packets, but no error in Java.
My code for transmission currently looks like this:
DataOutputStream fileTransmissionStream = new DataOutputStream(transmissionSocket.getOutputStream());
FileInputStream fis = new FileInputStream(toBeSent);
int totalFileSize = fis.available();
fileTransmissionStream.writeInt(totalFileSize);
while (totalFileSize >0){
if(totalFileSize >= FileTransmissionManagementService.splittedTransmissionSize){
sendBytes = new byte[FileTransmissionManagementService.splittedTransmissionSize];
fis.read(sendBytes);
totalFileSize -= FileTransmissionManagementService.splittedTransmissionSize;
} else {
sendBytes = new byte[totalFileSize];
fis.read(sendBytes);
totalFileSize = 0;
}
byte[] encryptedBytes = DataEncryptor.encrypt(sendBytes);
/*byte[] bytesx = ByteBuffer.allocate(4).putInt(encryptedBytes.length).array();
fileTransmissionStream.write(bytesx,0,4);*/
fileTransmissionStream.writeInt(encryptedBytes.length);
fileTransmissionStream.write(encryptedBytes, 0, encryptedBytes.length);
What exactly have I done wrong in this situation, or is it not possible to transmit files greater than 4gb via one Socket?
TCP can handle infinitely long data streams. There is no problem with the sequence number wrapping around. As it is initially random, that can happen almost immediately, regardless of the length of the stream. The problems are in your code:
DataOutputStream fileTransmissionStream = new DataOutputStream(transmissionSocket.getOutputStream());
FileInputStream fis = new FileInputStream(toBeSent);
int totalFileSize = fis.available();
Classic misuse of available(). Have a look at the Javadoc and see what it's really for. This is also where your basic problem lies, as values > 2G don't fit into an int, so there is a truncation. You should be using File.length(), and storing it into a long.
fileTransmissionStream.writeInt(totalFileSize);
while (totalFileSize >0){
if(totalFileSize >= FileTransmissionManagementService.splittedTransmissionSize){
sendBytes = new byte[FileTransmissionManagementService.splittedTransmissionSize];
fis.read(sendBytes);
Here you are ignoring the result of read(). It isn't guaranteed to fill the buffer: that's why it returns a value. See, again, the Javadoc.
totalFileSize -= FileTransmissionManagementService.splittedTransmissionSize;
} else {
sendBytes = new byte[totalFileSize];
Here you are assuming the file size fits into an int, and assuming the bytes fit into memory.
fis.read(sendBytes);
See above re read().
totalFileSize = 0;
}
byte[] encryptedBytes = DataEncryptor.encrypt(sendBytes);
/*byte[] bytesx = ByteBuffer.allocate(4).putInt(encryptedBytes.length).array();
fileTransmissionStream.write(bytesx,0,4);*/
We're not interested in your commented-out code.
fileTransmissionStream.writeInt(encryptedBytes.length);
fileTransmissionStream.write(encryptedBytes, 0, encryptedBytes.length);
You don't need all this crud. Use a CipherOutputStream to take care of the encryption, or better still SSL, and use the following copy loop:
byte[] buffer = new byte[8192]; // or much more if you like, but there are diminishing returns
int count;
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
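A rough sketch of the CipherOutputStream suggestion follows. Several things here are assumptions made only to keep the example self-contained: the AES/CBC/PKCS5Padding transformation, the caller-supplied key and IV, and the in-memory streams standing in for the file and the socket. Key exchange and authentication are out of scope, which is one reason SSL is the better option.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EncryptedCopySketch {
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    /** The same copy loop as above. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int count;
        while ((count = in.read(buffer)) > 0) {
            out.write(buffer, 0, count);
        }
    }

    /** Encrypts while copying; in a real transfer the sink would be the socket's output stream. */
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            copy(new ByteArrayInputStream(plain), out);
        } // closing the stream writes the final padded block
        return sink.toByteArray();
    }

    /** Decrypts while copying, as the receiving side would. */
    static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(new ByteArrayInputStream(encrypted),
                newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            copy(in, sink);
        }
        return sink.toByteArray();
    }
}
```

Because the cipher stream handles chunking internally, the per-chunk length prefixes from the question are no longer needed.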
It seems that your protocol for the transmission is:
Send total file length in an int.
For each bunch of bytes read,
Send number of encrypted bytes ahead in an int,
Send the encrypted bytes themselves.
The basic problem, beyond the misinterpretations of the documentation that were pointed out in EJP's answer, is with this very protocol.
You assume that the file length can be sent over in an int. This means the length it sends cannot be more than Integer.MAX_VALUE. Of course, this limits you to files of 2 GB in length (remember that Java integers are signed).
If you take a look at the Files.size() method, which is a method for getting the actual file size in bytes, you'll see that it returns long. A long will accommodate files larger than 2GB, and larger than 4GB. So in fact, your protocol should at the very least be defined to start with a long rather than an int field.
The size problem really has nothing at all to do with the TCP packets.
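A protocol that starts with a long could look roughly like this. It is a simplified sketch, not the asker's actual protocol: the class and method names are invented, and the per-chunk encryption framing is left out to keep the focus on the 8-byte length header.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class LongLengthProtocol {
    /** Sends an 8-byte length header followed by the file contents. */
    static void send(Path file, OutputStream socketOut) throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        out.writeLong(Files.size(file)); // a long, so sizes above 2 GB are representable
        Files.copy(file, out);           // streams the file without loading it into memory
        out.flush();
    }

    /** Reads the header, then copies exactly that many bytes to target. */
    static long receive(InputStream socketIn, OutputStream target) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        long length = in.readLong();
        long remaining = length;
        byte[] buffer = new byte[8192];
        while (remaining > 0) {
            // Never ask for more than what is left, so bytes after the file stay unread.
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n == -1) {
                throw new EOFException("Stream ended with " + remaining + " bytes missing");
            }
            target.write(buffer, 0, n);
            remaining -= n;
        }
        return length;
    }
}
```

Only the remaining-byte counter is a long; the buffer stays small, so memory use does not depend on the file size.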

Why doesn't InputStream fill the array fully?

I'm using the following code to read a large file (2 MB or more) and process its data.
I have to read 128 bytes in each read call.
At first I used this code (no problems, works well):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
for(int idx=0;idx<128;idx++){
read=is.read(); if(read==-1){return;}//end of stream
buff[idx]=(byte)read;
}
process_data(buff);
}
Then I tried this code, and problems appeared (errors and occasional weird results):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
//ERROR! java doesn't read 128 bytes while it's available
if((read=is.read(buff,0,128))==128){process_data(buff);}else{return;}
}
The code above doesn't always work. I'm sure the data is available, but read sometimes returns 127, 125, or 123. What is the problem?
I also found that DataInputStream#readFully(buff:byte[]):void works, but I wonder why the second solution doesn't fill the array when the data is available.
Thanks.
Consulting the javadoc for FileInputStream (I'm assuming since you're reading from file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value tells you how many bytes were actually read. The reason you may be reading fewer than 128 bytes could be a slow drive or implementation-defined behavior.
For a proper read sequence, you should check that read() does not equal -1 (End of stream) and write to a buffer until the correct amount of data has been read.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Infinite loop
while(true){
read_total = 0;
// Repeatedly perform reads until break or end of stream, offsetting at last read position in array
while((read = is.read(buf, read_total, buf.length - read_total)) != -1){
// Gets the amount read and adds it to a read_total variable.
read_total = read_total + read;
// Break if it read_total is buffer length (128)
if(read_total == buf.length){
break;
}
}
if(read_total != buf.length){
// Incomplete read: end of stream reached before 128 bytes, so stop looping
break;
}else{
process_data(buf);
}
}
Edit:
Don't try to use available() as an indicator of data availability (sounds weird I know), again the javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key there is estimate, don't work with estimates.
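The short-read behaviour is easy to reproduce without a slow drive. In the sketch below, TricklingInputStream is a made-up wrapper that hands out at most 5 bytes per call; it stands in for any source that delivers data in small pieces. A single read() returns early, while DataInputStream.readFully keeps reading until the block is full.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ShortReadDemo {
    /** Hypothetical wrapper that returns at most maxChunk bytes per read call. */
    static final class TricklingInputStream extends FilterInputStream {
        private final int maxChunk;

        TricklingInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** One plain read(): returns however many bytes the stream chose to deliver. */
    static int singleRead(byte[] source, byte[] block) throws IOException {
        InputStream in = new TricklingInputStream(new ByteArrayInputStream(source), 5);
        return in.read(block, 0, block.length);
    }

    /** readFully loops internally until block is full, or throws EOFException. */
    static void fullRead(byte[] source, byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(
                new TricklingInputStream(new ByteArrayInputStream(source), 5));
        in.readFully(block);
    }
}
```

Both calls are legal uses of the API; the difference is only in who is responsible for the loop.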
Since the accepted answer was written, a new option has become available. Starting with Java 9, the InputStream class has readNBytes(byte[], int, int), and Java 11 added readNBytes(int); both eliminate the need to write a read loop. For example, your method could look like this:
public static void some_method(String path) throws IOException {
InputStream is = new FileInputStream(path);
byte[] buff = new byte[128];
while (true) {
int numRead = is.readNBytes(buff, 0, buff.length);
if (numRead == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff, numRead);
}
}
or the slightly simpler
public static void some_method(String path) throws IOException {
InputStream is = new FileInputStream(path);
while (true) {
byte[] buff = is.readNBytes(128);
if (buff.length == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff);
}
}

how to load first x bytes from URL with Java / Scala?

I want to read the first x bytes from a java.net.URLConnection (although I'm not forced to use this class - other suggestions welcome).
My code looks like this:
val head = new Array[Byte](2000)
new BufferedInputStream(connection.getInputStream).read(head)
IOUtils.toString(new ByteArrayInputStream(head), charset)
It works, but does this code load only the first 2000 bytes from the network?
Next trial
As 'JB Nizet' said it is not useful to use a buffered input stream, so I tried it with an InputStreamReader:
val head = new Array[Char](2000)
new InputStreamReader(connection.getInputStream, charset).read(head)
new String(head)
This code may be better, but the load times are about the same. So does this procedure limit the transferred bytes?
No, it doesn't. It could read up to 8192 bytes (the default buffer size of BufferedInputStream). It could also read 0 bytes, or any number of bytes between 0 and 2000, since you don't check the number of bytes that have actually been read, and which is returned by the read() method.
And finally, depending on the value of charset, and of the actual charset used by the HTTP response, this could return an incorrect string, or a String truncated in the middle of a multi-byte character. You should use a Reader to read text.
I suggest you read the Java IO tutorial.
You can use read(Reader, char[]) from Apache Commons IO. Just pass a 2000-character buffer to it and it will fill it with as many characters as possible, up to 2000.
Be sure you understand the objections in the other answers/comments, in particular:
Don't use Buffered... wrappers, it goes against your intentions.
If you read textual data, then use a Reader to read 2000 characters instead of InputStream reading 2000 bytes. The proper procedure would be to determine the character encoding from the headers of a response (Content-Type) and set that encoding into InputStreamReader.
Calling plain read(char[]) on a Reader is not guaranteed to fill the array you give it. It can read as little as one character no matter how big the array is!
Don't forget to close the reader afterwards.
Other than that, I'd strongly recommend you to use Apache HttpClient in favor of java.net.URLConnection. It's much more flexible.
Edit: To understand the difference between Reader.read and IOUtils.read, it's worth examining the source of the latter:
public static int read(Reader input, char[] buffer,
int offset, int length)
throws IOException
{
if (length < 0) {
throw new IllegalArgumentException("Length must not be negative: " + length);
}
int remaining = length;
while (remaining > 0) {
int location = length - remaining;
int count = input.read(buffer, offset + location, remaining);
if (EOF == count) { // EOF
break;
}
remaining -= count;
}
return length - remaining;
}
Since Reader.read can read fewer characters than the given length (we only know it's at least 1 and at most the length), we need to keep calling it until we get the amount we want.
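Putting the points above together without Commons IO, a minimal sketch might look like this. The readHead name and its parameters are invented for the example, and the charset is assumed to have been determined already, for instance from the Content-Type header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class HeadReader {
    /** Reads up to maxChars characters from the start of the stream, then closes it. */
    static String readHead(InputStream in, Charset charset, int maxChars) throws IOException {
        char[] head = new char[maxChars];
        int filled = 0;
        // try-with-resources closes the reader (and the underlying stream) afterwards.
        try (Reader reader = new InputStreamReader(in, charset)) {
            while (filled < maxChars) {
                int n = reader.read(head, filled, maxChars - filled);
                if (n == -1) {
                    break; // the stream ended before maxChars characters arrived
                }
                filled += n;
            }
        }
        // Only the filled part is converted, so no padding characters end up in the result.
        return new String(head, 0, filled);
    }
}
```

Note that this limits what the program consumes, not necessarily what travels over the network: the decoder inside InputStreamReader may read ahead, and the operating system may already have received more data.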

DataInputStream.read returning less than len

I am using DataInputStream to read some bytes from a socket. I have an expected number of bytes to read from the stream (after decoding a header, I know how many bytes are in the message). It works 99% of the time, but occasionally the number of bytes read is less than len.
int numRead = dis.read(buffer, 0, len);
What could cause numRead to be less than len? It's not -1. I would expect the behavior of read to block until the stream is closed or EOF is reached, but if it's a socket underlying the streams this shouldn't happen unless the socket closes, right?
Is there a way of reading bytes from a socket that will always ensure that you read len bytes?
Thanks
EDIT: For a general stream, you just keep reading until you've read everything you want to, basically. For an implementation of DataInput (such as DataInputStream) you should use readFully, as suggested by Peter Lawrey. Consider the rest of this answer to be relevant for the general case where you just have an InputStream.
It's entirely reasonable for an InputStream of any type to give you less data than you asked for, even if more is on its way. You should always code for this possibility - with the possible exception of ByteArrayInputStream. (That can still return less data than was requested, of course, if there's less data left than you asked for.)
Here's the sort of loop I'm talking about:
byte[] data = new byte[messageSize];
int totalRead = 0;
while (totalRead < messageSize) {
int bytesRead = stream.read(data, totalRead, messageSize - totalRead);
if (bytesRead < 0) {
// Change behaviour if this isn't an error condition
throw new IOException("Data stream ended prematurely");
}
totalRead += bytesRead;
}
You can use DataInputStream this way.
byte[] bytes = new byte[len];
dis.readFully(bytes);
This will either return with all the data read or throw an IOException.
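For the header-then-payload situation described in the question, readFully fits naturally. The sketch below assumes a 4-byte big-endian length header, which is what DataOutputStream.writeInt produces; the real header format in the question is unknown, so treat the framing as hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class FramedMessages {
    /** Writes a 4-byte length header followed by the payload. */
    static void writeMessage(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /** Reads one complete message, blocking until every byte of it has arrived. */
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int len = data.readInt(); // readInt also waits for all 4 header bytes
        if (len < 0) {
            throw new IOException("Negative message length: " + len);
        }
        byte[] payload = new byte[len];
        data.readFully(payload); // throws EOFException if the stream ends early
        return payload;
    }
}
```

DataInputStream does not buffer ahead, so wrapping the same socket stream for each message does not lose bytes between calls.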
read returns each time with whatever bytes were available at that moment, and -1 when the stream is done. You are typically supposed to do:
ByteArrayOutputStream total = new ByteArrayOutputStream();
while (true) {
int numRead = dis.read(buffer, 0, len);
if (numRead == -1) break;
total.write(buffer, 0, numRead);
}
I would expect the behavior of read to
block until the stream is closed or
EOF is reached.
Then you need to check the Javadocs. The contract of read() is that it will read at least one byte, blocking if necessary until it has done so, or until EOS or an exception occurs. There is nothing in the specification that says it will read the entire length you requested. That's why it returns a length.
Is there a way of reading bytes from a socket that will always ensure that you read len bytes?
You can use Apache Commons IOUtils, they have a method that does exactly what you need:
byte[] buf = new byte[BUFFER_SIZE];
int length;
do {
length = IOUtils.read(inputStream, buf);
if (length > 0) {
//do something with buf
}
} while (length == BUFFER_SIZE);
