First some background.
It's not needed to answer the actual question, but maybe it'll help put things in perspective.
I have written an mp3 library in Java which reads out the information stored in the ID3 tag in an .mp3 file. Information about the song, like the name of the song, the CD the song was released on, the track number, etc., is stored in this ID3 tag right at the beginning of an .mp3 file.
I have tested the library on 12,579 mp3 files which are located on my local hard drive, and it works flawlessly. Not a single IO error.
When I perform the same thing where the mp3 files are located on a web server, I get an IO error. Well, not actually an error. Actually it's a difference in the behavior of the InputStream's read(byte[]) method.
The example below will illustrate the problem, which occurs when I'm trying to read an image file (.jpg, .gif, .png, etc) from the mp3 file.
// read bytes from an .mp3 file on your local hard drive
// reading from an input stream created this way works flawlessly
InputStream inputStream = new FileInputStream("song.mp3");
// read bytes from an .mp3 file given by a url
// reading from an input stream created this way fails every time.
URL url = new URL("http://localhost/song.mp3");
HttpURLConnection httpConnection = (HttpURLConnection)url.openConnection();
httpConnection.connect();
InputStream inputStream = httpConnection.getInputStream();
int size = 25000; // size of the image file
byte[] buffer = new byte[size];
int numBytesRead = inputStream.read(buffer);
if (numBytesRead != buffer.length)
throw new IOException("Error reading the bytes into the buffer. Expected " + buffer.length + " bytes but got " + numBytesRead + " bytes");
So, my observation is:
Calling inputStream.read(buffer); always reads the entire number of bytes when the input stream is a FileInputStream. But it only reads a partial amount when I am using an input stream obtained from an http connection.
And hence my question is:
In general, can I not assume that the InputStream's read(byte[]) method will block until the entire number of bytes has been read (or EOF is reached)?
That is, have I assumed behavior that is not true of the read(byte[]) method, and I've just gotten lucky working with FileInputStream?
Is the correct, and general behavior of InputStream.read(byte[]) that I need to put the call in a loop and keep reading bytes until the desired number of bytes have been read, or EOF has been reached? Something like the code below:
int size = 25000;
byte[] buffer = new byte[size];
int numBytesRead = 0;
int totalBytesRead = 0;
while (totalBytesRead != size && numBytesRead != -1)
{
numBytesRead = inputStream.read(buffer);
totalBytesRead += numBytesRead;
}
Your conclusions are sound. Take a look at the documentation for InputStream.read(byte[]):
Reads some number of bytes from the input stream and stores them into
the buffer array b. The number of bytes actually read is returned as
an integer. This method blocks until input data is available, end of
file is detected, or an exception is thrown.
There is no guarantee that read(byte[]) will fill the array you have provided, only that it will either read at least 1 byte (provided your array's length is > 0), or return -1 to signal the end of the stream. This means that if you want to read a specific number of bytes from an InputStream correctly, you must use a loop.
The loop you currently have has one bug in it. On the first iteration of the loop, you will read a certain number of bytes into your buffer, but on the second iteration you will overwrite some, or all, of those bytes. Take a look at InputStream.read(byte[], int, int).
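A minimal sketch of the corrected loop, using read(byte[], int, int) so that each call continues where the previous one stopped. The class name, the readFully helper, and the trickle wrapper (which simulates a stream that hands out only small chunks, as an HTTP stream may) are hypothetical names introduced here for illustration; they are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyExample {
    // Hypothetical helper: keeps reading until the buffer is full or the
    // stream ends. Returns the number of bytes actually stored.
    public static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // Write after the bytes already read instead of at index 0.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream: do not add -1 to the total
            }
            total += n;
        }
        return total;
    }

    // Hypothetical wrapper that returns at most maxChunk bytes per read call,
    // to imitate a source that delivers data in pieces.
    public static InputStream trickle(InputStream in, int maxChunk) {
        return new FilterInputStream(in) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[25000];
        byte[] buffer = new byte[25000];

        InputStream single = trickle(new ByteArrayInputStream(data), 1000);
        System.out.println("single read: " + single.read(buffer)); // prints "single read: 1000"

        InputStream looped = trickle(new ByteArrayInputStream(data), 1000);
        System.out.println("readFully: " + readFully(looped, buffer)); // prints "readFully: 25000"
    }
}
```

If readFully returns fewer bytes than the buffer length, the stream ended early, which is the case the original length check was trying to detect.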
And hence my question is: In general, can I not assume that the InputStream's read(byte[]) method will block until the entire number of bytes has been read (or EOF is reached)?
No. That's why the documentation says "The number of bytes actually read" and "there is an attempt to read at least one byte."
I need to put the call in a loop and keep reading bytes until the desired number of bytes have been read
Rather than reinvent the wheel, you can get an already-tested wheel at Apache Commons IO (formerly Jakarta Commons IO), for example its IOUtils.readFully and IOUtils.toByteArray methods.
Can you explain one thing? When I do something like this:
FileInputStream fis1 = new FileInputStream(path1);
FileInputStream fis2 = new FileInputStream(path2);
byte[] array=new byte[fis1.available()+fis2.available()];
And if I want to write bytes to the array:
fis2.read(array);
fis1.read(array);
What will the read() method do? Will it write ALL bytes from both streams to the array or not?
How many bytes will be written to the array, and in what order? I didn't find this in the spec or the docs.
The read(byte[] b) method javadoc says:
Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
Returns: the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
What it means is it reads "some" number of bytes into the beginning of the array.
How many bytes does it read? The method returns the number of bytes it read. That is at most the full length of the array, and it may be fewer. The exact amount depends on the stream implementation, the operating system, and the file system.
It is not guaranteed to read all bytes from the file, and it does not guarantee that the byte array is filled entirely. If you call it twice, it does not return the same data twice: each call continues in the stream where the previous one stopped, but writes to the beginning of the array again.
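To make the ordering concrete, here is a small sketch. It assumes in-memory ByteArrayInputStream objects standing in for the two files (so the results are deterministic), and the class and method names are hypothetical. The first method mirrors the two read(array) calls from the question; the second uses read(byte[], int, int) with an offset so the second stream's bytes land after the first stream's bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class TwoStreamsExample {
    private static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Both calls write at index 0, so the second call overwrites
    // the start of what the first call stored.
    public static byte[] overwriteDemo() throws IOException {
        InputStream s1 = stream("AB");
        InputStream s2 = stream("CDE");
        byte[] array = new byte[5];
        s2.read(array); // array is now C D E 0 0
        s1.read(array); // array is now A B E 0 0
        return array;
    }

    // Passing an offset places the second stream's bytes after the first's.
    public static byte[] offsetDemo() throws IOException {
        InputStream s1 = stream("AB");
        InputStream s2 = stream("CDE");
        byte[] array = new byte[5];
        int n1 = s1.read(array, 0, array.length);   // stores A B at 0..1
        s2.read(array, n1, array.length - n1);      // stores C D E at 2..4
        return array;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(offsetDemo(), StandardCharsets.US_ASCII)); // prints "ABCDE"
    }
}
```

With real files a single read call per stream is still not guaranteed to deliver everything, so each read would need to sit in a loop that advances the offset by the returned count. The array is also better sized from File.length() than from available().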
http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read()
The doc says "Reads some number of bytes from the input stream and stores them into the buffer array b."
How does InputStream read() in Java determine that number of bytes?
The buffer array has a defined length, call it n. The read(byte[]) method will read between 1 and n bytes (for n > 0). It blocks until at least one byte is available, unless end of stream is detected, in which case it returns -1.
I think the confusion comes from what "read" means.
read() returns to you the next byte in the InputStream or -1 if there are no more bytes left.
However, due to implementation details of the particular InputStream you are using, more than one byte may have to be read from the underlying source in order to give you the next byte:
1. If your InputStream is buffered, then the entire buffer length might be read into memory just to tell you what the next byte is. However, subsequent calls to read() might not need to read the underlying source again until the in-memory buffer is exhausted.
2. If your InputStream is reading a zipped file, then the underlying source may have to have several bytes read in to unzip your data in order to return the next unzipped byte.
Layers of InputStreams wrapping other InputStreams, such as new GZIPInputStream(new BufferedInputStream(new FileInputStream(file)));, will use #1 and #2 above depending on the layer.
I am using Java to read a TSV file that is 4 GB in size, and I wanted to know if there is a way for Java to tell me how far it is through the task as the program is running. I'm thinking a file stream might be able to tell me how many bytes it has read, and I could do some simple math with that.
A plain stream or reader doesn't count the number of bytes / characters read.
I think you might be looking for ProgressMonitorInputStream.
If you don't want / need the Swing integration, then another alternative is to write a custom subclass of FilterReader or FilterInputStream that counts the characters/bytes read and provides a getter for reading the count. Then put the custom class into your input stack at the appropriate point.
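A counting subclass along those lines might look like the sketch below. The class name ByteCountingInputStream and its getCount() getter are hypothetical names chosen for this example. It relies on FilterInputStream.read(byte[]) delegating to read(byte[], int, int), so only the single-byte and three-argument reads need to be overridden. mark/reset are not accounted for in this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class: counts the bytes that pass through it.
class ByteCountingInputStream extends FilterInputStream {
    private long count;

    public ByteCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++; // one byte was delivered
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) {
            count += n; // add only the bytes actually read, never -1
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped; // skipped bytes also advance the position
        return skipped;
    }

    public long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteCountingInputStream in =
                new ByteCountingInputStream(new ByteArrayInputStream(new byte[10]));
        in.read(new byte[4]);
        System.out.println(in.getCount()); // prints 4
    }
}
```

For a local file, progress could then be estimated as 100.0 * getCount() / file.length(), provided the counting stream sits directly on top of the FileInputStream (below any buffering or decoding layers is fine, but the count then reflects bytes fetched from the file rather than bytes consumed by the parser).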
As you read from the stream, keep a tally of bytes read. For example, if you are reading byte arrays directly from the stream:
long bytesReadTotal = 0L;
int bytesRead = stream.read(bytes);
while (bytesRead != -1) {
bytesReadTotal += bytesRead;
// process these bytes ...
bytesRead = stream.read(bytes);
}
If you read this file through HTTP, there is a header named "Content-Length" that can tell you the total number of bytes to expect, so you know your progress while you are reading.
If you read the file over TCP/UDP, I guess you would write both the client and the server for the file transfer, so the server should send the file length to the client first and then the file contents.
If you just read a local file, this is not a problem, since the file size is known up front (for example from File.length()).
For good or bad I have been using code like the following without any problems:
ZipFile aZipFile = new ZipFile(fileName);
InputStream zipInput = aZipFile.getInputStream(name);
int theSize = zipInput.available();
byte[] content = new byte[theSize];
zipInput.read(content, 0, theSize);
I have used this logic (obtaining the available size and reading directly into a byte buffer) for file I/O without any issues, and I have used it with zip files as well.
But recently I ran into a case where zipInput.read(content, 0, theSize); actually reads 3 bytes fewer than theSize.
And since the code does not loop to check the length returned by zipInput.read(content, 0, theSize);, I read the file with the last 3 bytes missing, and later the program cannot function properly (the file is a binary file).
Strangely enough, with different zip entries of larger size, e.g. 1075 bytes (in my case the problematic zip entry is 867 bytes), the code works fine!
I understand that the logic of the code is probably not the "best", but why am I suddenly getting this problem now?
And how come it works if I run the program immediately afterwards with a larger zip entry?
Any input is highly welcome
Thanks
From the InputStream read API docs:
An attempt is made to read as many as len bytes, but a smaller number
may be read.
... and:
Returns: the total number of bytes read into the buffer, or -1 if
there is no more data because the end of the stream has been reached.
In other words unless the read method returns -1 there is still more data available to read, but you cannot guarantee that read will read exactly the specified number of bytes. The specified number of bytes is the upper bound describing the maximum amount of data it will read.
available() is not guaranteed to return the total number of bytes remaining until the end of the stream.
Refer to Java InputStream's available() method. It says that
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
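An example solution for your problem can be as follows:
ZipFile aZipFile = new ZipFile(fileName);
InputStream zipInput = aZipFile.getInputStream( caImport );
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] chunk = new byte[ 4096 ];
int numRead;
// loop until read() signals end of stream; available() is only an estimate
while ( ( numRead = zipInput.read( chunk ) ) != -1 )
{
    // keep only the bytes that were actually read in this call
    out.write( chunk, 0, numRead );
} // while not end of stream
byte[] contentBytes = out.toByteArray();
...
This reads the whole entry regardless of its size, because it loops until read() returns -1 instead of relying on available().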
The best way to do this is probably as follows:
public static byte[] readZipFileToByteArray(ZipFile zipFile, ZipEntry entry)
throws IOException {
InputStream in = null;
try {
in = zipFile.getInputStream(entry);
return IOUtils.toByteArray(in);
} finally {
IOUtils.closeQuietly(in);
}
}
where the IOUtils.toByteArray(in) method (from Apache Commons IO) keeps reading until EOF and then returns the byte array.
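If adding Commons IO is not an option, a JDK-only variant is possible. The sketch below assumes Java 9 or later, where InputStream.readAllBytes() is available, and uses try-with-resources to close both the zip file and the entry stream. The class name ZipEntryReader and the readEntry signature are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipEntryReader {
    // Hypothetical helper: returns the full uncompressed content of one entry.
    public static byte[] readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + zipPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // readAllBytes() loops internally until end of stream,
                // so no available()-based buffer sizing is needed.
                return in.readAllBytes();
            }
        }
    }
}
```

A call such as ZipEntryReader.readEntry(Paths.get("archive.zip"), "data.bin") would return the entry's bytes, where both names are placeholders.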