Can you explain one thing? When I do something like this:
FileInputStream fis1 = new FileInputStream(path1);
FileInputStream fis2 = new FileInputStream(path2);
byte[] array=new byte[fis1.available()+fis2.available()];
And then I want to write bytes into the array:
fis2.read(array);
fis1.read(array);
What will the read() method do? Will it write ALL bytes from both streams into the array or not?
Which bytes will be written into the array, and in what order? I didn't find this in the spec or the docs.
The read(byte[] b) method javadoc says:
Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
Returns: the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
This means it reads "some" number of bytes into the beginning of the array. Because both calls start writing at index 0, the bytes read by fis1.read(array) overwrite some or all of the bytes previously written by fis2.read(array).
How many bytes does it read? The method returns the number of bytes it read. It reads at most the full length of the array, but it may read fewer; the exact count depends on the stream implementation, the operating system, and the file system.
It is not guaranteed to read all bytes from the file, and it does not guarantee that the byte array is filled entirely. Calling it twice does not return the same data twice; each call continues where the previous one stopped.
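To get both files into one array, each stream has to be read into its own region of the array, and each read has to be repeated until that region is full. Below is a minimal sketch under a few assumptions: the helper name readFullyInto is hypothetical, and ByteArrayInputStream stands in for the two FileInputStreams so the example runs without files. For real files, File.length() or Files.size() is a more reliable size source than available().

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ConcatRead {

    // Hypothetical helper: reads exactly len bytes from in into b, starting at off.
    // read(b, off, len) may return fewer bytes than requested, so it loops.
    static void readFullyInto(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n == -1) {
                throw new EOFException("Stream ended after " + total + " of " + len + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] part1 = {1, 2, 3};
        byte[] part2 = {4, 5};
        InputStream s1 = new ByteArrayInputStream(part1);
        InputStream s2 = new ByteArrayInputStream(part2);

        byte[] all = new byte[part1.length + part2.length];
        // s1 fills indices 0..2 and s2 fills indices 3..4, so nothing is overwritten.
        readFullyInto(s1, all, 0, part1.length);
        readFullyInto(s2, all, part1.length, part2.length);

        System.out.println(Arrays.toString(all)); // [1, 2, 3, 4, 5]
    }
}
```

The order of the bytes in the array is then simply the order in which you choose the offsets.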
First some background.
It's not needed to answer the actual question, but maybe it'll help put things in perspective.
I have written an mp3 library in Java which reads out the information stored in the ID3 tag in an .mp3 file. Information about the song, like the name of the song, the CD the song was released on, the track number, etc., is stored in this ID3 tag right at the beginning of an .mp3 file.
I have tested the library on 12,579 mp3 files which are located on my local hard drive, and it works flawlessly. Not a single IO error.
When I do the same thing with mp3 files located on a web server, I get an IO error. Well, not actually an error; it's actually a difference in the behavior of the InputStream's read(byte[]) method.
The example below will illustrate the problem, which occurs when I'm trying to read an image file (.jpg, .gif, .png, etc) from the mp3 file.
// read bytes from an .mp3 file on your local hard drive
// reading from an input stream created this way works flawlessly
InputStream inputStream = new FileInputStream("song.mp3");
// read bytes from an .mp3 file given by a url
// reading from an input stream created this way fails every time.
URL url = new URL("http://localhost/song.mp3");
HttpURLConnection httpConnection = (HttpURLConnection)url.openConnection();
httpConnection.connect();
InputStream inputStream = httpConnection.getInputStream();
int size = 25000; // size of the image file
byte[] buffer = new byte[size];
int numBytesRead = inputStream.read(buffer);
if (numBytesRead != buffer.length)
throw new IOException("Error reading the bytes into the buffer. Expected " + buffer.length + " bytes but got " + numBytesRead + " bytes");
So, my observation is:
Calling inputStream.read(buffer); always reads the entire number of bytes when the input stream is a FileInputStream. But it only reads a partial amount when I am using an input stream obtained from an http connection.
And hence my question is:
In general, can I not assume that the InputStream's read(byte[]) method will block until the entire number of bytes has been read (or EOF is reached)?
That is, have I assumed behavior that is not true of the read(byte[]) method, and I've just gotten lucky working with FileInputStream?
Is the correct, general usage of InputStream.read(byte[]) to put the call in a loop and keep reading bytes until the desired number of bytes has been read or EOF has been reached? Something like the code below:
int size = 25000;
byte[] buffer = new byte[size];
int numBytesRead = 0;
int totalBytesRead = 0;
while (totalBytesRead != size && numBytesRead != -1)
{
numBytesRead = inputStream.read(buffer);
totalBytesRead += numBytesRead;
}
Your conclusions are sound; take a look at the documentation for InputStream.read(byte[]):
Reads some number of bytes from the input stream and stores them into
the buffer array b. The number of bytes actually read is returned as
an integer. This method blocks until input data is available, end of
file is detected, or an exception is thrown.
There is no guarantee that read(byte[]) will fill the array you have provided, only that it will either read at least 1 byte (provided your array's length is > 0) or return -1 to signal the end of stream (EOS). This means that if you want to read a specific number of bytes from an InputStream correctly, you must use a loop.
The loop you currently have has one bug in it. On the first iteration of the loop, you will read a certain number of bytes into your buffer, but on the second iteration you will overwrite some, or all, of those bytes. Take a look at InputStream.read(byte[], int, int), which takes an offset into the buffer.
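A corrected version of the loop from the question might look like the sketch below. It passes totalBytesRead as the offset, so each read appends instead of overwriting, and it stops on -1 without adding -1 to the total (a second, smaller problem in the original loop). The method name readUpTo is illustrative, and ByteArrayInputStream stands in for the HTTP stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoop {

    // Reads up to size bytes (size <= buffer.length), stopping early only at end of stream.
    // Returns the number of bytes actually stored in buffer[0 .. result).
    static int readUpTo(InputStream inputStream, byte[] buffer, int size) throws IOException {
        int totalBytesRead = 0;
        while (totalBytesRead < size) {
            // Write at offset totalBytesRead so earlier bytes are not overwritten.
            int numBytesRead = inputStream.read(buffer, totalBytesRead, size - totalBytesRead);
            if (numBytesRead == -1) {
                break; // end of stream; do not add -1 to the total
            }
            totalBytesRead += numBytesRead;
        }
        return totalBytesRead;
    }

    public static void main(String[] args) throws IOException {
        int size = 25000;
        byte[] buffer = new byte[size];
        InputStream in = new ByteArrayInputStream(new byte[30000]);
        System.out.println(readUpTo(in, buffer, size)); // 25000
    }
}
```

If the return value is less than size, the stream ended early; the caller can decide whether that is an error.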
And hence my question is: In general, can I not assume that the InputStream's read(byte[]) method will block until the entire number of bytes has been read (or EOF is reached)?
No. That's why the documentation says "The number of bytes actually read" and "there is an attempt to read at least one byte."
I need to put the call in a loop and keep reading bytes until the desired number of bytes have been read
Rather than reinvent the wheel, you can get an already-tested wheel from Apache Commons IO (formerly Jakarta Commons IO).
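The JDK also ships ready-made loops. DataInputStream.readFully(byte[]) blocks until the array is full and throws EOFException if the stream ends first; since Java 9, InputStream.readNBytes(byte[], int, int) reads until len bytes have arrived or the stream ends, and returns the count. (Commons IO's counterpart is IOUtils.readFully.) The wrapper names below are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class JdkReadFully {

    // Fills b completely, or throws EOFException if the stream ends first.
    static void readExactly(InputStream in, byte[] b) throws IOException {
        new DataInputStream(in).readFully(b);
    }

    // Java 9+: reads until b is full or the stream ends; returns the count.
    static int readAtMost(InputStream in, byte[] b) throws IOException {
        return in.readNBytes(b, 0, b.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[100];

        readExactly(new ByteArrayInputStream(source), new byte[40]); // succeeds
        System.out.println(readAtMost(new ByteArrayInputStream(source), new byte[150])); // 100

        try {
            readExactly(new ByteArrayInputStream(source), new byte[150]);
        } catch (EOFException e) {
            System.out.println("EOFException: the stream ended before the array was full");
        }
    }
}
```

readFully suits the ID3 case where the image size is known from the tag; readNBytes suits "read up to N".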
As everyone already knows, Java fixes an array's size when the array is created, meaning that if I define an array like this:
byte[] buffer = new byte[10];
buffer.length will be 10. However, my question is: how can I find out how many elements are populated? I am populating elements from an InputStream, so I never know how many elements there will be.
I know that allocated array positions which are empty will be 0, but what if I am also getting zero values from the InputStream, mixed together with normal values?
You can't. Just store the number of bytes you have read in a variable. InputStream.read(byte[]) returns the number of bytes read:
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
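A minimal sketch of that bookkeeping, assuming a single read call (the helper name readAndCount is illustrative). ByteArrayInputStream happens to return all 4 bytes at once here; in general one read may return fewer, so a loop that sums the returned counts would be needed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CountFilled {

    // Returns how many leading elements of buffer hold real data (0 at end of stream).
    static int readAndCount(InputStream in, byte[] buffer) throws IOException {
        int count = in.read(buffer);
        return count == -1 ? 0 : count;
    }

    public static void main(String[] args) throws IOException {
        // The data itself legitimately contains zero bytes.
        InputStream in = new ByteArrayInputStream(new byte[] {7, 0, 0, 9});
        byte[] buffer = new byte[10];
        int count = readAndCount(in, buffer);

        // Only buffer[0 .. count) holds data; the remaining zeros are just unused slots.
        for (int i = 0; i < count; i++) {
            System.out.print(buffer[i] + " ");
        }
        System.out.println("(" + count + " bytes)"); // prints: 7 0 0 9 (4 bytes)
    }
}
```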
You could pre-populate the array with some invalid value that is not expected to be produced by the InputStream. However, this depends on the InputStream's content, and for arbitrary binary data every byte value can occur, so there is no safe generic default.
Another option is to use a collection such as List<Byte>, or Set<Byte> if you need uniqueness (note that a Set discards duplicate values). The number of populated elements is then simply the size of the collection, and if you really need an array at the end, you can easily produce one from the collection's contents.
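In the same spirit as the collection idea, a growable buffer removes the problem entirely. The sketch below swaps in java.io.ByteArrayOutputStream instead of List<Byte> (it stores raw bytes without boxing each one); the length of toByteArray() is the number of bytes actually read. The class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CollectBytes {

    // Reads the whole stream into a growable buffer; the result's length is the byte count.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // copy only the n bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readAll(new ByteArrayInputStream(new byte[] {0, 5, 0}));
        System.out.println(data.length); // 3, zeros included
    }
}
```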
Let's look at this piece of code:
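InputStream inputStream = new FileInputStream(new File("fileName"));
int cntReadBytes = 0;
byte[] bytes = new byte[10];
int i; // each byte as an int in 0..255, or -1 at end of stream
while (cntReadBytes < bytes.length && (i = inputStream.read()) != -1) {
bytes[cntReadBytes] = (byte) i; // narrow the int back to a byte
cntReadBytes++;
}
inputStream.close();
System.out.println("You have read total: " + cntReadBytes + " bytes" );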
As you can see, the while loop condition compares the return value of inputStream.read() with -1: any other value means a byte was read, and -1 means the end of the file has been reached. So cntReadBytes ends up equal to the number of iterations, i.e. the number of successfully read bytes.
Or, if you are free to use a List, you don't have to worry about this.
You could count the number of bytes passed through the InputStream, e.g. by using Apache Commons CountingInputStream. Or you could extend FilterInputStream yourself, if you don't have Apache Commons.
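A minimal sketch of the FilterInputStream route. The class name ByteCountingInputStream is hypothetical (it is not the Commons IO class), and it ignores mark/reset, so the count would overstate after a reset.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical class: counts every byte that passes through read() and skip().
public class ByteCountingInputStream extends FilterInputStream {
    private long count;

    public ByteCountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so both array reads are counted once.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    public long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteCountingInputStream in = new ByteCountingInputStream(new ByteArrayInputStream(new byte[42]));
        byte[] buffer = new byte[10];
        while (in.read(buffer) != -1) {
            // discard the data; only the count matters here
        }
        System.out.println(in.getCount()); // 42
    }
}
```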
http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read()
The doc says "Reads some number of bytes from the input stream and stores them into the buffer array b.".
How does InputStream read() in Java determine that number of bytes?
The buffer array has a defined length, call it n. The read(byte[]) method will read between 1 and n bytes (assuming n > 0), or return -1 if the end of the stream has been reached. It blocks until at least one byte is available, unless EOF is detected.
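The "between 1 and n" behavior is easy to observe with a stream that hands out data in small pieces. TrickleInputStream below is a hypothetical wrapper that caps every read at maxChunk bytes, roughly imitating a network stream where only part of the data has arrived; it is not a JDK class.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PartialReads {

    // Hypothetical wrapper: returns at most maxChunk bytes per read call.
    static class TrickleInputStream extends FilterInputStream {
        private final int maxChunk;

        TrickleInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new TrickleInputStream(new ByteArrayInputStream(new byte[10]), 4);
        byte[] buffer = new byte[8]; // n = 8
        int count;
        while ((count = in.read(buffer)) != -1) {
            System.out.println("read returned " + count); // 4, then 4, then 2
        }
    }
}
```

Each call returns a value between 1 and 8 until the 10 bytes run out, then -1.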
I think the confusion comes from what "read" means.
read() returns to you the next byte in the InputStream or -1 if there are no more bytes left.
However, due to implementation details of the particular InputStream you are using, the source that contains the bytes being read might have more than one byte read in order to tell you the next byte:
1. If your InputStream is buffered, then the entire buffer length might be read into memory just to tell you what the next byte is. However, subsequent calls to read() might not need to read the underlying source again until the in-memory buffer is exhausted.
2. If your InputStream is reading a zipped file, then several bytes of the underlying source may have to be read and decompressed in order to return the next unzipped byte.
3. Layers of InputStreams wrapping other InputStreams, such as new GZIPInputStream(new BufferedInputStream(new FileInputStream(file))), will use #1 and #2 above depending on the layer.
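Point 1 can be checked directly: wrap a small ByteArrayInputStream in a BufferedInputStream, call read() once, and ask the source how much it still has. This sketch assumes the BufferedInputStream's default buffer is larger than the 4-byte source, which holds for current JDKs; the helper name is illustrative.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class BufferedPeek {

    // Bytes left unread in the underlying source after ONE read() on the buffered wrapper.
    static int remainingAfterOneRead(byte[] data) throws IOException {
        ByteArrayInputStream source = new ByteArrayInputStream(data);
        BufferedInputStream buffered = new BufferedInputStream(source);
        buffered.read();           // asks for a single byte...
        return source.available(); // ...but the buffer may have pulled in much more
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream source = new ByteArrayInputStream(new byte[] {10, 20, 30, 40});
        BufferedInputStream buffered = new BufferedInputStream(source);
        System.out.println(buffered.read());    // 10: one byte handed to the caller
        System.out.println(source.available()); // 0: all 4 source bytes are now in the buffer
        System.out.println(buffered.read());    // 20: served from memory
    }
}
```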
For good or bad I have been using code like the following without any problems:
ZipFile aZipFile = new ZipFile(fileName);
InputStream zipInput = aZipFile.getInputStream(name);
int theSize = zipInput.available();
byte[] content = new byte[theSize];
zipInput.read(content, 0, theSize);
I have used this logic (obtaining the available size and reading directly into a byte buffer) for file I/O without any issues, and with zip files as well.
But recently I ran into a case where zipInput.read(content, 0, theSize); actually reads 3 bytes less than the theSize available.
And since the code does not check the length returned by zipInput.read(content, 0, theSize); in a loop, I read the file with the last 3 bytes missing, and later the program cannot function properly (the file is a binary file).
Strangely enough, with other zip files whose entries are larger, e.g. 1075 bytes (in my case the problematic zip entry is 867 bytes), the code works fine!
I understand that the logic of the code is probably not the "best" but why am I suddenly getting this problem now?
And how come if I run the program immediately with a larger zip entry it works?
Any input is highly welcome
Thanks
From the InputStream read API docs:
An attempt is made to read as many as len bytes, but a smaller number
may be read.
... and:
Returns: the total number of bytes read into the buffer, or -1 if
there is no more data because the end of the stream has been reached.
In other words, unless the read method returns -1, there is still more data available to read, but you cannot rely on read returning exactly the specified number of bytes. The specified number of bytes is an upper bound on how much data it will read.
available() is not guaranteed to report the total number of bytes remaining until the end of the stream.
See the Javadoc of InputStream.available(), which says:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
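The warning is easy to reproduce with any stream that does not override available(): the inherited InputStream.available() always returns 0, even though data remains. CountdownStream below is a hypothetical stream of that kind; the size 867 only mirrors the question. Reading until -1 (here via InputStream.transferTo, Java 9+) gets everything regardless of what available() says.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableIsAnEstimate {

    // Hypothetical stream that only implements read(); it inherits
    // InputStream.available(), which always returns 0.
    static class CountdownStream extends InputStream {
        private int remaining;

        CountdownStream(int size) {
            this.remaining = size;
        }

        @Override
        public int read() {
            if (remaining == 0) {
                return -1;
            }
            remaining--;
            return 'x';
        }
    }

    // Reads until end of stream; available() is never consulted.
    static byte[] readToEnd(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.transferTo(out); // Java 9+: loops until read returns -1
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new CountdownStream(867).available());       // 0
        System.out.println(readToEnd(new CountdownStream(867)).length); // 867
    }
}
```

Sizing the buffer from available() here would have produced an empty array.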
An example solution for your problem can be as follows:
ZipFile aZipFile = new ZipFile(fileName);
InputStream zipInput = aZipFile.getInputStream( caImport );
int available = zipInput.available();
byte[] contentBytes = new byte[ available ];
while ( available != 0 )
{
int numRead = zipInput.read( contentBytes ); // bytes actually read this time
if ( numRead == -1 ) break; // end of stream
// here, do whatever you want with contentBytes[0 .. numRead)
available = zipInput.available();
} // while available
...
In my experience this works for input files of all sizes.
The best way to do this is as follows:
public static byte[] readZipFileToByteArray(ZipFile zipFile, ZipEntry entry)
throws IOException {
InputStream in = null;
try {
in = zipFile.getInputStream(entry);
return IOUtils.toByteArray(in);
} finally {
IOUtils.closeQuietly(in);
}
}
where IOUtils.toByteArray(in) (from Apache Commons IO) keeps reading until EOF and then returns the byte array.
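Without Commons IO, Java 9 and later offer InputStream.readAllBytes(), which also reads until EOF. The sketch below is a JDK-only variant of the same helper using try-with-resources instead of closeQuietly; the class name ZipEntryBytes and the temporary zip built in main are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipEntryBytes {

    // JDK-only counterpart of readZipFileToByteArray (Java 9+ for readAllBytes).
    static byte[] readZipEntry(ZipFile zipFile, ZipEntry entry) throws IOException {
        try (InputStream in = zipFile.getInputStream(entry)) {
            return in.readAllBytes(); // reads until end of stream; available() is not used
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway zip with one 867-byte entry (size chosen to mirror the question).
        Path zip = Files.createTempFile("example", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("data.bin"));
            out.write(new byte[867]);
            out.closeEntry();
        }
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            byte[] content = readZipEntry(zipFile, zipFile.getEntry("data.bin"));
            System.out.println(content.length); // 867
        } finally {
            Files.delete(zip); // runs after the ZipFile has been closed
        }
    }
}
```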