The class FileInputStream has a method available() that returns the remaining size to be consumed.
I'm trying to convert a program that uses FileInputStream to use FileChannel. I know that we can consume the FileChannel using a ByteBuffer, but what I'm wondering is how I would get the number of bytes remaining to be consumed from the FileChannel. Any ideas?
The class FileInputStream has a method available() that returns the remaining size to be consumed
This is not a correct interpretation. available() returns an estimate of the number of bytes that can be read or skipped without the stream blocking. Typically, this is the number of bytes currently buffered by the stream, if any. It does not indicate the number of bytes until end-of-stream.
what I'm wondering is how I would get the number of bytes remaining to be consumed from the FileChannel
Compare FileChannel.position() to FileChannel.size() to see how many bytes remain.
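As a minimal sketch of that comparison: the helper name remaining() and the temporary file are assumptions made for illustration; only FileChannel.size() and FileChannel.position() come from the answer above. The result is clamped at zero because a channel's position may legally be set beyond the end of the file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelRemaining {
    // Hypothetical helper: bytes between the current position and the end of the file.
    static long remaining(FileChannel channel) throws IOException {
        return Math.max(0L, channel.size() - channel.position());
    }

    public static void main(String[] args) throws IOException {
        // A 100-byte temporary file stands in for the real input (assumption).
        Path tmp = Files.createTempFile("channel-remaining", ".bin");
        try {
            Files.write(tmp, new byte[100]);
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
                System.out.println("before read: " + remaining(channel));
                int n = channel.read(ByteBuffer.allocate(30)); // advances the position by n
                System.out.println("read " + n + ", remaining: " + remaining(channel));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that the value is only a snapshot: if another process appends to or truncates the file, size() changes between calls.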
Can you explain one thing? When I do something like this:
FileInputStream fis1 = new FileInputStream(path1);
FileInputStream fis2 = new FileInputStream(path2);
byte[] array=new byte[fis1.available()+fis2.available()];
And if I want to write bytes to the array:
fis2.read(array);
fis1.read(array);
What will the read() method do? Will it write ALL bytes from both streams to the array or not?
How many bytes will be written to the array, and in what order? I didn't find this in the spec or the docs.
The read(byte[] b) method javadoc says:
Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
Returns: the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
What it means is it reads "some" number of bytes into the beginning of the array.
How many bytes does it read? The method returns the number of bytes it read. It reads at most the full length of the array, but it will likely be an amount in the range of a few kilobytes at most. The exact details depend on the operating system and file system implementation.
It does not read all bytes from the file, and it does not guarantee the byte array is filled entirely. If you call it twice, it does not return the same data twice.
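To actually place the contents of two streams one after the other in a single array, each read has to be looped and given an offset. The following is a sketch under assumptions: the helper name fill() is made up, in-memory streams stand in for the two FileInputStreams, and the sizes are taken as known in advance (for real files they would come from something like File.length(), not from available()).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadIntoArray {
    // Hypothetical helper: keeps calling read() until 'length' bytes have been
    // stored starting at buffer[offset], or the stream ends.
    // Returns the number of bytes actually stored.
    static int fill(InputStream in, byte[] buffer, int offset, int length) throws IOException {
        int total = 0;
        while (total < length) {
            int n = in.read(buffer, offset + total, length - total);
            if (n == -1) {
                break; // end of stream before 'length' bytes arrived
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for fis1 and fis2 (assumption).
        InputStream first = new ByteArrayInputStream(new byte[] {1, 2, 3});
        InputStream second = new ByteArrayInputStream(new byte[] {4, 5});

        byte[] array = new byte[5];
        int fromFirst = fill(first, array, 0, 3);
        // The second stream starts where the first one stopped instead of at index 0.
        fill(second, array, fromFirst, array.length - fromFirst);
        System.out.println(Arrays.toString(array));
    }
}
```

Calling read(array) on both streams without an offset, as in the question, makes the second call overwrite the beginning of the array.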
I need to convert a stream of char into a stream of bytes, i.e. I need an adapter from a java.io.Writer interface to a java.io.OutputStream, supporting any valid Charset which I will have as a configuration parameter.
However, the java.io.OutputStreamWriter class has a hidden secret: the sun.nio.cs.StreamEncoder object it delegates to underneath creates an 8192 byte (8KB) buffer, even if you don't ask it to.
The problem is, at the OutputStream end I have inserted a wrapper that needs to count the amount of bytes being written, so that it immediately stops execution of the source system once a specific amount of bytes has been output. And if OutputStreamWriter is creating an 8K buffer, I simply get notified of the amount of bytes generated too late because they will only reach my counter when the buffer is flushing (so there will be already more than 8,000 already-generated bytes waiting for me at the OutputStreamWriter buffer).
So the question is, is there anywhere in the Java runtime a Writer -> OutputStream bridge that can run unbuffered?
I would really, really hate to have to write this myself :(...
NOTE: hitting flush() on the OutputStreamWriter for each write is not a valid alternative. That brings a large performance penalty (there's a synchronized block involved at the StreamEncoder).
NOTE 2: I understand it might be necessary to keep a small char overflow at the bridge in order to compute surrogates. It's not that I need to stop the execution of the source system in the very moment it generates the n-th byte (that would not be possible given bytes can come to me in the form of a larger byte[] in a write call). But I need to stop it asap, and waiting for an 8K, 2K or even 200-byte buffer to flush would simply be too late.
As you have already detected the StreamEncoder used by OutputStreamWriter has a buffer size of 8KB and there is no interface to change that size.
But the following snippet gives you a way to obtain a Writer for an OutputStream which internally also uses a StreamEncoder but now has a user-defined buffer size:
String charSetName = ...
CharsetEncoder encoder = Charset.forName(charSetName).newEncoder();
OutputStream out = ...
int bufferSize = ...
WritableByteChannel channel = Channels.newChannel(out);
Writer writer = Channels.newWriter(channel, encoder, bufferSize);
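A self-contained sketch of how this could be wired to a byte counter is shown below. CountingOutputStream and smallBufferWriter() are hypothetical names, and UTF-8 and a capacity of 16 are arbitrary choices. The third argument of Channels.newWriter is documented as a minimum buffer capacity, so the runtime may well allocate more than requested; it is worth measuring how many bytes arrive before the flush on your JDK rather than assuming.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.channels.Channels;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SmallBufferWriterDemo {
    // Hypothetical wrapper that counts bytes as they reach the underlying stream.
    static class CountingOutputStream extends FilterOutputStream {
        long count;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    // Builds the Writer from the answer above; minBufferCap is only a lower bound.
    static Writer smallBufferWriter(OutputStream out, CharsetEncoder encoder, int minBufferCap) {
        return Channels.newWriter(Channels.newChannel(out), encoder, minBufferCap);
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        Writer writer = smallBufferWriter(counter, StandardCharsets.UTF_8.newEncoder(), 16);

        char[] chars = new char[10_000];
        Arrays.fill(chars, 'x');
        writer.write(chars);

        // How far the counter lags at this point depends on the buffer actually allocated.
        System.out.println("bytes seen before flush: " + counter.count);
        writer.flush();
        System.out.println("bytes seen after flush: " + counter.count);
    }
}
```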
http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read()
The doc says "Reads some number of bytes from the input stream and stores them into the buffer array b.".
How does InputStream read() in Java determine that number of bytes?
The buffer array has a defined length, call it n. The read() method will read between 1 and n bytes. It will block until at least one byte is available, unless EOF is detected.
I think the confusion comes from what "read" means.
read() returns to you the next byte in the InputStream or -1 if there are no more bytes left.
However, due to implementation details of the particular InputStream you are using, the source that contains the bytes being read might have more than one byte read in order to tell you the next byte:
1. If your InputStream is buffered, then the entire buffer length might be read into memory just to tell you what the next byte is. However, subsequent calls to read() might not need to read the underlying source again until the in-memory buffer is exhausted.
2. If your InputStream is reading a zipped file, then the underlying source may have to have several bytes read in to unzip your data in order to return the next unzipped byte.
Layers of InputStreams wrapping other InputStreams, such as new GZIPInputStream(new BufferedInputStream(new FileInputStream(file))), will use #1 and #2 above depending on the layer.
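The buffered case can be observed with a small experiment. This is only a sketch: CountingInputStream is a hypothetical wrapper written for the demonstration, and a 100-byte in-memory source stands in for a file. It counts how many bytes the BufferedInputStream pulls from the source in order to serve a single read() call.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class UnderlyingReadDemo {
    // Hypothetical wrapper that counts the bytes handed out by the wrapped source.
    static class CountingInputStream extends FilterInputStream {
        long bytesPulled;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b != -1) {
                bytesPulled++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) {
                bytesPulled += n;
            }
            return n;
        }
    }

    // Reads one byte through a BufferedInputStream and reports how many bytes
    // were taken from the source to serve it.
    static long bytesPulledForOneRead(byte[] source) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(source));
        BufferedInputStream buffered = new BufferedInputStream(counting);
        buffered.read(); // the caller asks for exactly one byte
        return counting.bytesPulled;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes pulled from source: " + bytesPulledForOneRead(new byte[100]));
    }
}
```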
I noticed that when I use readFully() on a file instead of read(byte[]), processing time is reduced greatly. However, it occurred to me that readFully() may be a double-edged sword. If I accidentally try to read in a huge, multi-gigabyte file, could it choke?
Here is a function I am using to generate an SHA-256 checksum:
public static byte[] createChecksum(File log, String type) throws Exception {
    DataInputStream fis = new DataInputStream(new FileInputStream(log));
    Long len = log.length();
    byte[] buffer = new byte[len.intValue()];
    fis.readFully(buffer); // TODO: readFully may come at the risk of
                           // choking on a huge file.
    fis.close();
    MessageDigest complete = MessageDigest.getInstance(type);
    complete.update(buffer);
    return complete.digest();
}
If I were to instead use:
DataInputStream fis = new DataInputStream(new BufferedInputStream(new FileInputStream(log)));
Would that alleviate this risk? Or is the best option (in situations where you can't guarantee data size) to always control the number of bytes read in and use a loop until all bytes are read?
(Come to think of it, since the MessageDigest API takes in the full byte array at once, I'm not sure how to obtain a checksum without stuffing all the data in at once, but I suppose that is another question for another thread.)
You should just allocate a decently sized buffer (65536 bytes perhaps) and loop, reading up to 64 KB at a time and calling complete.update() inside the loop to append to the digest. Be careful to process only the number of bytes actually read in each pass; the last block in particular will probably be shorter than 64 KB.
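A sketch of that loop follows. The class name and the use of java.nio.file.Path are choices made for this example; the 64 KB buffer size is the one suggested above. MessageDigest.update(byte[], int, int) is what lets the digest be fed one chunk at a time, so the whole file never has to sit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChunkedChecksum {
    // Hashes the file in 64 KB chunks; memory use stays constant regardless of file size.
    static byte[] createChecksum(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[65536];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Only the first n bytes of the buffer are valid in this pass.
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        // A temporary file stands in for the log file (assumption).
        Path tmp = Files.createTempFile("checksum-demo", ".log");
        try {
            Files.write(tmp, new byte[200_000]);
            byte[] sum = createChecksum(tmp, "SHA-256");
            System.out.println("digest length in bytes: " + sum.length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```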
Reading the file will take as long as it takes, whether you use readFully() or not.
Whether you can actually allocate gigabyte-sized byte arrays is another question. There is no need to use readFully() at all when downloading files. It's for use in wire protocols where say the next 12 bytes are an identifier followed by another 60 bytes of address information and you don't want to have to keep writing loops.
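For the wire-protocol case, a sketch might look like the following. The record layout (12 identifier bytes followed by 60 address bytes) is taken from the example above; the class and method names are made up, and an in-memory stream stands in for a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class FixedRecordReader {
    static final int ID_LENGTH = 12;
    static final int ADDRESS_LENGTH = 60;

    // Returns {identifier, address}. readFully() blocks until each array is
    // completely filled and throws EOFException if the stream ends first.
    static byte[][] readRecord(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] id = new byte[ID_LENGTH];
        byte[] address = new byte[ADDRESS_LENGTH];
        data.readFully(id);
        data.readFully(address);
        return new byte[][] {id, address};
    }

    public static void main(String[] args) throws IOException {
        // 72 bytes of sample data stand in for what would arrive over the network.
        byte[] wire = new byte[ID_LENGTH + ADDRESS_LENGTH];
        for (int i = 0; i < wire.length; i++) {
            wire[i] = (byte) i;
        }
        byte[][] record = readRecord(new ByteArrayInputStream(wire));
        System.out.println("id bytes: " + record[0].length
                + ", address bytes: " + record[1].length);
    }
}
```

The buffers here are small and of known size, which is the situation readFully() is suited to.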
readFully() isn't going to choke if the file is multiple gigabytes, but allocating that byte buffer will. You'll get an OutOfMemoryError before you ever get to the call to readFully().
You need to use the method of updating the hash with chunks of the file repeatedly, rather than updating it all at once with the entire file.
Why is this value in my client code always zero?
InputStream inputStream = clientSocket.getInputStream();
int readCount = inputStream.available(); // >> IS ALWAYS ZERO
byte[] recvBytes = new byte[readCount];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int n = inputStream.read(recvBytes);
...
Presumably it's because no data has been received yet. available() tries to return the amount of data available right now without blocking, so if you call available() straight after making the connection, I'd expect to receive 0 most of the time. If you wait a while, you may well find available() returns a different value.
However, personally I don't typically use available() anyway. I create a buffer of some appropriate size for the situation, and just read into that:
byte[] data = new byte[16 * 1024];
int bytesRead = stream.read(data);
That will block until some data is available, but it may well read less than 16K of data. If you want to keep reading until you reach the end of the stream, you need to loop round.
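A sketch of such a loop, assuming the goal is to collect everything up to the end of the stream (the readAll() name is made up for this example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadAll {
    // Keeps reading 16K chunks until read() reports end of stream (-1).
    static byte[] readAll(InputStream stream) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] data = new byte[16 * 1024];
        int bytesRead;
        while ((bytesRead = stream.read(data)) != -1) {
            // Copy only the bytes filled in by this call.
            collected.write(data, 0, bytesRead);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the socket's input stream (assumption).
        byte[] all = readAll(new ByteArrayInputStream(new byte[40_000]));
        System.out.println("total bytes read: " + all.length);
    }
}
```

On a socket, end of stream is only reported once the peer closes its side of the connection, so a protocol that keeps the connection open needs a length prefix or a delimiter instead of this loop.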
Basically it depends on what you're trying to do, but available() is rarely useful in my experience.
From the Javadoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
A subclass' implementation of this method may choose to throw an IOException if this input stream has been closed by invoking the close() method.
The available method for class InputStream always returns 0.
This method should be overridden by subclasses.
Here is a note to understand why it returns 0
In InputStreams, read() calls are said to be "blocking" method calls. That means that if no data is available at the time of the method call, the method will wait for data to be made available.
The available() method tells you how many bytes can be read before a read() call will block the execution flow of your program. On most input streams, all calls to read() are blocking, which is why available() returns 0 by default.
However, on some streams (such as BufferedInputStream, that have an internal buffer), some bytes are read and kept in memory, so you can read them without blocking the program flow. In this case, the available() method tells you how many bytes are kept in the buffer.
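The difference can be seen in a small experiment. This is a sketch: PlainStream is a hypothetical subclass that does not override available(), so it inherits the default from InputStream, and the 10-byte source is arbitrary.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableDemo {
    // Hypothetical stream that yields 10 bytes and inherits the default available().
    static class PlainStream extends InputStream {
        private int next = 0;

        @Override
        public int read() {
            return next < 10 ? next++ : -1;
        }
    }

    // available() on a stream that does not override it.
    static int plainAvailable() throws IOException {
        return new PlainStream().available();
    }

    // available() on a BufferedInputStream before and after its buffer is filled.
    static int[] bufferedAvailable() throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(new PlainStream());
        int before = buffered.available();
        buffered.read(); // the first read fills the internal buffer
        int after = buffered.available();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        System.out.println("plain stream: " + plainAvailable());
        int[] result = bufferedAvailable();
        System.out.println("buffered, before first read: " + result[0]);
        System.out.println("buffered, after first read: " + result[1]);
    }
}
```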
According to the documentation, available() only returns the number of bytes that can be read from the stream without blocking. It doesn't mean that a read operation won't return anything.
You should check this value after a delay, to see it increasing.
There are very few correct uses of available(), and this isn't one of them. In this case no data had arrived so it returned zero, which is what it's supposed to do.
Just read until you have what you need. It will block until data is available.