I have some questions on java.nio.Buffer. Basically, my question starts with whether a flip() call is always needed to switch between read and write, or whether it is only needed for slow I/O, e.g. in the case of write-then-read, to ensure the data are completely written before they are read. My particular question is with MappedByteBuffer. It looks like if the file exists and is of a size that I know, I can just use the position(int newPosition) call to navigate to any part of the file and perform a read or write, i.e. basically use the buffer as a chunk of memory, forgetting about the concepts of mark or limit. Is this true?
Consider the following example. If I have a file that contains integer 1, then 2, from the beginning, it seems I can put another integer 3 at position 0, then rewind and read 3 and 2 out of the buffer. Shouldn't the limit stop me from the second getInt, like in a normal non-mmap buffer? When do I ever need to call flip() to switch between a write and a read of a MappedByteBuffer? Thanks!
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final int FILESIZE = 1024;
File testFile = new File("c:/temp/testbbrw.dat");
RandomAccessFile fileHandle = new RandomAccessFile(testFile, "rw");
FileChannel fileChannel = fileHandle.getChannel();
MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, FILESIZE);

int data;
mbb.position(0);
mbb.putInt(3);       // overwrite the first int in the mapped file
mbb.position(0);
data = mbb.getInt(); // I get 3
data = mbb.getInt(); // I get 2, which was written to the file before this program runs
mbb.force();         // flush the mapped changes back to the file
fileHandle.close();
This is what Buffer.flip() does:

public final Buffer flip() {
    limit = position;
    position = 0;
    mark = -1;
    return this;
}
It prepares the buffer so that the next sequence of get operations (or channel writes) starts at position 0 and ends at the current limit. In other words, you are telling the buffer that you are done changing it and are ready to move or copy it somewhere else (which means reading it).
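For example, a minimal sketch of the usual fill/flip/drain cycle (the values and buffer size are made up for illustration):

ByteBuffer buf = ByteBuffer.allocate(16);  // position=0, limit=capacity=16: ready for filling
buf.putInt(1);                             // position=4
buf.putInt(2);                             // position=8
buf.flip();                                // limit=8, position=0: ready for emptying
while (buf.hasRemaining()) {
    System.out.println(buf.getInt());      // prints 1 then 2, then stops at the limit
}
buf.clear();                               // position=0, limit=16: ready for filling again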
my question starts with whether a flip() call is always needed to switch between read and write, or whether it is only needed for slow I/O, e.g. in the case of write-then-read, to ensure the data are completely written before they are read.
A Buffer of any description starts out in a state where you can read into it, or put to it, which is the same thing.
flip() puts it into a state where you can write from it, or get from it, which is the same thing.
Despite its (extremely stupid) name, flip() is not the inverse of flip(). Its only inverses are compact() and clear().
For clarity I find it best to always leave a Buffer in the readable state, and only flip it into the writable state when I need to, and get it back to readable immediately afterwards.
This is for I/O.
If you are doing just get() and put() I'm not sure I would use flip() at all, and as this is a MappedByteBuffer I certainly wouldn't be calling clear() or compact(), both of which could do terrible things to the file, and which also rules out using flip().
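For the I/O case, the conventional pattern is the standard copy loop sketched below (`in` and `out` are placeholders for a ReadableByteChannel and a WritableByteChannel):

ByteBuffer buf = ByteBuffer.allocate(8192);
while (in.read(buf) >= 0 || buf.position() > 0) {
    buf.flip();      // switch from filling to draining
    out.write(buf);  // may drain only part of the buffer
    buf.compact();   // switch back to filling, keeping any unwritten bytes
}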
The design of the Buffer API in Java is confusing and counter-intuitive compared to a typical cyclic, finite buffer. Making things worse is the poor choice of terms in the documentation, compounded by equivocal uses of the read/write and put/get terms: the former refer to an external operation (typically by a Channel) using a Buffer, and the latter to operations provided by the Buffer itself.
Java Buffers
On creation, a new buffer is "empty", ready to be filled. It may be immediately supplied with some content in the constructor, but it remains in the "filling" state.
The flip() method "flips" the logical state of the buffer from being filled to being emptied. Rather idiotically, flip() does not reverse itself, even though in normal English it would usually describe a logically reversible action. Indeed, looking at the code, calling it twice without an intervening clear or compact puts the buffer into an invalid state, causing other methods to return nonsense. [1]
The clear() and compact() methods are the logical inverse of flip(), restoring the buffer to a "filling" state, with the former also emptying it and the latter maintaining the remaining content.
The general recommendation is to keep any given buffer in a consistent state always, with a try/finally; for example:
ByteBuffer wrap(ByteBuffer src, ByteBuffer tgt) {
    // assume buffers are *always* kept in the "filling" state
    try {
        src.flip(); // change `src` to "emptying"; assume `tgt` is already filling
        while (src.hasRemaining() && tgt.hasRemaining()) {
            tgt.put(src.get()); // transfer some or all of `src` to `tgt`
        }
    } finally {
        if (src.hasRemaining()) { src.compact(); } // revert `src` to "filling" without discarding remaining data
        else { src.clear(); }                      // compact() is (usually) less efficient than clearing
    }
    return tgt;
}
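A hypothetical call site, under the same "always filling" convention (sizes and values are arbitrary):

ByteBuffer src = ByteBuffer.allocate(64);
ByteBuffer tgt = ByteBuffer.allocate(64);
src.putInt(42);  // `src` is in the "filling" state
wrap(src, tgt);  // moves the 4 bytes; both buffers end up back in the "filling" state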
Typical Finite Cyclic Buffers
Where Java Buffers are most counter-intuitive is that most cyclic-buffer implementations are concurrently read/write capable: they maintain three values, head, tail and capacity (the last often inferred from the backing array if the language permits it), and they simply permit the values to wrap. The head is where data is read from, and the tail is where it is written to. When the end of the underlying array is reached, head or tail simply cycles back around to zero.
When head == tail, the buffer is empty. When inc(tail) == head, the buffer is full, and the current content length is head <= tail ? (tail - head) : (capacity - head + tail). The size of the backing array is generally capacity+1, so that the tail index is never equal to head when the buffer is full (which would be ambiguous without a separate flag).
This makes the internal index handling slightly more complex, but with the benefit of never having to flip-flop state and never needing to "compact" the data back to the start of the internal array (though most implementations will reset the start/end indices to zero whenever the buffer is emptied).
Typically, this also translates to a trade-off when reading in that two array copies may be needed; first from head to the end of the array and then from beginning of the array to tail. Three copy operations may be needed when the target is also a buffer and wraps during write (but the extra copy is hidden within the put method).
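To make the head/tail arithmetic concrete, here is a minimal sketch of such a cyclic buffer (illustrative only, not taken from any particular library):

// Minimal cyclic byte buffer: head = next read index, tail = next write index.
// The backing array has one spare slot so "full" and "empty" are distinguishable.
final class RingBuffer {
    private final byte[] data;
    private int head, tail;

    RingBuffer(int capacity) { data = new byte[capacity + 1]; }

    boolean isEmpty() { return head == tail; }
    boolean isFull()  { return (tail + 1) % data.length == head; }

    // current content length, as in the formula above
    int size() { return head <= tail ? tail - head : data.length - head + tail; }

    boolean put(byte b) {                // write at the tail
        if (isFull()) return false;
        data[tail] = b;
        tail = (tail + 1) % data.length; // cycle around; no flip needed
        return true;
    }

    int get() {                          // read from the head, -1 if empty
        if (isEmpty()) return -1;
        byte b = data[head];
        head = (head + 1) % data.length;
        return b & 0xff;
    }
}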
Suppositions
My best guess is that Java defines Buffers in this manner so that all reads and writes to the buffer occur in contiguous blocks. This, presumably, enables downstream/internal optimizations when dealing with things like sockets, memory maps and channels, avoiding intermediary copies needing to be made.
Just guessing here.
Notes
[1] Invalid because double flipping will cause the limit to be set to 0, not capacity, which will in turn cause BufferOverflowException for most internal methods due to limit - position being <= 0 or position >= limit. For example:
final int nextPutIndex() { // package-private
    if (position >= limit)
        throw new BufferOverflowException();
    return position++;
}

final int nextPutIndex(int nb) { // package-private
    if (limit - position < nb)
        throw new BufferOverflowException();
    int p = position;
    position += nb;
    return p;
}

/**
 * Checks the given index against the limit, throwing an {@link
 * IndexOutOfBoundsException} if it is not smaller than the limit
 * or is smaller than zero.
 */
final int checkIndex(int i) { // package-private
    if ((i < 0) || (i >= limit))
        throw new IndexOutOfBoundsException();
    return i;
}
Related
I just asked a question about why my thread shutdown wasn't working. It ended up being due to readLine() blocking my thread before the shutdown flag could be recognised. This was easy to fix by checking ready() before calling readLine().
However, I'm now using a DataInputStream to do the following in series:
int x = reader.readInt();
int y = reader.readInt();
byte[] z = new byte[y];
reader.readFully(z);
I know I could implement my own buffering, which would check the shutdown flag while filling the buffer. But I know this would be tedious. Instead, I could let the data be buffered within the InputStream class and wait until I have my n bytes ready before executing a non-blocking read, as I know how much I need to read:
4 bytes for the first integer
4 bytes for the second integer y
and y bytes for the z byte array.
Instead of using ready() to check if there is a line in the buffer, is there some equivalent ready(int bytesNeeded)?
The available() method returns the number of bytes in the InputStream's internal buffer.
So, one can do something like:
while (reader.available() < 4) checkIfShutdown();
reader.readInt();
You can use InputStream.available() to get an estimate of the number of bytes that can be read. Quoting the Javadoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking, which may be 0, or 0 when end of stream is detected. The read might be on the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In other words, if available() returns n, you know you can safely read n bytes without blocking. Note that, as the Javadoc states, the value returned is an estimate. For example, InflaterInputStream.available() will always return 1 if EOF hasn't been reached. Check the documentation of the InputStream subclass you will be using to ensure it meets your needs.
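Applied to the x/y/z sequence from the question, that could look like the following sketch (checkIfShutdown() stands in for the asker's shutdown check, and a real version should sleep briefly between polls):

void readRecord(DataInputStream reader) throws IOException {
    waitFor(reader, 8);          // 4 bytes for x plus 4 bytes for y
    int x = reader.readInt();
    int y = reader.readInt();
    waitFor(reader, y);          // y bytes for the z array
    byte[] z = new byte[y];
    reader.readFully(z);
}

void waitFor(DataInputStream reader, int bytesNeeded) throws IOException {
    while (reader.available() < bytesNeeded) {
        checkIfShutdown();       // placeholder: check the shutdown flag, sleep, etc.
    }
}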
You are going to need to implement your own equivalent of BufferedInputStream: either as the sole owner of an InputStream, with a thread (possibly borrowed from a pool) that is allowed to block in reads, or alternatively implemented with NIO.
To start with, I understand the concept of buffering as a wrapper around, for instance, a FileInputStream, acting as a temporary container for contents read (let's take the read scenario) from an underlying stream, in this case the FileInputStream.
Say there are 100 bytes to read from a stream (a file as the source).
Without buffering, the code (the read method of BufferedInputStream) has to make 100 reads, one byte at a time.
With buffering, depending on the buffer size, the code makes <= 100 reads.
Let's assume a buffer size of 50.
So the code reads from the buffer (as its source) only twice to read the contents of the file.
Now, as the FileInputStream is the ultimate source of the data (the file which contains 100 bytes), even though it is wrapped by the BufferedInputStream, wouldn't it have to read 100 times to read 100 bytes? Though the code calls the read method of BufferedInputStream, the call is passed on to the read method of FileInputStream, which would need to make 100 read calls. This is the point I'm unable to comprehend.
IOW, though wrapped by a BufferedInputStream, the underlying streams (such as FileInputStream) still have to read one byte at a time. So where is the benefit (not for the code, which requires only two read calls against the buffer, but for the application's performance) of buffering?
Thanks.
EDIT:
I'm making this a follow-up 'edit' rather than a 'comment' as I think it suits the context better here, and serves as a TL;DR for readers of the chat between @Kayaman and me.
The read method of BufferedInputStream says (excerpt):

As an additional convenience, it attempts to read as many bytes as possible by repeatedly invoking the read method of the underlying stream. This iterated read continues until one of the following conditions becomes true:
The specified number of bytes have been read,
The read method of the underlying stream returns -1, indicating end-of-file, or
The available method of the underlying stream returns zero, indicating that further input requests would block.
I dug into the code and found the following method call trace:

BufferedInputStream -> read(byte b[]), as I want to see buffering in action
BufferedInputStream -> read(byte b[], int off, int len)
BufferedInputStream -> read1(byte[] b, int off, int len), private
FileInputStream -> read(byte b[], int off, int len)
FileInputStream -> readBytes(byte b[], int off, int len), private and native; method description from the source code: "Reads a subarray as a sequence of bytes."

The call to read1 in BufferedInputStream (third in the trace above) is made in an infinite for loop. It returns on the conditions mentioned in the above excerpt of the read method description.
As I had mentioned in the question, the call does seem to be handled by an underlying stream, which matches the API method description and the method call trace.
The question still remains: does the native readBytes call of FileInputStream read one byte at a time and build an array of those bytes to return?
The underlying streams (such as FileInputStream) still have to read one byte at a time

Luckily no, that would be hugely inefficient. Buffering allows the BufferedInputStream to make bulk read calls with its 8192-byte internal buffer to the FileInputStream, which will return a whole chunk of data at once.
If you then want to read a single byte (or not), it will efficiently be returned from BufferedInputStream's internal buffer instead of having to go down to the file level. So the BufferedInputStream is there to reduce the number of times we do actual reads from the filesystem, and when those are done, they're done in an efficient fashion even if the end user wanted to read just a few bytes.
It's quite clear from the code that BufferedInputStream.read() does not delegate directly to UnderlyingStream.read(), as that would bypass all the buffering.
public synchronized int read() throws IOException {
    if (pos >= count) {
        fill();
        if (pos >= count)
            return -1;
    }
    return getBufIfOpen()[pos++] & 0xff;
}
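In other words, the usual pattern is simply to wrap the stream once and let the buffer absorb the byte-sized reads; a minimal sketch (the file name and the process() consumer are made up):

try (InputStream in = new BufferedInputStream(new FileInputStream("data.bin"))) {
    int b;
    while ((b = in.read()) != -1) {
        // each read() is served from the internal buffer; the FileInputStream
        // is only hit when the buffer needs refilling
        process(b);
    }
}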
I've read the java docs and a number of related questions but am unsure if the following is guaranteed to work:
I have a DataInputStream on a dedicated thread that continually reads small amounts of data, of known byte-size, from a very active connection. I'd like to alert the user when the stream becomes inactive (i.e. network goes down) so I've implemented the following:
...
streamState = waitOnStreamForState(stream, 4);
int i = stream.readInt();
...

private static int
waitOnStreamForState(DataInputStream stream, int nBytes) throws IOException {
    return waitOnStream(stream, nBytes, STREAM_ACTIVITY_THRESHOLD, STREAM_POLL_INTERVAL)
            ? STREAM_STATE_ACTIVE
            : STREAM_STATE_INACTIVE;
}

private static boolean
waitOnStream(DataInputStream stream, int nBytes, long timeout, long pollInterval) throws IOException {
    int timeWaitingForAvailable = 0;
    while (stream.available() < nBytes) {
        if (timeWaitingForAvailable >= timeout && timeout > 0) {
            return false;
        }
        try {
            Thread.sleep(pollInterval);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return (stream.available() >= nBytes);
        }
        timeWaitingForAvailable += pollInterval;
    }
    return true;
}
The docs for available() explain:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Does this mean it's possible the next read (inside readInt()) might only, for instance, read 2 bytes, and the subsequent read to finish retrieving the Integer could block? I realize readInt() is a method of the stream 'called next' but I presume it has to loop on a read call until it gets 4 bytes and the docs don't mention subsequent calls. In the above example is it possible that the readInt() call could still block even if waitOnStreamForState(stream, 4) returns STREAM_STATE_ACTIVE?
(and yes, I realize my timeout mechanism is not exact)
Does this mean it's possible the next read (inside readInt()) might only, for instance, read 2 bytes, and the subsequent read to finish retrieving the Integer could block?
That's what it says. However at least the next read() won't block.
I realize readInt() is a method of the stream 'called next' but I presume it has to loop on a read call until it gets 4 bytes and the docs don't mention subsequent calls. In the above example is it possible that the readInt() call could still block even if waitOnStreamForState(stream, 4) returns STREAM_STATE_ACTIVE?
That's what it says.
For example, consider SSL. You can tell that there is data available, but you can't tell how much without actually decrypting it, so a JSSE implementation is free to:
always return 0 from available() (this is what it used to do)
always return 1 if the underlying socket's input stream has available() > 0, otherwise zero
return the underlying socket input stream's available() value and rely on this wording to get it out of trouble if the actual plaintext data is less. (However the correct value might still be zero, if the cipher data consisted entirely of handshake messages or alerts.)
However you don't need any of this. All you need is a read timeout, set via Socket.setSoTimeout(), and a catch for SocketTimeoutException. There are few if any correct uses of available(): fewer and fewer as time goes on, it seems to me. You should certainly not waste time calling sleep().
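A minimal sketch of the timeout approach (socket setup elided; the 5-second threshold is arbitrary):

socket.setSoTimeout(5000);  // any blocking read now fails after 5 s of inactivity
DataInputStream in = new DataInputStream(socket.getInputStream());
try {
    int i = in.readInt();   // blocks, but never longer than the timeout
} catch (SocketTimeoutException e) {
    // stream inactive: alert the user, retry, or shut down
}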
I have a ByteBuffer that contains a large file (100 MB):
java.nio.ByteBuffer byteBuffer = ByteBuffer.wrap(multipartFile.getBytes());
writeChannel.write(byteBuffer);
writeChannel.closeFinally();
I can only legally write 1 MB to the writeChannel at a time.
How do I slice up the contents of the ByteBuffer and write only a 1 MB slice at a time into the writeChannel?
You can use ByteBuffer#slice() to get a duplicate view of your base ByteBuffer instance, then bump the position along to expose a sliding window of content. Alternately, you can just do the same to your base buffer if you don't need to expose it to any other consumers.
You can change the starting position of your view of the content via the single-argument Buffer#position(int) method, and change the end position of your view via Buffer#limit(int). So long as you're mindful not to push the view beyond the limit of the underlying buffer, you can do the following:
final ByteBuffer view = base.slice(); // index 0 in `view` corresponds to base.position()
final int stride = 1000000;           // 1 MB per chunk
for (int start = 0, end = view.capacity(); start != end; start = view.limit()) {
    view.position(start);
    view.limit(start + Math.min(end - start, stride));
    consume(view);                    // hand one <= 1 MB window to the consumer
}
I didn't test it, but it looks correct. It's possible to rewrite to avoid the initial setting of the position, which isn't strictly necessary here, but it incurs either some repetition or more awkward special case treatment of the first time through.
I left it this way to preserve the basic for loop structure.
As far as I know, write() on the writeChannel (which from the name I suppose is of type SocketChannel) will "attempt to write up to r bytes to the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked" (according to the Javadoc).
The description of ByteBuffer.remaining() says that this method will "Return the number of elements between the current position and the limit."
So my guess is that the problem is not that you can't write the entire ByteBuffer, but that your code should call flip() on the ByteBuffer object, so it becomes:
java.nio.ByteBuffer byteBuffer = ByteBuffer.wrap(multipartFile.getBytes());
byteBuffer.flip();
writeChannel.write(byteBuffer);
writeChannel.closeFinally();
As it says in Buffer.flip(): "After a sequence of channel-read or put operations, invoke this method to prepare for a sequence of channel-write or relative get operations."
I am reading an arbitrary-size file in blocks of 1021 bytes, with a block size of <= 1021 bytes for the final block of the file. At the moment, I am doing this using a BufferedInputStream wrapped around a FileInputStream, with code that looks (roughly) like the following (where reader is the BufferedInputStream and this is operating in a loop):
int availableData = reader.available();
int datalen = (availableData >= 1021)
        ? 1021
        : availableData;
reader.read(bufferArray, 0, datalen);
However, from reading the API docs, I note that available() only gives an "estimate" of the available size, before the call would 'block'. Printing out the value of availableData each iteration seems to give the expected values - starting with the file size and slowly getting less until it is <= 1021. Given that this is a local file, am I wrong to expect this to be a correct value - is there a situation where available() would give an incorrect answer?
EDIT: Sorry, additional information. The BufferedInputStream is wrapped around a FileInputStream. From the source code for a FIS, I think I'm safe to rely on available() as a measure of how much data is left in the case of a local file. Am I right?
The question is pointless. Those four lines of code are entirely equivalent to this:
reader.read(buffer, 0, 1021);
without the timing-window problem you have introduced between the available() call and the read. Note that this code is still incorrect as you are ignoring the return value, which can be -1 at EOS, or else anything between 1 and 1021 inclusive.
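A correct version loops on the return value instead; a sketch (process() is a placeholder consumer):

byte[] buffer = new byte[1021];
int count;
while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
    // the final block may be shorter than 1021 bytes; use exactly `count`
    process(buffer, count);
}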
It doesn't give the estimated size, it gives the remaining bytes that can be read. It's not an estimate with BufferedInputStream.
Returns the number of bytes that can be read from this input stream without blocking.
You should pass available() directly into the read() call if you want to avoid blocking, but remember to return if the return value is 0 or -1. available() might also throw an exception on stream types that don't support the operation.
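For example, a hedged sketch (`in` and `buffer` are placeholders):

int n = in.available();  // bytes readable right now without blocking
if (n > 0) {
    int count = in.read(buffer, 0, Math.min(n, buffer.length));
    if (count <= 0) return;  // 0 or -1: nothing usable was read
    // process `count` bytes of `buffer` here
}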