I have a character special device file on a Linux system (e.g. /dev/foobardcma6) that I want to write data to continuously. What is the preferred way to do this in Java?
I tried using AsynchronousFileChannel and was able to write some bytes to it, but eventually it blocks/hangs when writing. I don't know if this is the right approach or not, however.
I could use a FileChannel instead. Its write method returns how many bytes were actually written; I assume that if it returns fewer bytes than were requested, the write buffer is full and I should wait before writing again. However, I don't see any mechanism for being notified that the file is ready for writing.
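The pattern I have in mind is sketched below. It's a minimal sketch, assuming the device path from above and a placeholder payload, not a fix for the blocking problem:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch: open the device for writing and loop until the buffer drains.
static void writeAll(byte[] data) throws IOException {
    try (FileChannel ch = FileChannel.open(Paths.get("/dev/foobardcma6"),
                                           StandardOpenOption.WRITE)) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            ch.write(buf); // a single call may write fewer bytes than remain
        }
    }
}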
Update:
I tried using a FileChannel, and it also blocks after a certain number of bytes. The suspicious thing is that both the FileChannel and AsynchronousFileChannel implementations block after writing exactly the same number of bytes. In both cases the last call to write never returns.
I have a test utility written in C++ that can successfully write data to the device without issue, so it's not a hardware problem. I assume I'm doing something wrong with the FileChannels.
Related
For example I have a file whose content is:
abcdefg
then I use the following code to read 'defg'.
// 'channel' is a FileChannel opened on the file
ByteBuffer bb = ByteBuffer.allocate(4);
int read = channel.read(bb, 3); // read up to 4 bytes starting at offset 3
assert read == 4;
Since there is enough data in the file, can I assume the assertion holds? Can I assume the method returns fewer bytes than the given buffer's limit only when there aren't enough bytes left in the file?
Can I assume the method returns fewer bytes than the given buffer's limit only when there aren't enough bytes left in the file?
The Javadoc says:
a read might not fill the buffer
and gives some examples, and
returns the number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream.
This is NOT sufficient to allow you to make that assumption.
In practice, you are likely to always get a full buffer when reading from a file, modulo the end of file scenario. And that makes sense from an OS implementation perspective, given the overheads of making a system call.
But I can also imagine situations where returning a partially filled buffer might make sense. For example, when reading from a locally mounted remote file system over a slow network link, there is some advantage in returning a partially filled buffer so that the application can start processing the data. Some future OS may implement the read system call to do that in this scenario. If you assume that you will always get a full buffer, you may get a surprise when your application runs on that (hypothetical) new platform.
Another issue is that there are some kinds of stream where you will definitely get partially filled buffers. Socket streams, pipes and console streams are obvious examples. If you code your application assuming file stream behavior, you could get a nasty surprise when someone runs it against another kind of stream ... and it fails.
No, in general you cannot assume that the number of bytes read will be equal to the number of bytes requested, even if there are bytes left to be read in the file.
If you are reading from a local file, chances are that the number of bytes requested will actually be read, but this is by no means guaranteed (and won't likely be the case if you're reading a file over the network).
See the documentation for the ReadableByteChannel.read(ByteBuffer) method (which applies for FileChannel.read(ByteBuffer) as well). Assuming that the channel is in blocking mode, the only guarantee is that at least one byte will be read.
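If you need a full buffer, the usual defensive pattern is to loop until the buffer fills or end-of-stream is hit. A minimal sketch, assuming a positional read and a buffer that starts at position 0:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: keep reading until 'bb' is full or end-of-stream is reached,
// instead of assuming a single read() call fills the buffer.
static void readFully(FileChannel channel, ByteBuffer bb, long position) throws IOException {
    while (bb.hasRemaining()) {
        int n = channel.read(bb, position + bb.position());
        if (n < 0) break; // end-of-stream before the buffer was full
    }
}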
With reference to a Stack Overflow question, it is said that an InputStream can be read multiple times using the mark() and reset() methods provided by InputStream, or by using PushbackInputStream.
In all these cases the content of the stream is stored in a byte array (i.e. the original content of the file is held in main memory) and reused multiple times.
What happens when the size of the file exceeds the available memory? I think this could lead to an OutOfMemoryError.
Is there any better way to read the stream content multiple times without storing it locally (i.e. in main memory)?
Please help me understand this. Thanks in advance.
It depends on the source of the stream.
If it's a local file, you can likely re-open and re-read the stream as many times as you want.
If it's dynamically generated by a process, a remote service, etc., you might not be free to re-generate it. In that case, you need to store it, either in memory or in some more persistent (and slow) storage like a file system or storage service.
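One concrete variant of the "store it" option, sketched under the assumption that a temporary file is acceptable: spool the stream to disk once, then open a fresh InputStream over that file for each pass.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: copy the stream to a temp file once; re-read from disk as often as needed.
static Path spoolToTempFile(InputStream in) throws IOException {
    Path tmp = Files.createTempFile("stream-copy", ".tmp");
    Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
    return tmp; // open with Files.newInputStream(tmp) for each pass
}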
Maybe an analogy would help. Suppose your friend is speaking to you at length. You listen carefully without interruption, but when they are done, you realize you didn't understand something they said near the beginning, and want to review that portion.
At this point, there are a few possibilities.
Perhaps your friend was actually reading aloud from a book. You can simply re-read the book.
Or, perhaps you had the foresight to record their monologue. You can replay the recording.
However, since neither you nor your friend has perfect and unlimited recall, simply repeating verbatim what was said ten minutes ago from memory alone is not an option.
An InputStream is like your friend speaking. Neither of you has a good enough memory to remember exactly, word-for-word, what is said. In the same way, neither a process that is generating the data stream nor your program has enough RAM to store, byte-for-byte, the stream. To scale, your program has to rely on its "short-term memory" (RAM), working with just a small portion of the whole stream at any given time, and "taking notes" (writing to a persistent store) as it encounters important points.
If the source of stream is a local file, then it's like your friend reading a book. Either of you can re-read that content easily enough.
If you copy the stream to some persistent storage, that's like recording your friend's speech. You can replay it as often as you like.
Consider a scenario where browser is uploading a large file, but the server is busy, and not able to read that stream for some time. Where is that data stored during that delay?
Because the receiver can't always respond immediately to input, TCP and many other protocols allocate a small buffer to store some data from a sender. But they also have a way to tell the sender to wait because it is sending data too fast: flow control. Going back to the analogy, it's like telling your friend to pause a moment while you catch up with your note-taking.
As the browser uploads the file, at first, the buffer will be filled. But if the server can't keep up, the browser will be instructed to pause its upload until there is more room in the buffer. (This generally happens at the OS and TCP level; the client and server applications don't manage this directly.) The upload speed depends on how fast the browser can read the file from disk, how fast the network link is, and how fast the server can process the uploaded data. Even a fast network and client will be limited by the weak link in this chain.
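You can watch this happen with two sockets in one JVM. The sketch below is illustrative only: the accepting side never reads, so the sender's write() eventually blocks once the OS buffers fill, and the program deliberately stalls at that point.
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(0); // ephemeral local port
         Socket sender = new Socket("localhost", server.getLocalPort());
         Socket receiver = server.accept()) {       // accepted, but never read from
        OutputStream out = sender.getOutputStream();
        byte[] chunk = new byte[8192];
        long sent = 0;
        while (true) {
            out.write(chunk); // blocks once the send/receive buffers are full
            sent += chunk.length;
            System.out.println("buffered so far: " + sent + " bytes");
        }
    }
}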
Direct mapped buffers always live outside the JVM heap. Channels, on the other hand, also seem to be involved in the I/O operation. I'm just wondering whether the channel corresponding to a direct mapped buffer also lives outside the JVM heap?
My other question concerns why a Channel needs to exist at all, in terms of memory-operation efficiency. I understand that
A channel represents an open connection to an entity such as a hardware device
Yet when writing to a file through a direct mapped buffer, is the content written twice? The content is first written to the buffer and then to the channel. Wouldn't this be less efficient than writing "directly" to the I/O device?
I'm just wondering whether the channel corresponding to a direct mapped buffer also lives outside the JVM heap?
The question doesn't make sense. A Channel isn't a piece of memory; it is an interface to an operating-system file descriptor (FD).
when writing to a file through a direct mapped buffer, is the content written twice? The content is first written to the buffer and then to the channel. Wouldn't this be less efficient than writing "directly" to the I/O device?
No. The MappedByteBuffer is independent of the channel it came from. For example, it isn't closed when the channel is closed.
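For example (a sketch; the file name and sizes are placeholders), the mapping remains usable after its channel is closed:
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

MappedByteBuffer map;
try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
        StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
    map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
} // the channel is closed here
map.put(0, (byte) 42); // the mapping is still valid after the close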
Are you perhaps looking for direct byte buffers? They do exist, and you write to them via channels, but I/O via them happens once, not twice.
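A minimal sketch of that, assuming 'ch' is an already-open, writable FileChannel: the buffer's memory lives off-heap, and the channel hands it to the OS write without an extra on-heap copy.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: write through a direct (off-heap) buffer.
ByteBuffer direct = ByteBuffer.allocateDirect(8192);
direct.put("hello".getBytes(StandardCharsets.UTF_8));
direct.flip(); // switch from filling the buffer to draining it
while (direct.hasRemaining()) {
    ch.write(direct); // 'ch' is an assumed open, writable FileChannel
}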
I think I figured out the answer.
Writing through a direct mapped buffer first modifies memory, which is not an I/O operation. The real I/O that updates the file happens afterwards, when the OS flushes the modified pages. The reason for the two steps is efficiency: with DMA available, it is cheaper to perform I/O in larger batches than to issue a device operation for every write.
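If the timing of that flush matters, MappedByteBuffer.force() asks the OS to write the dirty pages back. A two-line sketch, reusing the 'map' from the sketch above:
map.put(0, (byte) 1); // modifies memory only; no I/O happens yet
map.force();          // asks the OS to flush the dirty pages to the file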
Getting the starting address through JNI does not give the correct address for accessing that piece of memory. Instead, the address of the data is held in a hidden field of java.nio.Buffer. One way to read that field is through sun.misc.Unsafe, by making the field accessible.
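A sketch of that technique, using the well-known theUnsafe trick; this relies on unsupported JDK internals and may break between versions:
import java.lang.reflect.Field;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Sketch: read the hidden 'address' field of java.nio.Buffer via sun.misc.Unsafe.
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);

ByteBuffer direct = ByteBuffer.allocateDirect(1024);
long addrOffset = unsafe.objectFieldOffset(Buffer.class.getDeclaredField("address"));
long address = unsafe.getLong(direct, addrOffset); // native address of the buffer's memory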
Reading the javadoc on FileDescriptor's sync() method, it is apparent that sync() is primarily concerned with committing any modified buffers back to the underlying storage. That is, it makes sure that anything your program has output will actually make it to the disk (or socket, or what-have-you, but my question pertains mainly to disks).
But what about the other direction, what about INPUT? Suppose my program has some parts of a java.io.RandomAccessFile buffered in memory, and I want to READ those parts of the file, but perhaps some other process has modified those parts of the file since the last time my program read those blocks?
This is akin to marking a variable as 'volatile' in a C program; something else may have changed the 'real version' of something you merely have a convenient copy of.
I.e., how can you be certain that what your java program reads is at least reasonably up-to-date?
(Clearly the definition of 'up to date' matters. Purely as an example, suppose the other process, the one that writes to the file, does so on the order of once per second, and suppose the reading process reads about once per minute. In a situation like this, performance isn't a big deal; it's just a matter of making sure that what the reader reads is consistent with what the writer writes, to within, say, a second.)
Before re-reading your file, it is usually a good idea to check the last-modified timestamp of the file with File.lastModified(). If this timestamp is not newer than the last time you read the file, you don't need to spend more disk I/O re-reading the blocks you are interested in. One thing to keep in mind, though, is that the last-modified timestamp may not always be updated immediately when the contents change if you are using a network filesystem. If one local process updates the file and another local process runs your code to read it, you most likely won't run into this issue.
One method I've had success with in the past was to have a separate thread poll the file for the last modified timestamp on certain intervals, say 5 seconds. If the file changed, re-process the file and send an event to registered listeners. In my case, 5 seconds was more than soon enough to get updates.
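A sketch of that polling approach; the interval, path, and the reprocess() callback are illustrative, not a real API:
import java.io.File;

// Sketch: poll File.lastModified() and react when the timestamp advances.
Thread watcher = new Thread(() -> {
    File file = new File("/path/to/data"); // illustrative path
    long lastSeen = file.lastModified();
    while (!Thread.currentThread().isInterrupted()) {
        long now = file.lastModified();
        if (now > lastSeen) {
            lastSeen = now;
            reprocess(file); // hypothetical callback: re-read and notify listeners
        }
        try { Thread.sleep(5_000); } catch (InterruptedException e) { return; }
    }
});
watcher.setDaemon(true);
watcher.start();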
At the moment the file is read into the internal buffer, its contents are up to date with the contents on disk.
If you want to be sure you have the latest contents on your next access, you have to go to the disk again, skipping all internal buffers and caches. If you really want to be sure that all such layers are skipped, you'll have to reopen the file from scratch and seek to the position you want to access.
Of course, your performance will go down the tubes if you hit the disk on every possible access of the data. Don't think of a 3-5x slowdown; think orders of magnitude.
If another program you control is the only one writing to the file, then it's probably best to have two threads in the same Java process coordinate. The easiest solution is to create a java.util.concurrent.atomic.AtomicBoolean. The writer thread calls set(true) on the AtomicBoolean, and the reader calls getAndSet(false). If getAndSet() returns true, the reader knows it needs to re-read the data. If contention is an issue, you could synchronize on some object to prevent the writer from writing while the reader is reading.
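A sketch of that handshake; the surrounding reader and writer threads are assumed, and only the flag logic is shown:
import java.util.concurrent.atomic.AtomicBoolean;

AtomicBoolean dirty = new AtomicBoolean(false);

// Writer thread: after each write to the file, raise the flag.
dirty.set(true);

// Reader thread: check and clear the flag in one atomic step.
if (dirty.getAndSet(false)) {
    // the file changed since our last read; re-read the data
}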
You said "process" in the question, so maybe you are concerned about any other process on the system changing the data. In this case, I think you best bet is to just reopen and reread the data. The performance impact of this should be negligible if you really are only reading once per minute.