How do I flush a 'RandomAccessFile' (java)? - java

I'm using RandomAccessFile in java:
file = new RandomAccessFile(filename, "rw");
...
file.writeBytes(...);
How can I ensure that this data is flushed to the Operating System? There is no file.flush() method. (Note that I don't actually expect it to be physically written, I'm content with it being flushed to the operating system, so that the data will survive a tomcat crash but not necessarily an unexpected server power loss).
I'm using tomcat6 on Linux.

The only classes that provide a .flush() method are those that actually maintain their own buffers. As java.io.RandomAccessFile does not itself maintain a buffer, it does not need to be flushed.

Have a careful look at the RandomAccessFile constructor Javadoc:
The "rws" and "rwd" modes work much like the force(boolean) method of the FileChannel class, passing arguments of true and false, respectively, except that they always apply to every I/O operation and are therefore often more efficient. If the file resides on a local storage device then when an invocation of a method of this class returns it is guaranteed that all changes made to the file by that invocation will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash. If the file does not reside on a local device then no such guarantee is made.

You can use the getFD().sync() method.
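A minimal sketch of that approach, assuming a throwaway temp file. The class and method names (SyncSketch, appendAndSync) are made up for illustration. Note that sync() asks for more than the question needs: write() already hands the bytes to the OS, and sync() additionally waits for the storage device.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class SyncSketch {
    // Appends a record to the file, syncs it, and returns the new file length.
    static long appendAndSync(File file, String record) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(raf.length());
            // RandomAccessFile keeps no buffer of its own, so this goes straight to the OS.
            raf.write(record.getBytes(StandardCharsets.UTF_8));
            // Blocks until the OS reports that the data has reached the underlying device.
            raf.getFD().sync();
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sync-sketch", ".dat");
        file.deleteOnExit();
        System.out.println("length after append: " + appendAndSync(file, "hello\n"));
    }
}
```

If surviving a JVM crash is all you need, the write() call alone should be enough; the sync() call is what costs time.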

Here's what I do in my app:
rf.close();
rf = new RandomAccessFile("mydata", "rw");
This gives a 3-4x performance gain compared to
getFD().sync() and a 5-7x gain compared to "rws" mode.
It does exactly what the original question asked for: it passes
unsaved data on to the OS and out of the JVM. It doesn't physically
write to disk, and therefore introduces no annoying delays.

I ended up here with the very same question.
And I really can't figure out what "flushed to the OS, but not necessarily to disk" is supposed to mean.
In my opinion,
the closest match to the concept of a managed flush is getFD().sync(), as @AVD said:
try(RandomAccessFile raw = new RandomAccessFile(file, "rw")) {
raw.write...
raw.write...
raw.getFD().sync();
raw.write...
}
According to its documentation, this works very much like FileChannel#force(boolean) called with true.
The "rws" and "rwd" modes, in turn, look like they work as if StandardOpenOption#SYNC and StandardOpenOption#DSYNC, respectively, had been specified when opening a FileChannel.
try(RandomAccessFile raw = new RandomAccessFile(file, "rws")) {
raw.write...
raw.write...
raw.write...
// don't worry be happy, woo~ hoo~ hoo~
}
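For comparison, here is a rough sketch of the FileChannel side of that analogy, opening the channel with StandardOpenOption.DSYNC so that every write is synchronous, much as with "rwd". The class and method names are hypothetical, and the file is assumed to be a fresh temp file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DsyncChannelSketch {
    // Writes the text from the start of the file and returns the resulting file size.
    static long writeWithDsync(Path path, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.DSYNC)) { // content updates are written synchronously
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            // A single write() may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("dsync-sketch", ".dat");
        System.out.println("size: " + writeWithDsync(path, "hello"));
        Files.delete(path);
    }
}
```

Swapping DSYNC for SYNC would also cover file metadata, which is the same distinction the Javadoc draws between "rwd" and "rws".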

I learned that you can't..
Some related links here: http://www.cs.usfca.edu/~parrt/course/601/lectures/io.html
and here: http://tutorials.jenkov.com/java-io/bufferedwriter.html

Related

Why does the PrintWriter class (and other writers) require a call to flush after writing?

I have noticed that some I/O Classes in Java (and a lot others, like the BufferedWriter and FileWriter), require a call to flush() after writing. (With the exception of AutoFlush, I'll get to that later).
For example, this call to println() will not work. However, if I invoke writer#flush() after, the line will print.
PrintWriter writer = new PrintWriter(System.out);
writer.println("test");
Also, does autoflushing impact performance in any way (especially in larger/consistent writes), or is it just a convenience, and is it recommended to use it?
Why does the PrintWriter class (and other writers) require a call to flush after writing?
To the extent that flushing is wanted [1], it will be needed if the "stack" of output classes beneath the print writer is doing some output buffering. If an output stream is buffered, then some event needs to trigger pushing (flushing) the buffered output to the external file, pipe, socket or whatever. The things that will trigger flushing are:
the buffer filling up
something calling close() on the stream, or
something calling flush() on the stream.
In the case of a PrintWriter, the underlying stream can also be flushed by the class's auto-flushing mechanism.
The reason for buffering output (in general) is efficiency. Performing the low-level output operation that writes data to the (external) file, pipe, whatever involves a system call. There are significant overheads in doing this, so you want to avoid doing lots of "little" writes.
Output buffering is the standard way to solve this problem. Data to be written is collected in the buffer until the buffer fills up. The net result is that lots of "little" writes can be aggregated into a "big" write. The performance improvement can be significant.
Also, does autoflushing impact performance in any way (especially in larger/consistent writes), or is it just a convenience, and is it recommended to use it?
It is really a convenience to avoid having to explicitly flush. It doesn't improve performance. Indeed, if you don't need the data to be flushed [1], then unnecessary auto-flushing will reduce performance.
1 - You would want the data to be flushed if someone or something wants to see the data you are writing as soon as possible.
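To see the buffering for yourself, here is a small sketch that swaps System.out for a ByteArrayOutputStream so the number of bytes that actually reached the sink can be counted. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

class FlushSketch {
    // Bytes that have reached the sink right after println(), before any explicit flush().
    static int bytesVisibleAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(sink, autoFlush);
        writer.println("test");
        return sink.size();
    }

    // Bytes that have reached the sink after println() followed by flush().
    static int bytesVisibleAfterFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(sink); // auto-flush is off by default
        writer.println("test");
        writer.flush();
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("no auto-flush, no flush(): " + bytesVisibleAfterPrintln(false));
        System.out.println("auto-flush on:             " + bytesVisibleAfterPrintln(true));
        System.out.println("explicit flush():          " + bytesVisibleAfterFlush());
    }
}
```

Without auto-flush the text stays in the writer's internal buffer until flush() or close() is called, which is why the println() in the question appears to do nothing.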
They don't require flushing, only if you want to guarantee that output has been displayed so far, which is exactly what flushing is. If you are fine writing to a file and just want to make sure it gets there before the program terminates, then no need to flush.
When data is written to an output stream, the underlying
operating system does not guarantee that the data will make it
to the file system immediately. In many operating systems, the
data may be cached in memory, with a write occurring only
after a temporary cache is filled or after some amount of time
has passed.
If the data is cached in memory and the application terminates
unexpectedly, the data would be lost, because it was never
written to the file system. To address this, all output stream
classes provide a flush() method, which requests that all accumulated data be written immediately to disk.
The flush() method helps reduce the amount of data lost if the
application terminates unexpectedly. It is not without cost,
though. Each time it is used, it may cause a noticeable delay in
the application, especially for large files. Unless the data that
you are writing is extremely critical, the flush() method should
be used only intermittently. For example, it should not
necessarily be called after every write.
You also do not need to call the flush() method when you have
finished writing data, since the close() method will
automatically do this.
Read from the book here -> OCP Oracle Certified Professional Java SE 11
Hope this is clear to you!

FileChannel read behaviour [duplicate]

For example I have a file whose content is:
abcdefg
Then I use the following code to read 'defg'.
ByteBuffer bb = ByteBuffer.allocate(4);
int read = channel.read(bb, 3);
assert(read == 4);
Since there is enough data in the file, can I rely on this? Can I assume that the method returns a number less than the limit of the given buffer only when there aren't enough bytes in the file?
Can I assume that the method returns a number less than limit of the given buffer only when there aren't enough bytes in the file?
The Javadoc says:
a read might not fill the buffer
and gives some examples, and
returns the number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream.
This is NOT sufficient to allow you to make that assumption.
In practice, you are likely to always get a full buffer when reading from a file, modulo the end of file scenario. And that makes sense from an OS implementation perspective, given the overheads of making a system call.
But I can also imagine situations where returning a half-empty buffer might make sense. For example, when reading from a locally-mounted remote file system over a slow network link, there is some advantage in returning a partially filled buffer so that the application can start processing the data. Some future OS may implement the read system call to do that in this scenario. If you assume that you will always get a full buffer, you may get a surprise when your application is run on the (hypothetical) new platform.
Another issue is that there are some kinds of stream where you will definitely get partially filled buffers. Socket streams, pipes and console streams are obvious examples. If you code your application assuming file stream behavior, you could get a nasty surprise when someone runs it against another kind of stream ... and fails.
No, in general you cannot assume that the number of bytes read will be equal to the number of bytes requested, even if there are bytes left to be read in the file.
If you are reading from a local file, chances are that the number of bytes requested will actually be read, but this is by no means guaranteed (and won't likely be the case if you're reading a file over the network).
See the documentation for the ReadableByteChannel.read(ByteBuffer) method (which applies for FileChannel.read(ByteBuffer) as well). Assuming that the channel is in blocking mode, the only guarantee is that at least one byte will be read.
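If you need the buffer filled, the usual workaround is to loop until it is full or the end of the file is reached. A minimal sketch, with a hypothetical helper named readFully and the "abcdefg" file from the question recreated as a temp file:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadFullySketch {
    // Reads at an absolute file position until dst is full or end of file is hit.
    // Returns the number of bytes read, which falls short of the request only at EOF.
    static int readFully(FileChannel channel, ByteBuffer dst, long position) throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, position + total);
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("read-fully", ".txt");
        Files.write(path, "abcdefg".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer bb = ByteBuffer.allocate(4);
            int read = readFully(channel, bb, 3);
            bb.flip();
            System.out.println(read + " bytes: " + StandardCharsets.US_ASCII.decode(bb));
        } finally {
            Files.delete(path);
        }
    }
}
```

With this loop the assertion from the question holds by construction rather than by luck.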

What is the insertion order if I have two memory-mapped buffers mapped to the same file?

My question is whether the OS will respect the insertion order (i.e. last written, last to disk) or the order will be unpredictable. For example:
byte[] s1 = "Testing1!".getBytes();
byte[] s2 = "Testing2!".getBytes();
byte[] s3 = "Testing3!".getBytes();
RandomAccessFile raf = new RandomAccessFile("test.txt", "rw");
FileChannel fc = raf.getChannel();
MappedByteBuffer mbb1 = fc.map(MapMode.READ_WRITE, 0, 1024 * 1024);
mbb1.put(s1);
MappedByteBuffer mbb2 = fc.map(MapMode.READ_WRITE, mbb1.position(), 1024 * 1024);
mbb2.put(s2);
MappedByteBuffer mbb3 = fc.map(MapMode.READ_WRITE, mbb1.position() + mbb2.position(), 1024 * 1024);
mbb3.put(s3);
mbb1.put(s1); // overwrite mbb2
mbb1.put(s1); // overwrite mbb3
mbb1.force(); // go to file
mbb3.force(); // can this ever overwrite mbb1 in the file?
mbb2.force(); // can this ever overwrite mbb1 in the file?
Is it always last written, last in or am I missing something here?
I haven't tested any of this, so I don't know.
But, frankly, there's no guarantee on any of this ordering.
You have the mbb.force() method, but that's not the only way data gets written to the device; it just ensures that the data has been written by the time it returns.
The VM (here, the operating system's virtual memory subsystem) can flush the page back to the device whenever it feels like it, using whatever schedule it deems fit, which is, naturally, extremely platform dependent (the behavior on Linux may be different than the behavior on Windows, it may even vary from Linux to Linux or Windows to Windows).
It seems to me that you should be coordinating internally to ensure that you only have one read/write buffer mapped to a specific area of a file, and manage the conflicts and overlap that way rather than relying on the operating system.
Edit: "changes done by multiple memory-mapped buffers are guaranteed to be consistent"
Simply put, this means that once the underlying VM has mapped a physical page into a process, that page is shared across all of the mappings that cover it. The threading caveats are simply due to CPU memory caching and similar issues.
So, this guarantees that all of the mappings will see the same data within an overlapping buffer. But it does not address when the buffers will actually get written to the device. Those points are still germane.
Overall it sounds like you won't have an issue if you handle any multithreading aspects correctly, and be aware that what you see in your underlying buffer may "change beneath your feet".
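A rough sketch of the "same data" point, assuming a mainstream OS where both mappings end up backed by the same pages. The Javadoc does not spell this out as a hard guarantee, and the class and method names are made up. It says nothing about when the bytes reach the device.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OverlapSketch {
    // Maps the same 16-byte region twice, writes through one mapping,
    // and returns what the other mapping sees at the same offset.
    static byte writeThroughOneReadThroughOther(Path path) throws IOException {
        try (FileChannel fc = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer first = fc.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            MappedByteBuffer second = fc.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            first.put(0, (byte) 'X'); // no force() here on purpose
            return second.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("overlap-sketch", ".dat");
        path.toFile().deleteOnExit();
        System.out.println((char) writeThroughOneReadThroughOther(path));
    }
}
```

Because both views change together, the order of the force() calls in the question does not decide which content "wins"; the last put() to a given byte does.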

Files.newInputStream creates slow InputStream

On my Windows 7 Files.newInputStream returns sun.nio.ch.ChannelInputStream. When I tested its performance vs FileInputStream I was surprised to know that FileInputStream is faster.
This test
InputStream in = new FileInputStream("test");
long t0 = System.currentTimeMillis();
byte[] a = new byte[16 * 1024];
for (int n; (n = in.read(a)) != -1;) {
}
System.out.println(System.currentTimeMillis() - t0);
reads 100mb file in 125 ms. If I replace the first line with
InputStream in = Files.newInputStream(Paths.get("test"));
I get 320ms.
If Files.newInputStream is slower, what advantages does it have over FileInputStream?
If you tested new FileInputStream second, you are probably just seeing the effect of cache priming by the operating system. It isn't plausible that Java is causing any significant difference to an I/O-bound process. Try it the other way around, and on a much larger dataset.
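One way to check for that is to run both variants in both orders after a warm-up pass. A hedged sketch follows; the class and method names are hypothetical, and the file name "test" is taken from the question. It is only a crude timing loop, not a proper benchmark.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadTimingSketch {
    // Reads the stream to the end with a 16 KB buffer, prints the elapsed time,
    // closes the stream, and returns the number of bytes read.
    static long timedDrain(String label, InputStream in) throws IOException {
        long t0 = System.nanoTime();
        long total = 0;
        try (InputStream stream = in) {
            byte[] buf = new byte[16 * 1024];
            for (int n; (n = stream.read(buf)) != -1; ) {
                total += n;
            }
        }
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(label + ": " + total + " bytes in " + ms + " ms");
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "test");
        // Warm-up pass so the OS file cache is primed before anything is compared.
        timedDrain("warm-up", Files.newInputStream(path));
        timedDrain("FileInputStream", new FileInputStream(path.toFile()));
        timedDrain("Files.newInputStream", Files.newInputStream(path));
        // Same pair again in the opposite order.
        timedDrain("Files.newInputStream", Files.newInputStream(path));
        timedDrain("FileInputStream", new FileInputStream(path.toFile()));
    }
}
```

If the gap shrinks or flips between the two rounds, caching rather than the stream class is the likely cause.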
I don't want to be the buzzkill, but the Javadoc doesn't state any advantages, nor does any documentation I could find:
Opens a file, returning an input stream to read from the file. The
stream will not be buffered, and is not required to support the mark
or reset methods. The stream will be safe for access by multiple
concurrent threads. Reading commences at the beginning of the file.
Whether the returned stream is asynchronously closeable and/or
interruptible is highly file system provider specific and therefore
not specified.
I think the method is just a utility method not necessarily meant to replace or improve on FileInputStream. Note that the concurrency point might explain some slow down.
Your FileInputStreams and FileOutputStreams might introduce long GC pauses
Every time you create either a FileInputStream or a FileOutputStream, you are creating an object. Even if you close it correctly and promptly, it will be put into a special category that only gets cleaned up when the garbage collector does a full GC. Sadly, due to backwards compatibility constraints, this is not something that can be fixed in the JDK anytime soon as there could be some code out there where somebody has extended FileInputStream / FileOutputStream and is relying on those finalize() methods to ensure the call to close().
The solution (at least if you are using Java 7 or newer) is not too hard
— just switch to Files.newInputStream(...) and Files.newOutputStream(...)
https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful
The documentation says:
"The stream will not be buffered"
I believe that's because Files.newInputStream(Path) supports non-blocking I/O.
You can try this in debug mode: you can open a non-blocking input stream and modify the file at the same time, but you cannot do that with FileInputStream.
FileInputStream requires a "write lock" on the file, so it can buffer the file's content and read faster.
ChannelInputStream cannot. It must guarantee that it is reading the "current" content of the file.
The above is from my own experience; I didn't check every point against the Javadoc.

FileInputStream and FileOutputStream to the same file: Is a read() guaranteed to see all write()s that "happened before"?

I am using a file as a cache for big data. One thread writes to it sequentially, another thread reads it sequentially.
Can I be sure that all data that has been written (by write()) in one thread can be read() from another thread, assuming a proper "happens-before" relationship in terms of the Java memory model? Is this behavior documented?
In my JDK, FileOutputStream does not override flush(), and OutputStream.flush() is empty. That's why I'm wondering...
The streams in question are owned exclusively by a class that I have full control of. Each stream is guaranteed to be accessed by one thread only. My tests show that it works as expected, but I'm still wondering if this is guaranteed and documented.
See also this related discussion.
Assuming you are using a posix file system, then yes.
FileInputStream and FileOutputStream on *nix use the read and write system calls internally. The documentation for write says that reads will see the results of past writes,
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was
modified by that write shall return the data specified by the write()
for that position until such byte positions are again modified.
I'm pretty sure NTFS on Windows will have the same read()/write() guarantees.
You can't talk about a "happens-before" relationship in terms of the Java memory model between your FileInputStream and FileOutputStream objects, since they don't share any memory or thread. The VM is free to reorder them as long as it honors your synchronization requirements. When you have proper synchronization between reads and writes and no application-level buffering, you are safe.
However, FileInputStream and FileOutputStream share a file, which leaves things up to the OS. On mainstream operating systems you can expect a read after a write to see the written data, in order.
If FileOutputStream does not override flush(), then I think you can be sure all data written by write() can be read by read(), unless your OS does something weird with the data (like starting a new thread that waits for the hard drive to spin at the right speed instead of blocking, etc) so that it is not written immediately.
No, you need to flush() the Streams (at least for Buffered(Input|Output)Streams), otherwise you could have data in a buffer.
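Both points can be checked with a small experiment: write through a raw FileOutputStream or through a BufferedOutputStream, then immediately open a separate FileInputStream on the same file. This is only a sketch with hypothetical names, run in a single thread, so it shows nothing about memory-model ordering.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class SameFileSketch {
    // Opens a fresh FileInputStream and reports how many bytes (up to max) it can read.
    static int countReadable(File file, int max) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            return Math.max(in.read(new byte[max]), 0);
        }
    }

    // Writes data (optionally buffered, optionally flushed) and returns how much
    // of it a reader sees while the writer is still open.
    static int visibleAfterWrite(File file, byte[] data, boolean buffered, boolean flush)
            throws IOException {
        try (FileOutputStream raw = new FileOutputStream(file); // truncates the file
             OutputStream out = buffered ? new BufferedOutputStream(raw) : raw) {
            out.write(data);
            if (flush) {
                out.flush();
            }
            return countReadable(file, data.length);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("same-file", ".dat");
        file.deleteOnExit();
        byte[] data = {1, 2, 3, 4, 5};
        System.out.println("raw, no flush:      " + visibleAfterWrite(file, data, false, false));
        System.out.println("buffered, no flush: " + visibleAfterWrite(file, data, true, false));
        System.out.println("buffered, flushed:  " + visibleAfterWrite(file, data, true, true));
    }
}
```

The raw stream needs no flush because each write() goes straight to the OS, while the buffered one holds small writes back until flush() or close().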
Maybe you need a concurrent data structure?
