Suppose a Java application writes to a file using the BufferedWriter API (and does not call flush() after every write). I guess that if the application exits with System.exit(), the buffer is not flushed, and so the file might be corrupted.
Suppose also that the application component that decides to exit is not aware of the component that writes to the file.
What is the easiest and correct way to solve the "flush problem"?
You can use the Runtime.addShutdownHook method, which registers a JVM shutdown hook. This is basically an unstarted Thread that the JVM executes when it shuts down.

So if that thread has a reference to the open file, it can close the stream, which flushes the buffered output.

Note: Although this seems feasible, I believe there will be implementation challenges, because you cannot be sure the file handle is still valid by the time the shutdown hook runs. So the better approach is to close your streams gracefully, using finally blocks, in the code where the file operations are done.
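For illustration, a minimal sketch of the shutdown-hook idea, assuming the hook has a reference to the writer (the file name and data here are made up):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ShutdownFlushDemo {
    public static void main(String[] args) throws IOException {
        BufferedWriter writer = new BufferedWriter(new FileWriter("out.txt"));

        // The JVM runs registered hooks on normal shutdown, including System.exit().
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                writer.close(); // close() flushes the buffer first
            } catch (IOException e) {
                e.printStackTrace();
            }
        }));

        writer.write("some data");
        System.exit(0); // the buffered data is flushed by the hook
    }
}
```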
You can add a shutdown hook, but then you need a reference to each of these BufferedWriter (or other Flushable or Closeable) objects, and you won't gain anything from it. You should call close() and flush() directly in the code that is manipulating the object.

Think of the Information Expert GRASP pattern: the code manipulating the BufferedWriter is the place that knows when an operation is finished and should be flushed, so that is where that logic should go.

If some application component calls System.exit when things aren't done, I would consider that an abnormal exit: it should not exit with status 0, and therefore it shouldn't guarantee that streams are flushed.
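To illustrate closing at the point of use, a minimal sketch with try-with-resources (Java 7+; the file name and content are made up), which flushes and closes the writer even if the writing code throws:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class WriteAndClose {
    static void writeReport() throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("report.txt"))) {
            writer.write("report line");
            // writer.close() runs automatically at the end of this block,
            // flushing the buffer, even if an exception is thrown above.
        }
    }
}
```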
Related
I am writing to and reading from a Linux file in Java, which in reality is a communication port to a hardware device. To do this I use RandomAccessFile (I'll explain why later) and it works well in most cases. But sometimes a byte is lost, and then my routine blocks indefinitely, since there is no timeout on the read method.

To give some more details on the file: it is a USB receipt printer that appears as a file called /dev/usb/lp0, and though I can use a CUPS driver to print, I still need the low-level communication through this file to query the status of the printer.
The reason I use RandomAccessFile is that I can have the same object for both reading and writing.
I tried to make a version with InputStream and OutputStream instead (since that would allow me to use the available() method to implement my timeout). But when I open the InputStream first and then the OutputStream, I get an exception on opening the OutputStream, since the file is occupied.

I tried writing with the OutputStream and then closing it before opening the InputStream to read, but then I lose some or all of the reply before the InputStream has been opened.

I tried switching to channels instead (Files.newByteChannel()). This also allows me to have just one object, and the documentation says that read() only reads the bytes available and returns the count (which would also allow me to implement a timeout). But it blocks in the read method anyway when there is nothing to read, despite what the documentation says.
I also tried a number of ways to implement timeouts on the RandomAccessFile using threads.
The first approach was to start a separate thread at the same time as starting to read, and if the timeout elapsed, to close the file from that thread, hoping this would unblock the read() operation with an exception. It didn't; the read stayed blocked.

I also tried doing the read in a separate thread and brutally killing it with the deprecated Thread.stop() once the time had elapsed. This worked once, but it was not possible to reopen the file afterwards.

The only solution I have made work is a separate thread that continuously calls read() and puts every byte it gets into a LinkedBlockingQueue, which I can read from with a timeout. This approach works, but the drawback is that I can never close the file (again for the same reason explained above: I can't unblock a blocked read), and my application requires that I sometimes close this connection to the hardware.
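Roughly what that workaround looks like (simplified, error handling omitted):

```java
import java.io.RandomAccessFile;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class TimeoutReader {
    private final LinkedBlockingQueue<Byte> queue = new LinkedBlockingQueue<>();

    TimeoutReader(RandomAccessFile file) {
        Thread reader = new Thread(() -> {
            try {
                while (true) {
                    int b = file.read(); // blocks until a byte arrives
                    if (b < 0) break;
                    queue.put((byte) b);
                }
            } catch (Exception e) {
                // the reader thread dies; the file can never be closed cleanly
            }
        });
        reader.setDaemon(true);
        reader.start();
    }

    // Returns the next byte, or null if nothing arrived within the timeout.
    Byte read(long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```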
Can anyone think of a way to read from a file with a timeout that would work in my case, i.e. one that allows me to keep both read and write access to the file open at the same time?

I am using Java 8, by the way.
I've noticed that sometimes a thread calling a write method such as writeUTF() on an ObjectOutputStream, to send a value via a socket, will flush the data automatically, so there is no need for me to call flush() on the object. The thread at the other end of the communication line receives the data just fine. This has worked even when the sender thread writes to the stream many hundreds of times in a loop.

Other times, my threads deadlock because the sender threads are not sending the data. The problem is fixed when I manually call flush() immediately after invoking, for example, writeUTF().
I doubt that this is random. I think there must be some specific circumstance under which threads writing to a stream flush the data automatically. I would like to know what those circumstances are, if any.
This is implementation-dependent and may change depending on platform, version, and build of Java. Your best bet is to call flush() whenever you might need to. If there is no data to be flushed, a call to flush() is extremely fast, so this will not significantly slow down your program.
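A minimal sketch of the safe pattern (here "out" would be the ObjectOutputStream wrapping the socket's output stream):

```java
import java.io.IOException;
import java.io.ObjectOutputStream;

class Sender {
    static void send(ObjectOutputStream out, String message) throws IOException {
        out.writeUTF(message);
        out.flush(); // don't rely on the stream flushing on its own
    }
}
```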
Process A writes to a file XYZ when executed. Processes B and C, when executed, read the file XYZ. So while process A is up, B and C should wait for A to complete. To provide synchronization, can I use the java.nio package, or should I use something like FileLock or sockets? And can we specify how long the second process should wait?

Edited: The file is created during the first write process. In that case, can I make it a shared resource?
Using the java.nio package's file locking could be the better solution, I hope. But I think java.nio was not full-fledged until JDK 1.6.
http://www.withoutbook.com/DifferenceBetweenSubjects.php?subId1=7&subId2=43&d=Java%206%20vs%20Java%207
FileLock:
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileLock.html
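A minimal sketch of the FileLock approach (the file name XYZ follows the question; process A would take an exclusive lock, B and C shared locks):

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("XYZ", "rw");
             FileChannel channel = raf.getChannel()) {

            // Writer (process A): lock() blocks until the exclusive lock is free.
            try (FileLock lock = channel.lock()) {
                raf.writeBytes("data written by A\n");
            }

            // Readers (B/C) would take a shared lock instead:
            // channel.lock(0, Long.MAX_VALUE, true)
        }
    }
}
```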
One way could be the use of a flag: just a boolean stillWriting that is readable from outside.

As soon as process A has done its job, this flag is set to false and your processes B/C can start their work on this file.

If A wants to start editing the file again, it sets this flag back to true and blocks the other two processes.
Using locks would be a good idea. You can use Conditions from the Java API.
Refer to http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html#awaitNanos(long)
When A is working, it should have the other threads await; on completion it can signal so that the waiting threads can proceed. This is also very appropriate when using a shared resource.
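A minimal sketch of that pattern, assuming A, B and C are threads within one JVM (a Condition does not synchronize separate processes; the class and method names are made up):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class FileGate {
    private final Lock lock = new ReentrantLock();
    private final Condition writeDone = lock.newCondition();
    private boolean writing = true;

    // Called by A when it has finished writing the file.
    void finishedWriting() {
        lock.lock();
        try {
            writing = false;
            writeDone.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Called by B/C: waits up to the given timeout for A to finish.
    boolean awaitWriter(long timeout, TimeUnit unit) throws InterruptedException {
        lock.lock();
        try {
            long nanos = unit.toNanos(timeout);
            while (writing) {
                if (nanos <= 0) return false; // timed out
                nanos = writeDone.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```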
I need to have a buffered char stream, into which I write in one thread and from which I read in another thread. Right now I'm using PipedReader and PipedWriter for it, but those classes cause a performance problem: PipedReader does a wait(1000) when its internal buffer is empty, which causes my application to lag visibly.
Would there be some library which does the same thing as PipedReader/PipedWriter, but with better performance? Or will I have to implement my own wheels?
The problem was that when something is written to the PipedWriter, it does not automatically notify the PipedReader that there is data to read. When you try to read from the PipedReader while its buffer is empty, the PipedReader loops and waits, using a wait(1000) call, until the buffer has some data.

The solution is to always call PipedWriter.flush() after writing something to the pipe. All that flush() does is call notifyAll() on the reader. A minimal sketch of the fix (not the original code from the question):
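```java
import java.io.PipedReader;
import java.io.PipedWriter;

public class PipeDemo {
    public static void main(String[] args) throws Exception {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);

        Thread producer = new Thread(() -> {
            try {
                writer.write("hello");
                writer.flush(); // wakes the reader immediately, instead of
                                // leaving it to sleep in its wait(1000) loop
                writer.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        producer.start();

        int c;
        while ((c = reader.read()) != -1) {
            System.out.print((char) c);
        }
    }
}
```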
(To me the PipedReader/PipedWriter implementation looks very much like a case of premature optimization: why not call notifyAll() on every write? Also, the readers wait in an active loop, waking up every second, instead of waking only when there is something to read. The code also contains some TODO comments noting that its reader/writer thread detection is not sophisticated enough.)

The same problem also appears in PipedOutputStream. In my current project calling flush() manually is not possible (I can't modify Commons IO's IOUtils.copy()), so I fixed it by creating low-latency wrappers for the pipe classes. They work much better than the original classes. :-)
It should be fairly easy to wrap a char stream API around a BlockingQueue.
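For example, a minimal (untested) sketch along those lines, which skips close/end-of-stream handling:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueuePipe {
    private final BlockingQueue<Character> queue = new LinkedBlockingQueue<>();

    final Writer writer = new Writer() {
        @Override public void write(char[] buf, int off, int len) throws IOException {
            try {
                for (int i = 0; i < len; i++) queue.put(buf[off + i]);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }
        @Override public void flush() {} // the queue hands data over immediately
        @Override public void close() {}
    };

    final Reader reader = new Reader() {
        @Override public int read(char[] buf, int off, int len) throws IOException {
            try {
                buf[off] = queue.take(); // block for at least one char
                int n = 1;
                while (n < len) {
                    Character c = queue.poll(); // drain the rest without blocking
                    if (c == null) break;
                    buf[off + n++] = c;
                }
                return n;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }
        @Override public void close() {}
    };
}
```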
I must say, however, it seems quite perverse that PipedReader would use polling to wait for data. Is this documented somewhere, or did you discover it for yourself somehow?
@Esko Luontola, I've been reading through your code in the sbt package to try to understand what you are doing. It seems like you want to start a Process, pass input to it, and have the result of the action teed to different places. Is that at all correct?

I would try modifying the main loop in ReaderToWriterCopier so that instead of doing a read() (a blocking operation that, when a PipedReader is involved, apparently causes polling) you explicitly wait for the Writer to flush. The documentation is clear that flush() causes any waiting Readers to be notified.
I'm not sure how to run your code so I can't get deeper into it. Hope this helps.
I implemented something a little similar, and asked a question about whether anyone else had better-thought-out and tested code.
I can understand why network apps would use multiplexing (to avoid creating too many threads), and why programs would use async calls for pipelining (more efficient). But I don't understand the efficiency purpose of AsynchronousFileChannel.
Any ideas?
It's a channel that you can use to read files asynchronously, i.e. the I/O operations are done on a separate thread, so that the thread you're calling it from can do other things while the I/O operations are happening.
For example: The read() methods of the class return a Future object to get the result of reading data from the file. So, what you can do is call read(), which will return immediately with a Future object. In the background, another thread will read the actual data from the file. Your own thread can continue doing things, and when it needs the read data, you call get() on the Future object. That will then return the data (if the background thread hasn't completed reading the data, it will make your thread block until the data is ready). The advantage of this is that your thread doesn't have to wait the whole length of the read operation; it can do some other things until it really needs the data.
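A minimal sketch of that pattern (the file name and the "other work" are placeholders):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncReadDemo {
    public static void main(String[] args) throws Exception {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                Paths.get("data.bin"), StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            Future<Integer> result = ch.read(buf, 0); // returns immediately

            doOtherWork(); // overlaps with the background I/O

            int bytesRead = result.get(); // blocks only if the read isn't done yet
            System.out.println("Read " + bytesRead + " bytes");
        }
    }

    private static void doOtherWork() {
        // Placeholder for work done while the read proceeds in the background.
    }
}
```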
See the documentation.
Note that AsynchronousFileChannel will be a new class in Java SE 7, which is not released yet.
I've just come across another, somewhat unexpected reason for using AsynchronousFileChannel. When performing random record-oriented writes across large files (exceeding physical memory so caching isn't helping everything) on NTFS, I find that AsynchronousFileChannel performs over twice as many operations, in single-threaded mode, versus a normal FileChannel.
My best guess is that, because asynchronous I/O boils down to overlapped I/O on Windows 7, the NTFS file system driver is able to update its own internal structures faster when it doesn't have to create a sync point after every call.

I micro-benchmarked against RandomAccessFile to see how it would perform (results are very close to FileChannel, and still half the performance of AsynchronousFileChannel).
Not sure what happens with multi-threaded writes. This is on Java 7, on an SSD (the SSD is an order of magnitude faster than magnetic, and another order of magnitude faster on smaller files that fit in memory).
Will be interesting to see if the same ratios hold on Linux.
The main reason I can think of to use asynchronous I/O is to make better use of the processor. Imagine you have an application that does some sort of processing on a file, and also assume that the data in the file can be processed in chunks. If you don't make use of asynchronous I/O, your application will probably behave something like this:
1. Read a block of data. There is no processor utilization at this point, as you're blocked waiting for the data to be read.
2. Process the data you just read. At this point your application starts consuming CPU cycles.
3. If there is more data to read, go to step 1.
The processor utilization rises, then drops to zero, then rises again, then drops to zero, and so on. Ideally you never want to be idle if you want your application to be efficient and process the data as fast as possible. A better approach would be:
1. Issue an async read.
2. When the read completes, issue the next async read, then process the data just read.
The first step is the bootstrapping: you have no data yet, so you have to issue a read. From then on, whenever you are notified that a read has completed, you issue another async read and then process the data. The benefit is that by the time you finish processing a chunk of data, the next read has probably finished, so you always have data available to process and thus use the processor more efficiently. If your processing finishes before the read does, you might need to issue multiple asynchronous reads so that you have more data to process.
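A minimal sketch of this read-ahead pattern using AsynchronousFileChannel (the chunk size, file name and processChunk are made up):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class ReadAhead {
    static final int CHUNK = 64 * 1024;

    public static void main(String[] args) throws Exception {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                Paths.get("input.dat"), StandardOpenOption.READ)) {
            long pos = 0;
            ByteBuffer current = ByteBuffer.allocate(CHUNK);
            Future<Integer> pending = ch.read(current, pos); // bootstrap read

            while (true) {
                int n = pending.get(); // wait for the current chunk
                if (n <= 0) break;     // end of file
                pos += n;

                ByteBuffer next = ByteBuffer.allocate(CHUNK);
                pending = ch.read(next, pos); // issue the next read first...

                current.flip();
                processChunk(current);        // ...then process while it runs
                current = next;
            }
        }
    }

    static void processChunk(ByteBuffer buf) {
        // Placeholder: consume buf here.
    }
}
```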
Nick
Here's something no one has mentioned:
A plain FileChannel implements InterruptibleChannel, so it, as well as anything that uses it (such as the OutputStream returned by Files.newOutputStream()), has the unfortunate [1][2] behaviour that performing any blocking operation on it (e.g. read() or write()) from a thread in interrupted state causes the channel itself to be closed with java.nio.channels.ClosedByInterruptException.
If this is a problem, using AsynchronousFileChannel instead is a possible alternative.
[1] http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6608965
[2] https://bugs.openjdk.java.net/browse/JDK-4469683