What are the concerns regarding simultaneous reads and writes to a file? - Java

Consider the following scenario:
Process 1 (Writer) continuously appends lines to a file (sharedFile.txt).
Process 2 (Reader) continuously reads lines from sharedFile.txt.
My questions are:
In Java, is it possible that:
the Reader process somehow crashes the Writer process (i.e. breaks the Writer's processing)?
the Reader somehow knows when to stop reading the file purely based on the file stats (the Reader doesn't know whether others are writing to the file)?
To demonstrate:
Process one (Writer):
...
while(!done){
String nextLine;//process the line
writeLine(nextLine);
...
}
...
Process Two (Reader):
...
while(hasNextLine()){
String nextLine= readLine();
...
}
...
NOTE:
The Writer process has priority, so nothing must interfere with it.

Since you are talking about processes, not threads, the answer depends on how the underlying OS manages open file handles:
On every OS I'm familiar with, the Reader will never crash the Writer process, as the Reader's file handle only allows reading. On Linux, the system calls a Reader can invoke on the underlying OS, namely open(2) with the O_RDONLY flag, lseek(2) and read(2), are known not to interfere with the syscalls the Writer invokes, such as write(2).
On most OSes, the Reader most likely won't know when to stop reading. More precisely, on some read attempt it will receive zero as the number of bytes read (Java's read() methods report this as -1) and will treat it as EOF (end of file). At that very moment, a Writer may be preparing to append more data to the file, but the Reader has no way of knowing it.
If you need a way for two processes to communicate via a file, you can use extra files that pass meta-information between Readers and Writers, such as whether a Writer is currently running. Introducing some structure into the file can be useful too (for example, every Writer appends a byte to the file indicating that a write is in progress).
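As a rough illustration of the extra-file idea, here is a minimal sketch of a Reader that keeps polling at EOF until a marker file appears. The marker file, the class name MarkerFileReader and the polling interval are assumptions made for this example, not part of the question. It also assumes that the Writer terminates lines with '\n' and creates the marker only after its last write has reached the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class MarkerFileReader {

    /**
     * Returns every complete ('\n'-terminated) line of data, polling at EOF
     * until the hypothetical doneMarker file exists.
     */
    static List<String> readUntilDone(Path data, Path doneMarker, long pollMillis)
            throws IOException, InterruptedException {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of an unfinished line
        byte[] buf = new byte[4096];
        try (InputStream in = new FileInputStream(data.toFile())) {
            while (true) {
                // Sample the marker BEFORE reading: if it already existed, an EOF
                // seen afterwards really is the end of the data.
                boolean writerDone = Files.exists(doneMarker);
                int n = in.read(buf);
                if (n > 0) {
                    for (int i = 0; i < n; i++) {
                        if (buf[i] == '\n') {
                            lines.add(new String(pending.toByteArray(), StandardCharsets.UTF_8));
                            pending.reset();
                        } else {
                            pending.write(buf[i]);
                        }
                    }
                } else if (writerDone) {
                    return lines; // EOF, and the Writer has declared itself finished
                } else {
                    Thread.sleep(pollMillis); // EOF for now; the Writer may still append
                }
            }
        }
    }
}
```

The Reader never opens the file for writing, so it cannot disturb the Writer; the cost is that it polls, and that an unterminated final line is dropped.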
For very fast non-blocking I/O you may want to consider memory-mapped files via Java's MappedByteBuffer.

The code will not crash. However, the reader will terminate when the end is reached, even if the writer may still be writing. You will have to synchronize somehow!

Concern:
Your reader thread can read a stale value even when you think another writer thread has updated the variable's value.
Even when you write to a file, without synchronization the reader may see a different value from the one you wrote.

Java File IO and plain files were not designed for simultaneous writes and reads. Either your reader will overtake your writer, or your reader will never finish.
JB Nizet provided the answer in his comment. You use a BlockingQueue to hold the writer data while you're reading it. Either the queue will empty, or the reader will never finish. You have the means through the BlockingQueue methods to detect either situation.
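A minimal sketch of the BlockingQueue idea follows. It assumes the writer and the reader are threads inside one JVM (a queue cannot be shared between separate processes), and the class name QueueHandoff and the END sentinel are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class QueueHandoff {
    // Hypothetical end-of-data sentinel; compared by identity, never by content.
    private static final String END = new String("<end>");

    /** Passes every input line from a writer thread to the calling (reader) thread. */
    static List<String> run(List<String> input) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread writer = new Thread(() -> {
            try {
                for (String line : input) {
                    queue.put(line);
                }
                queue.put(END); // tells the reader that nothing more will come
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        List<String> seen = new ArrayList<>();
        while (true) {
            String line = queue.poll(1, TimeUnit.SECONDS);
            if (line == null) {
                // Timeout: the queue is empty. Stop only if the writer died
                // without sending END; otherwise it is merely slow.
                if (!writer.isAlive() && queue.isEmpty()) {
                    break;
                }
                continue;
            }
            if (line == END) {
                break;
            }
            seen.add(line);
        }
        writer.join();
        return seen;
    }
}
```

The timed poll() lets the reader tell "empty for now" from "finished": the sentinel marks a clean end, and the timeout branch covers a writer that stopped unexpectedly.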

Related

How to set a timeout when reading from a Java RandomAccessFile

I am writing to and reading from a Linux file in java, which in reality is a communication port to a hardware device. To do this I use RandomAccessFile (I'll explain why later) and it works well in most cases. But sometimes a byte is lost and then my routine blocks indefinitely since there is no timeout on the read method.
To give some more details on the file: it is a USB receipt printer that creates a file called /dev/usb/lp0 and though I can use a cups driver to print, I still need the low level communication through this file to query the status of the printer.
The reason I use RandomAccessFile is that I can have the same object for both reading and writing.
I tried to make a version with InputStream and OutputStream instead (since that would allow me to use the available() method to implement my timeout). But when I first open the InputStream and then the OutputStream I get an exception when opening the OutputStream since the file is occupied.
I tried writing with the OutputStream and then closing it before opening the InputStream to read, but then I lose some or all of the reply before it has opened the InputStream.
I tried switching to channels instead (Files.newByteChannel()). This also allows me to have just one object, and the documentation says it only reads the bytes available and returns the count (which also allows me to implement a timeout). But it blocks in the read method anyway when there is nothing to read, despite what the documentation says.
I also tried a number of ways to implement timeouts on the RandomAccessFile using threads.
The first approach was to start a separate thread at the same time as starting to read, and if the timeout elapsed in the thread I closed the file from the thread, hoping that this would unlock the read() operation with an exception, but it didn't (it stayed blocked).
I also tried to do the read in a separate thread and brutally kill it with the deprecated Thread.stop() once the time had elapsed. This worked one time, but it was not possible to reopen the file again after that.
The only solution I have made work is to have a separate thread that continuously calls read, and whenever it gets a byte it puts it in a LinkedBlockingQueue, which I can read from with a timeout. This approach works, but the drawback is that I can never close the file (again for the same reasons explained above, I can't unblock a blocked read). And my application requires that I sometimes close this connection to the hardware.
Can anyone think of a way to read from a file with a timeout that would work in my case (one that allows me to have both read and write access open to the file at the same time)?
I am using Java 8, by the way.
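For reference, the queue-based workaround described above might look roughly like the sketch below. The class name TimedByteReader and the TIMED_OUT constant are invented for this example, and it takes any InputStream rather than the RandomAccessFile from the question. It does not remove the stated drawback: the pump thread stays blocked in read() until a byte arrives or the stream ends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class TimedByteReader {
    /** Returned by read() when no byte arrived in time (hypothetical marker value). */
    static final int TIMED_OUT = -2;

    private final BlockingQueue<Integer> bytes = new LinkedBlockingQueue<>();

    TimedByteReader(InputStream in) {
        Thread pump = new Thread(() -> {
            try {
                int b;
                do {
                    b = in.read();  // blocks with no timeout
                    bytes.put(b);   // -1 (end of stream) is forwarded as well
                } while (b != -1);
            } catch (IOException e) {
                bytes.offer(-1);    // this sketch treats an I/O error as end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "byte-pump");
        pump.setDaemon(true);       // do not keep the JVM alive for a blocked read
        pump.start();
    }

    /** Returns the next byte (0-255), -1 at end of stream, or TIMED_OUT. */
    int read(long timeout, TimeUnit unit) throws InterruptedException {
        Integer b = bytes.poll(timeout, unit);
        return b == null ? TIMED_OUT : b;
    }
}
```

A caller would use something like reader.read(500, TimeUnit.MILLISECONDS) and treat TIMED_OUT as a lost reply.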

Cannot interrupt named pipe channel in Java on Windows

I have a named pipe \\.\pipe\pipe1 on Windows that I want to read from with Java.
According to the documentation, FileChannel should be interruptible. A read should throw a ClosedByInterruptException if the reading thread is interrupted from another thread. This probably works for regular files, but I now have a named pipe.
My situation is like this:
RandomAccessFile raf = new RandomAccessFile("\\\\.\\pipe\\pipe1", "r");
FileChannel fileChannel = raf.getChannel();
// later in another thread "readThread"
fileChannel.read(buffer);
// outside the thread reading
readThread.interrupt();
The problem is that the call to interrupt will block, and read will remain blocked until something is written to the named pipe, at which point read stops blocking.
I need to be able to abort/cancel the read when nothing is being written to the pipe and the pipe has not been closed yet.
Why does interrupting not work with the NIO classes? Is there a solution to this problem that does not involve busy-waiting or sleeping combined with ready? What would be the best solution for this problem, or is there a workaround?
I have not figured out a real solution to the question of how to cancel the read. But I needed to adjust my approach anyway and will now explain why. If you have anything to add to the original problem of the blocked read, you can post an additional answer.
A named pipe could be treated like a file and opened separately for reading and writing with classic Java IO streams. However, a named pipe is often used like a socket, and as such it requires a single file open. So one could use Java IO streams like this:
RandomAccessFile raf = new RandomAccessFile("\\\\.\\pipe\\pipe1", "rws");
FileChannel fileChannel = raf.getChannel();
InputStream fis = Channels.newInputStream(fileChannel);
OutputStream fos = Channels.newOutputStream(fileChannel);
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
PrintWriter pw = new PrintWriter(fos, true);
Now a problem one will notice later is this: if you write while reading, things will get locked up. It seems concurrent reading/writing is not possible, which is outlined here.
To solve this, I used a ReentrantLock set to fair to switch between reading and writing. The reading thread checks readiness and can be triggered with an interrupt when one finishes writing and expects a response afterwards. If the stream is ready, the read buffer is drained. If it is not ready, an interval can be scheduled or simulated for sporadically expected messages. This last part is not optimal, but it actually works very well for my use case.
With this, one can build a solution where all threads can be orchestrated to terminate correctly with no blocking and minimal overhead.
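The lock-plus-readiness idea could be sketched roughly as follows. This is not the author's actual code: the class name HalfDuplexPipe and its method names are assumptions, and it relies on ready() reporting available input correctly for the underlying stream. Note also that ready() returning true does not guarantee that a complete line is available, so readLine() could still block on a partial line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.locks.ReentrantLock;

final class HalfDuplexPipe {
    // Fair lock: waiting readers and writers take turns in arrival order.
    private final ReentrantLock lock = new ReentrantLock(true);
    private final BufferedReader in;
    private final PrintWriter out;

    HalfDuplexPipe(BufferedReader in, PrintWriter out) {
        this.in = in;
        this.out = out; // assumed to be created with autoflush enabled
    }

    /** Writes one line while no read is in progress. */
    void send(String message) {
        lock.lock();
        try {
            out.println(message);
        } finally {
            lock.unlock();
        }
    }

    /** Returns the next line if input is ready, or null instead of blocking on an idle pipe. */
    String pollLine() throws IOException {
        lock.lock();
        try {
            return in.ready() ? in.readLine() : null;
        } finally {
            lock.unlock();
        }
    }
}
```

A reader thread would call pollLine() in a loop, sleeping or waiting for a signal between attempts, so it never sits inside a blocking read while a writer needs the pipe.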

Timeout on opening and reading from a named pipe in Java

In my current design I have a named pipe which can be sequentially written to by an unspecified number of writer processes. There is only one reader implemented in Scala, but for the sake of simplicity we can assume it's implemented in Java. The operating system is Linux >= 2.6.
The reader needs to:
re-open the pipe after each writer sends its input
read all the input from pipe until EOF which indicates that the writer closed its end of the named pipe
The main difficulty here is that the reader needs to be able to cancel both open and read calls after a given timeout. Reaching the timeout indicates that all writers have done their job and the reader can safely exit.
In C, I would:
first call open(file_name, O_NONBLOCK) and poll for the pipe to be open for writing
poll for reading in non-blocking mode or change file descriptor to blocking mode and use select()
What is the most straightforward way to accomplish this in Java? I've looked at classic IO and NIO, but the more I try, the more complex the design becomes, and it still doesn't do exactly what I want.

Reusing one BufferedWriter instance for one file along application lifetime

I have a multi-threaded application where some of my threads read data from a queue and write it to a file. I am unsure whether I should create a new BufferedWriter instance every time one of my threads reads a value from the queue and writes it to the same file, or whether I can have just one BufferedWriter instance and flush() every time. One problem with the second solution is that I have to detect when to close the BufferedWriter without using Java 7's try-with-resources, which would otherwise be the ideal way to close resources.
Does the second solution solve some performance issues?
What are best practices on this?
I'd recommend using one BufferedWriter for writing to the file that is shared by all threads. For the sake of performance, the BufferedWriter should be kept open until the application decides that there is no more output. (Opening and closing files is relatively expensive.)
You also need to have the application threads use some kind of locking (e.g. synchronize on the BufferedWriter) to ensure that they don't try to write at the same time. The BufferedWriter class is not thread-safe.
The try/finally or try-with-resources approach to managing file resources is important in cases where you are opening lots of files. If you are only dealing with one file, and it needs to be open for the entire duration of the application, then resource management is not so critical. (But you do need to make sure that you either close or flush before the application exits.)
But I think BufferedWriter is thread-safe, because the underlying implementation of the write() methods uses synchronized blocks.
Well in that sense, yes it is. However, that assumes that when your thread writes data, it does it in a single write(...) call.
Note also that BufferedWriter is not specified to be thread-safe, even if it is thread-safe in the current implementation.
A BufferedWriter should only ever lead to one file/Writer/OutputStream; if you have many targets you will need many buffers. If you want to write to the same file from multiple threads, you will need to synchronize at the outermost layer; you can synchronize on the BufferedWriter if you don't have any higher-level constructs than the character stream. If you synchronize on the BufferedWriter, you will not need to call flush() at the end of each access.
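To make the advice above concrete, here is a minimal sketch of one shared BufferedWriter guarded by a lock, so that each line is written as a unit even though it takes two calls. The wrapper class SharedLineWriter is a made-up name for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SharedLineWriter implements AutoCloseable {
    private final BufferedWriter out;

    SharedLineWriter(Path file) throws IOException {
        // Opened once and kept open for the lifetime of the application.
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Writes one complete line; lines from different threads cannot interleave. */
    void writeLine(String line) throws IOException {
        synchronized (out) {
            out.write(line);
            out.newLine();
        }
    }

    /** Flushes buffered data and closes the file; call once, when output is finished. */
    @Override
    public void close() throws IOException {
        synchronized (out) {
            out.close();
        }
    }
}
```

Worker threads share a single SharedLineWriter instance and call writeLine(); the application calls close() (or at least flushes) once before it exits.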

Should multiple threads read from the same DataInputStream?

I'd like my program to get a file, and then create 4 files based on its byte content.
Working with only the main thread, I just create one DataInputStream and do my thing sequentially.
Now, I'm interested in making my program concurrent. Maybe I can have four threads - one for each file to be created.
I don't want to read the file's bytes into memory all at once, so my threads will need to query the DataInputStream constantly to stream the bytes using read().
What is not clear to me is, should my 4 threads call read() on the same DataInputStream, or should each one have their own separate stream to read from?
I don't think this is a good idea. See http://download.java.net/jdk7/archive/b123/docs/api/java/io/DataInputStream.html
DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class.
Assuming you want all of the data in each of your four new files, each thread should create its own DataInputStream.
If the threads share a single DataInputStream, at best each thread will get some random quarter of the data. At worst, you'll get a crash or data corruption due to multithreaded access to code that is not thread safe.
If you want to read data from 1 file into 4 separate ones you will not share DataInputStream. You can however wrap that stream and add functionality that would make it thread safe.
For example you may want to read in a chunk of data from your DataInputStream and cache that small chunk. When all 4 threads have read the chunk you can dispose of it and continue reading. You would never have to load the complete file into memory. You would only have to load a small amount.
If you look at the documentation for DataInputStream, you will see that it is a FilterInputStream, which means the read operation is delegated to another InputStream. Suppose the stream you use here is a FileInputStream; on most platforms, concurrent reads of the same file are supported.
So in your case, you should create four different FileInputStreams, wrap them in four DataInputStreams, and use each in its own thread. The read operations will not interfere with each other.
Short answer is no.
Longer answer: have a single thread read the DataInputStream, and put the data into one of four Queues, one per output file. Decide which Queue based upon the byte content.
Have four threads, each one reading from a Queue, that write to the output files.
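A rough sketch of this single-reader, multiple-writer layout follows. The class name FanOut, the routing rule (byte value modulo the number of outputs) and the queue capacity are placeholders chosen for this example; the real rule depends on your byte content. Queueing single bytes keeps the sketch short, but chunks would be faster in practice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class FanOut {
    private static final int EOF = -1; // sentinel telling a writer thread to stop

    /** Reads in on the calling thread and hands each byte to one writer thread per output. */
    static void split(InputStream in, List<? extends OutputStream> outs)
            throws IOException, InterruptedException {
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        for (OutputStream out : outs) {
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024); // bounded memory use
            queues.add(queue);
            Thread writer = new Thread(() -> drain(queue, out));
            writers.add(writer);
            writer.start();
        }
        try {
            int b;
            while ((b = in.read()) != -1) {
                // Placeholder routing rule: pick the queue from the byte value.
                queues.get(b % queues.size()).put(b);
            }
        } finally {
            for (BlockingQueue<Integer> queue : queues) {
                queue.put(EOF);
            }
        }
        for (Thread writer : writers) {
            writer.join();
        }
    }

    private static void drain(BlockingQueue<Integer> queue, OutputStream out) {
        try {
            for (int b = queue.take(); b != EOF; b = queue.take()) {
                out.write(b);
            }
            out.flush();
        } catch (IOException e) {
            // Not handled further in this sketch: a failed writer stops draining its queue.
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Only one thread ever touches the input stream, so no synchronization on it is needed, and the bounded queues keep the reader from running far ahead of slow writers.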
