Using Java's java.nio.channels.FileLock I am trying to synchronize file reading and writing on a Windows filesystem. I have a test program that runs in a loop:
Lock the file X.LOCK
Test that X.JSON exists (just a consistency check)
Write the file X.TMP
Rename X.TMP to X.JSON using java.nio.file.Files.move(), which deletes the old X.JSON and renames X.TMP to X.JSON in an atomic action.
Test that X.JSON exists (this always returns true)
Release the lock on X.LOCK
I run this in a tight loop in multiple instances of the test program. It locks the symbolic file "X.LOCK" and not the actual file that is being written and renamed. I believe that is necessary to preserve the lock through the rename operation.
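One iteration of the loop described above might look roughly like the following. This is a minimal sketch, not the original test program: the class name LockedUpdate, the method update, and the choice to keep all three files in one directory are assumptions made for illustration, and the initial consistency check is left out so that the very first run works.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; the names are illustrative, not from the original program.
final class LockedUpdate {

    /** Replaces dir/X.JSON with new content while holding an exclusive lock on dir/X.LOCK. */
    static void update(Path dir, String json) throws IOException {
        Path lockFile = dir.resolve("X.LOCK");
        Path tmp = dir.resolve("X.TMP");
        Path target = dir.resolve("X.JSON");

        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {   // blocks until other processes release the lock
            // Write the new content to the temporary file first.
            Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
            // Swap the temporary file into place.
            Files.move(tmp, target,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
            // Consistency check from the question: the target must exist now.
            if (!Files.exists(target)) {
                throw new IllegalStateException("X.JSON missing after rename");
            }
        } // the lock is released first, then the channel is closed
    }
}
```

Because the lock is taken on X.LOCK rather than on X.JSON, the rename never touches the locked file itself.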
Here is what I find: In about 2% of the cases, process 1 will write/rename/release the lock, and process 2 which was waiting on that lock will get that lock, start executing, but find that X.JSON does not exist. The "exists" check returns false!
If I introduce a delay (200ms) after the rename, and before the unlock, then the whole thing runs 100% reliably. I can try smaller delays, but I am loath to add any delay since that is never the right answer to making a reliable program.
It appears that when one process atomically renames a file, it takes some time for the other process to see that. But the unlock signal goes faster! So the lock signal tells the other program to move forward, and that other program can't see the file it is supposed to be working on!
Question: is there any way I can force the unlock signal to be sent AFTER the file system has settled and guaranteed to be consistent with operations that were put in there before the unlock was called?
Any hints on where I can look for information on this kind of timing/sequencing on a Windows file system using Java? I have not tried this test program on any other platform yet, but I certainly will check Linux soon.
UPDATE
I am suspicious of interference from virus scanning. I got a test to a reproducible state, and it was failing about 1% of the time, this time reporting "AccessDeniedException". I think the virus scan might be kicking in and scanning the file between its creation and its renaming; when it does this, it runs at a higher privilege and causes this error when the program tries to rename the file. Has anyone else run into this problem?
The solution appears to be that on a system where virus scan is running, depending upon the specific brand of virus scanner, it is possible that the call to move can be interfered with. I was calling:
java.nio.file.Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
This call effectively deletes dest if it exists and renames the src file to dest, and it does so atomically. It is documented that if the move cannot be done atomically, an exception (AtomicMoveNotSupportedException) is thrown. I was getting AccessDeniedException, which is not mentioned specifically in the documentation but apparently happens.
What appears to be happening, and this all depended on a specific timing that occurred about 1% of the time, is that the virus scan of either the src file or the dest file caused the atomic move to fail.
I tried this on each of three differently configured systems. The Windows computer running Microsoft Windows Defender never produced the AccessDeniedException, while another running Trend Micro virus scan failed regularly. That is not a thorough survey of virus scanners; they were the only options I had available for testing. The machine with Trend Micro also has an encrypted hard disk, and that might be a factor in making the timing such that it trips this problem.
I even went so far as to implement a "retry" where if the move threw an exception, the code would wait 10ms and try again. Even with this, the retry failed about 0.1% of the time. Maybe I could have waited longer, but that would in any case be a problem making the code slower.
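The retry described above could be sketched as follows. The class name RetryingMove, the method moveWithRetry, and its parameters are hypothetical; the original retry code is not shown in the post, so this only illustrates the general shape (catch AccessDeniedException, pause, try again).

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; names and parameters are illustrative.
final class RetryingMove {

    /**
     * Attempts the atomic move up to maxAttempts times, sleeping pauseMillis
     * between attempts. Returns the number of attempts that were needed.
     */
    static int moveWithRetry(Path src, Path dest, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                Files.move(src, dest,
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.ATOMIC_MOVE);
                return attempt;
            } catch (AccessDeniedException e) {
                if (attempt >= maxAttempts) {
                    throw e;               // give up and surface the last failure
                }
                Thread.sleep(pauseMillis); // e.g. 10 ms, as in the experiment above
            }
        }
    }
}
```

As noted above, a single 10 ms retry still failed about 0.1% of the time, so a bounded retry like this reduces the failure rate rather than eliminating it.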
What worked was to add a step to delete the file being replaced before doing the move. My guess is that the virus scan is either stopped by the delete, or else it continues to scan on the src or dest file without bothering the move command. The steps are these:
Lock the file X.LOCK
Test that X.JSON exists (just a consistency check)
Write the file X.TMP
(NEW) Delete the old X.JSON
Rename X.TMP to X.JSON using java.nio.file.Files.move(), which now simply renames X.TMP to X.JSON in an atomic action.
Test that X.JSON exists (this always returns true)
Release the lock on X.LOCK
Is this now 100% reliable? I can't say for sure, since all this is timing dependent. It is possible that this just changed the timing in a way that allows it to run.
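The revised delete-then-rename step can be sketched like this. The class name DeleteThenMove and the method replace are made up for illustration; the sketch assumes the caller already holds the lock on X.LOCK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; assumes the caller already holds the X.LOCK lock.
final class DeleteThenMove {

    /** Deletes the old target (if any), then renames tmp to target. */
    static void replace(Path tmp, Path target) throws IOException {
        // (NEW) step: remove the file that is about to be replaced.
        Files.deleteIfExists(target);
        // With the target gone, the move is a plain rename.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

One consequence of this ordering is that there is a short window in which X.JSON does not exist at all, so every reader has to take the same lock before looking for the file.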
Related
One thread pool is downloading files from the FTP server and another thread pool is reading those files.
Both thread pools are running concurrently. I'll explain exactly what happens with an example.
Let's assume I have one CSV file with 100 records.
While threadPool-1 is downloading the file and writing it to the /pending folder, threadPool-2 reads the content of that same file. Assume that in 1 second only 10 records have been written to the file in /pending, so threadPool-2 reads only 10 records.
threadPool-2 doesn't know that the other 90 records are still being downloaded, so it will not read them, because it cannot tell whether the whole file has been downloaded. After reading, it moves the file to another folder, so my remaining 90 records are never processed.
My question is: how can threadPool-2 wait until the whole file has been downloaded, and only then read its contents?
One more thing: both thread pools use the scheduleAtFixedRate method and run every 10 seconds.
Please guide me on this.
I'm a fan of Mark Rotteveel's #6 suggestion (in comments above):
use a temporary name when downloading,
rename when download is complete.
That looks like:
FTP download threads write all files with some added extension – perhaps .pending – but name it whatever you want.
When a file is downloaded – say some.pdf – the FTP download thread writes the file to some.pdf.pending
When an FTP download thread completes a file, the last step is a file rename operation – this is the mechanism for ensuring only "done" files are ready to be processed. So it downloads the file to some.pdf.pending, then at the end, renames it to some.pdf.
Reader threads look for files, ignoring anything matching *.pending
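The steps above can be sketched as follows. The class name PendingFiles, its two methods, and the use of an InputStream to stand in for the FTP download are assumptions made for illustration; the FTP client itself is out of scope here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; the InputStream stands in for the FTP download.
final class PendingFiles {
    static final String SUFFIX = ".pending";

    /** Downloader side: write to name.pending, then rename to name as the last step. */
    static Path download(InputStream in, Path dir, String name) throws IOException {
        Path pending = dir.resolve(name + SUFFIX);
        Path done = dir.resolve(name);
        Files.copy(in, pending, StandardCopyOption.REPLACE_EXISTING);
        // Only after the whole stream has been written does the final name appear.
        return Files.move(pending, done, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reader side: list regular files, ignoring anything that still ends in .pending. */
    static List<Path> readyFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> !p.getFileName().toString().endsWith(SUFFIX))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The rename stays within one directory, which keeps it on the same filesystem, where a rename is normally a single step.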
I've built systems using this approach and they worked out well. In contrast, I've also worked with more complicated systems that tried to coordinate across threads, and those often did not work so well.
Over time, any software system will have bugs. A line often attributed to Edsger Dijkstra captures this well:
"If debugging is the process of removing software bugs, then programming must be the process of putting them in."
However difficult it is to reason about program correctness now – while the program is still in design phase,
and has not yet been built – it will be harder to reason about correctness when things are broken in production (which will happen, because bugs).
That is, when things are broken and you're under time pressure to find the root cause (and fix it!), even the best of us would be at a disadvantage
with a complicated (vs. simple) system.
The approach of using temporary names is simple to reason about, which should minimize code complexity and thus make it easier to implement.
In turn, maintenance and bug fixes should be easier, too.
Keep it simple – let the filesystem help you out.
I have a Java application that creates multiple threads. There is 1 producer thread which reads from a 10gb file, parses that information, creates objects from it and puts them into multiple blocking queues (5 queues).
The other 5 consumer threads each read from a BlockingQueue (each consumer thread has its own queue). Each consumer thread then writes to its own file, so 5 files in total get created. It takes around 30 minutes to create all the files.
The problem:
The threads are writing to an externally mounted directory on a Linux box. We've experienced problems where other Linux mounts have gone down and applications crashed, so I want to prevent that in this application.
What I would like to do is keep checking whether the mount (directory) exists before writing to it. I'm assuming that if the directory goes down, a FileNotFoundException will be thrown. If that is the case, I want the application to keep checking whether the directory is there for about 10-20 minutes before giving up and crashing. Because I don't want to have to read the 10 GB file again, I want the consumer threads to be able to pick up from where they left off.
What I'm not sure about is the best practice:
Is it best to check whether the directory exists in the main class before creating the threads? Or to check in each consumer thread?
If I keep checking whether the directory exists in each consumer thread, it seems like repeated code. I can check in the main class, but it takes 30 minutes to create these files. If the mount goes down during those 30 minutes and I'm only checking in the main class, the application will crash. Or, if I'm already writing to a directory, is it impossible for an external directory to go down? Does it get locked?
thank you
We have something similar in our application, but in our case we are running a web app and if our mounted file system goes down we just throw an exception, but we want to do something more elegant, like you do...
I would recommend using a combination of the following patterns: State, CircuitBreaker (which I believe is a more specific version of the State pattern), and Observer/Observable.
These would work in the following way...
Create something that represents your file system. Maybe a class called MountedFileSystem. Make all your write calls to this particular class.
This class will catch every FileNotFoundException, and when one occurs, the CircuitBreaker gets triggered. This change works like the State pattern: one state is when things are working 'fine', the other is when they aren't, meaning that the mount has gone away.
Then, in the background, I would have a task that starts on a thread and checks the actual underlying file system to see if it is back. When the file system is back, change the state in the MountedFileSystem, and fire an Event (Observer/Observable) to try writing the files again to disk.
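A rough sketch of how these pieces might fit together is below. Everything in it is an assumption for illustration: the State enum, the append and probe methods, and the use of plain Runnable callbacks in place of a formal Observer interface. It also catches IOException in general, because the java.nio.file API reports a missing directory as NoSuchFileException rather than FileNotFoundException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch of the MountedFileSystem idea; the API is hypothetical.
final class MountedFileSystem {
    enum State { AVAILABLE, UNAVAILABLE }

    private final Path root;
    private final List<Runnable> recoveryListeners = new CopyOnWriteArrayList<>();
    private volatile State state = State.AVAILABLE;

    MountedFileSystem(Path root) {
        this.root = root;
    }

    State state() {
        return state;
    }

    /** Observer registration: listeners run when the mount is seen to be back. */
    void onRecovery(Runnable listener) {
        recoveryListeners.add(listener);
    }

    /** Appends data to root/name. Returns false, and opens the breaker, if the write fails. */
    boolean append(String name, byte[] data) {
        if (state == State.UNAVAILABLE) {
            return false;                      // breaker is open: do not touch the mount
        }
        try {
            Files.write(root.resolve(name), data,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            state = State.UNAVAILABLE;         // trip the breaker
            return false;
        }
    }

    /** Meant to be called periodically from a background task. */
    void probe() {
        if (state == State.UNAVAILABLE && Files.isDirectory(root)) {
            state = State.AVAILABLE;           // close the breaker
            recoveryListeners.forEach(Runnable::run);
        }
    }
}
```

A ScheduledExecutorService calling probe() every few seconds would be one way to run the background check, and consumers that get false from append can keep their pending data and retry from a recovery listener.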
And as yuan quigfei stated, I am fairly certain you're going to have to rewrite those files. I just don't see being able to restart writing to them, but perhaps someone else has an idea.
1. Write a method to detect whether the folder exists.
2. Call this method before the actual writing.
3. Create the 5 threads based on step 2. Once you detect that a file no longer exists, you seem to have no choice but to rewrite it. Of course, you don't need to re-read the input if all your content is already in memory (which needs a lot of memory).
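The existence check, combined with the 10-20 minute grace period the question asks for, might look like the sketch below. The class MountWait, the method awaitDirectory, and its parameters are hypothetical names chosen for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper; names and parameters are illustrative.
final class MountWait {

    /**
     * Polls until dir exists as a directory or the timeout elapses.
     * Returns true if the directory was seen, false if the wait timed out.
     */
    static boolean awaitDirectory(Path dir, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!Files.isDirectory(dir)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                       // waited long enough, give up
            }
            Thread.sleep(pollInterval.toMillis());  // pause before checking again
        }
        return true;
    }
}
```

Each consumer thread could call something like awaitDirectory(mount, Duration.ofMinutes(15), Duration.ofSeconds(5)) before a write, or after a failed one, which keeps the check in a single shared method instead of repeating it in every thread.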
EDIT: Well, I'm back a bunch of months later. The lock mechanism that I was trying to code doesn't work, because createNewFile isn't reliable on NFS. Check the answer below.
Here is my situation: I have only 1 application which may access the files, so I don't have any constraint about what other applications may do, but the application is running concurrently on several servers in the production environment for redundancy and performance purposes (a couple of machines each host a couple of JVMs running our app).
Basically, what I need is to put some kind of flag in a folder to tell the other instances to leave this folder alone as another instance is already dealing with it.
Many search results say to use FileLock to achieve this, but I checked the Javadoc, and from my understanding it relies on the locking facilities of the hosting OS. So I doubt that it will help much, since there are different hosting machines.
This question covers a similar subject: "Java file locking on a network", and the accepted answer recommends implementing your own kind of cooperative locking process (using File.createNewFile(), as asked by the OP).
The Javadoc of File.createNewFile() says that it atomically creates the file if it doesn't already exist. Does that work reliably on a network file system?
I mean, how is it possible, with the potential network lag, to do both the existence check and the creation simultaneously? The Javadoc says:
The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
No, createNewFile doesn't work properly on a network file system.
Even if the system call is atomic, it's only atomic regarding the OS, and not over the network.
Over time, I got a couple of collisions, roughly once every 2-3 months (approx. once every 600k files).
What happens is this: my program is running in 6 separate instances over 2 separate servers, so let's call them A1, A2, A3 and B1, B2, B3.
When A1, A2, and A3 try to create the same file, the OS can properly ensure that only one file is created, since it is working with itself.
When A1 and B1 try to create the same file at the same exact moment, there is some form of network cache and/or network delays happening, and they both get a true return from File.createNewFile().
My code then proceeds by renaming the parent folder to stop the other instances of the program from unnecessarily trying to process the folder, and that's where it fails:
On A1, the folder renaming operation is successful, but the lock file can't be removed, so A1 just leaves it like that and keeps on processing new incoming folders.
On B1, the folder renaming operation (File.renameTo(), can't do much to fix it) gets stuck in an infinite loop because the folder was already renamed (also causing huge I/O traffic according to my sysadmin), and B1 is unable to process any new file until the program is restarted.
The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
That can be implemented easily via the open() system call or its equivalents in any operating system I have ever used.
I mean, how is it possible with the potential network lag to do both
existence check and creation simultaneously ?
There is a difference between simultaneously and atomically. The Javadoc does not say that this function performs two simultaneous actions; it says the two actions are designed to work atomically. If the method performs the two operations atomically, then the file will never be created without first checking whether it exists. If the file is created by the current call, no file by that name was present; if it is not created, a file by that name already existed.
I don't see a reason to doubt that the function is atomic or works reliably, whether the call is on a network or a local disk. A local call is equally unreliable: so many things can go wrong in I/O.
What you do have to doubt is using the empty file created by this function as a lock, as explained in D-Mac's answer to this question; the Javadoc for this function explicitly mentions that too.
You are looking for a directory lock, and empty files working as a directory lock (to signal other processes and threads not to touch the directory) have worked quite well for me, provided due care is taken to write the logic for checking file existence, cleaning up lock files, and handling orphaned locks.
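A minimal sketch of such an empty-file directory lock is shown below. The class DirectoryLock, the file name ".lock", and the age-based orphan test are all assumptions for illustration. It uses Files.createFile, which fails if the file already exists; as the other answer reports, that guarantee may not hold between different machines on a network file system, so this is only a sketch for the single-host case.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Illustrative cooperative lock; names and the ".lock" file name are hypothetical.
final class DirectoryLock {
    static final String LOCK_NAME = ".lock";

    /** Returns true if this caller created the lock file, false if it already existed. */
    static boolean tryAcquire(Path dir) throws IOException {
        try {
            Files.createFile(dir.resolve(LOCK_NAME));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Treats a lock file older than maxAge as orphaned (its owner probably died). */
    static boolean isOrphaned(Path dir, Duration maxAge) throws IOException {
        try {
            Instant modified = Files.getLastModifiedTime(dir.resolve(LOCK_NAME)).toInstant();
            return modified.plus(maxAge).isBefore(Instant.now());
        } catch (NoSuchFileException e) {
            return false;                      // no lock file, so nothing is orphaned
        }
    }

    /** Lock clean-up: removes the lock file if it is present. */
    static void release(Path dir) throws IOException {
        Files.deleteIfExists(dir.resolve(LOCK_NAME));
    }
}
```

Callers would wrap the protected work in try/finally so that release runs even when processing fails, and would decide for themselves how old a lock must be before it counts as orphaned.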
In my Java app, on Linux, I need to periodically read some text files that change often.
(these text files are updated by a separate app).
Do I need to be concerned about the rare case when attempting to read the file at the exact moment it is being updated? If so, how can I guarantee that my reads always return without failing? Does the OS handle this for me, or could I potentially read 1/2 a file?
thanks.
The OS can help you achieve consistent reads, but it requires that both apps are written with this in mind.
In a nutshell, you open the file in your Java app with exclusive read/write permission. This ensures that no one else, including your other app, is modifying the file while you are reading it. The FileLock class can help you ensure you have exclusive access to a file.
Your other app will then periodically try to write to the file. If it does this at the same time you are reading the file, then access will be denied, and the other app should retry. This is the critical part: if the other app doesn't expect the file to be unavailable and treats this as a fatal error condition, the write will fail, and the app won't save the data and may fail, exit, etc.
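A reader that takes an exclusive FileLock for the duration of the read could be sketched as below. The class LockedRead and the method readLocked are hypothetical names. Note the assumption built into this approach: the Javadoc leaves it system-dependent whether a lock actually blocks other programs, so the writing app needs to acquire the same lock for this to be dependable.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; assumes the writer cooperates by locking the same file.
final class LockedRead {

    /** Reads the whole file while holding an exclusive lock on it. */
    static byte[] readLocked(Path file) throws IOException {
        // An exclusive lock requires a channel that is open for writing.
        try (FileChannel channel = FileChannel.open(file,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {       // blocks until the lock is free
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file is reached
            }
            buffer.flip();
            byte[] data = new byte[buffer.remaining()];
            buffer.get(data);
            return data;
        }
    }
}
```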
If the other app must always be able to write to the file, then you have to avoid using exclusive reads. Instead, you have to try to detect an inconsistent read, such as by checking the last modified timestamp when you start reading, and when you finish reading. If the timestamps are the same, then you are good to go and have a consistent read.
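The timestamp comparison described above might be sketched like this. The class ConsistentRead, the method read, and the maxAttempts parameter are assumptions for illustration. The check is only as good as the filesystem's timestamp resolution: two writes within the same timestamp tick would go unnoticed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper; names and the retry limit are illustrative.
final class ConsistentRead {

    /** Re-reads the file until the last-modified time is the same before and after the read. */
    static byte[] read(Path file, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            FileTime before = Files.getLastModifiedTime(file);
            byte[] data = Files.readAllBytes(file);
            FileTime after = Files.getLastModifiedTime(file);
            if (before.equals(after)) {
                return data;                   // no modification observed during the read
            }
        }
        throw new IOException("file kept changing during "
                + maxAttempts + " read attempts: " + file);
    }
}
```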
Yes, you need to worry about this.
No, your reads shouldn't "fail" AFAIK, unless the file is momentarily locked, in which case you can catch the exception and try again after a brief pause. You might certainly, though, get more or less data than you expected.
(If you post code we might be able to comment more accurately on what'll happen.)
One of our clients is using some Novel security software that sometimes locks some .class files that our software creates. This causes some nasty problems for them when it occurs, and I am trying to research a workaround that we could add to our error handling to address this problem when it comes up. Are there any calls in the Java API that can be used to detect whether a file is locked and, if it is, to unlock it?
Before attempting to write to a file, you can check if the file is writable by your java application using File.canWrite(). However, you still might run into an issue if the 3rd party app locks the file in between your File.canWrite() check and when your application actually attempts to write. For this reason, I would code your application to simply go ahead and try to write to the file and catch whatever Exception gets thrown when the file is locked. I don't believe there is a native Java way to unlock a file that has been locked by another application. You could exec a shell command as a privileged user to force things along but that seems inelegant.
File.canWrite() has the race condition that Asaph mentioned. You could try FileChannel.lock() and get an exclusive lock on the file. As long as the .class is on your local disk, this should work fine (file locking can be problematic on networked disks).
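A sketch of the locking approach is below, using tryLock so that the caller finds out immediately instead of blocking. The class GuardedWrite and the method writeIfUnlocked are made-up names. Whether a third-party program's hold on the file is visible as a FileLock conflict or only as an IOException on open or write depends on the platform and on how that program opened the file, so both outcomes should be expected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; names are illustrative.
final class GuardedWrite {

    /**
     * Overwrites the file with data if an exclusive lock can be obtained.
     * Returns false if another program currently holds a conflicting lock.
     */
    static boolean writeIfUnlocked(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();   // null when another program holds the lock
            if (lock == null) {
                return false;
            }
            try {
                channel.truncate(0);             // discard the old content
                ByteBuffer buffer = ByteBuffer.wrap(data);
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
                return true;
            } finally {
                lock.release();
            }
        }
    }
}
```

An IOException from this method, or a false return, can then be handled by the caller with a pause and a retry.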
Alternatively, depending on how the .class name is discovered, you could create a new name for your .class each time; then if the anti-virus software locks your initial class, you can still create the new one.