Java Files.createFile Docker volume atomic? - java

I'm using:
Files.createFile(Paths.get("/tmp/marker.txt"))
from within two different Docker containers that mount the same Docker volume. The Javadoc says that this operation is atomic, so only one of the attempts will succeed:
Creates a new and empty file, failing if the file already exists. The
check for the existence of the file and the creation of the new file
if it does not exist are a single operation that is atomic with
respect to all other filesystem activities that might affect the
directory.
Does this operation still behave the same way in this scenario?
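For reference, a minimal sketch of how the call reports which caller won. The class name MarkerFile and the method tryCreate are hypothetical names chosen for illustration. The sketch only shows the API contract; it does not by itself settle whether the atomicity guarantee holds for a particular volume driver or backing filesystem.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class MarkerFile {

    /**
     * Tries to create the marker file.
     * Returns true only for the caller whose call actually created the file,
     * and false when the file was already there.
     */
    static boolean tryCreate(Path marker) throws IOException {
        try {
            Files.createFile(marker); // fails if the file already exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // someone else created it first
        }
    }
}
```

Each container would call MarkerFile.tryCreate with the same path on the shared volume and proceed only when it returns true.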

Related

Java File watcher with multiple JVM watching single directory for incoming files

I have a situation where two Java applications are watching a directory for incoming files. Say there is a directory DIR that is being watched by two JVM processes for any files with the extension .SGL.
The problem we face is that sometimes both nodes are notified about a new file and both try to process it.
Usually we handle these situations with a database: each node tries to insert into a table with a unique file name column, and only one will succeed and continue processing.
But in this situation, we don't have a database.
What is the best way to handle this kind of problem? Can we depend on a file renaming solution? Is file renaming an atomic operation?
For such a situation Spring Integration suggests FileSystemPersistentAcceptOnceFileListFilter: https://docs.spring.io/spring-integration/reference/html/files.html#file-reading
Stores "seen" files in a MetadataStore to survive application restarts.
The default key is 'prefix' plus the absolute file name; value is the timestamp of the file.
Files are deemed as already 'seen' if they exist in the store and have the
same modified time as the current file.
When you have a shared persistent MetadataStore for all your application instances, only one of them will process the file. All the others will just filter it out.
Every watcher (even two in the same JVM) should always be notified of the new File being added.
If you want to divide the work, you can either
use one JVM to run twice as many threads and divide the work via a queue.
use an operation which will only succeed for one JVM. e.g.
file rename
create a lock file
lock the file itself
Is file renaming an atomic operation?
Yes, only one process can successfully rename a file, even if both attempt to rename it to the same name.
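The rename approach can be sketched as follows. This is a minimal sketch under assumptions: RenameClaim, tryClaim, the claimedDir directory and the workerId suffix are hypothetical names, and claimedDir is assumed to be on the same file system as the watched directory, since an atomic move is generally not available across file systems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
final class RenameClaim {

    /**
     * Tries to claim an incoming file by moving it into claimedDir under a
     * worker-specific name. Returns true if this caller moved the file,
     * false if the source was already gone (another worker claimed it).
     */
    static boolean tryClaim(Path incoming, Path claimedDir, String workerId) throws IOException {
        // Assumption: workerId is unique per JVM, so targets never collide.
        Path target = claimedDir.resolve(incoming.getFileName() + "." + workerId);
        try {
            Files.move(incoming, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (NoSuchFileException e) {
            return false; // the file was renamed away before we got to it
        }
    }
}
```

A watcher that receives a notification would call tryClaim first and process the file only from its claimed location.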

Is file creation process safe among different processes at os level (ubuntu)?

I have two Java applications that rely on a file-existence check: one application waits until a file is deleted and then creates a file, to manage concurrency. If file creation is not process-safe, my application fails.
The pseudocode:
if file exists:
do something with it
It is not safe under concurrency, because nothing ensures that the file is not deleted between the first and the second line.
The safest way would be to use a FileLock. If you are planning to react to file creation/deletion events on Linux, I'd recommend using an inotify-based solution.
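A minimal sketch of the FileLock suggestion, assuming both applications agree on a dedicated lock file. LockedSection and runIfLockFree are hypothetical names. Note that FileLock guards against other processes; within one JVM an overlapping request throws OverlappingFileLockException instead of returning null.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class LockedSection {

    /**
     * Runs the action while holding an exclusive lock on lockFile.
     * Returns false without running the action if another process
     * currently holds the lock.
     */
    static boolean runIfLockFree(Path lockFile, Runnable action) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) { // null when held by another process
            if (lock == null) {
                return false;
            }
            action.run(); // the check and the action now happen under the lock
            return true;
        } // the lock and the channel are released here
    }
}
```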

Is java.io.File.createNewFile() atomic in a network file system?

EDIT: I'm back several months later. The lock mechanism that I was trying to code doesn't work, because createNewFile isn't reliable on NFS. Check the answer below.
Here is my situation: only one application may access the files, so I don't have any constraints about what other applications may do, but the application runs concurrently on several servers in the production environment for redundancy and performance purposes (a couple of machines each host a couple of JVMs running our app).
Basically, what I need is to put some kind of flag in a folder to tell the other instances to leave this folder alone as another instance is already dealing with it.
Many search results say to use FileLock to achieve this, but I checked the Javadoc, and from my understanding it will not help much, since it relies on the host OS's locking facilities and there are several different host machines.
This question covers a similar subject: Java file locking on a network. The accepted answer recommends implementing your own cooperative locking process (using File.createNewFile(), as asked by the OP).
The Javadoc of File.createNewFile() says that the method atomically creates the file if it doesn't already exist. Does that work reliably on a network file system?
I mean, how is it possible, given the potential network lag, to do both the existence check and the creation simultaneously?
The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
No, createNewFile doesn't work properly on a network file system.
Even if the system call is atomic, it is atomic only within a single OS, not across the network.
Over time, I got a couple of collisions, about once every 2-3 months (approximately once every 600k files).
What happens is that my program runs as 6 separate instances on 2 separate servers, so let's call them A1, A2, A3 and B1, B2, B3.
When A1, A2, and A3 try to create the same file, the OS can properly ensure that only one file is created, since it is working with itself.
When A1 and B1 try to create the same file at the exact same moment, there is some form of network cache and/or network delay, and they both get a true return value from File.createNewFile().
My code then proceeds by renaming the parent folder to stop the other instances of the program from unnecessarily trying to process the folder, and that's where it fails:
On A1, the folder renaming operation succeeds, but the lock file can't be removed, so A1 just leaves it as it is and keeps on processing new incoming folders.
On B1, the folder renaming operation (File.renameTo(), can't do much to fix it) gets stuck in an infinite loop because the folder was already renamed (also causing huge I/O traffic, according to my sysadmin), and B1 is unable to process any new file until the program is restarted.
The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
That can be implemented easily via the open() system call or its equivalents in any operating system I have ever used.
I mean, how is it possible with the potential network lag to do both
existence check and creation simultaneously ?
There is a difference between simultaneously and atomically. The Javadoc does not say that this function is a set of two simultaneous actions, but two actions designed to work atomically. If the method performs the two operations atomically, then a file will never be created without first checking for its existence: if the file is created by the current call, no file with that name was present, and if the file is not created, a file with that name already existed.
I don't see a reason to doubt that the function is atomic or works reliably, whether the call is on a network or a local disk. A local call is equally unreliable; so many things can go wrong in I/O.
What you do have to doubt is using the empty file created by this function as a lock, as explained in D-Mac's answer to this question; the Javadoc for this function explicitly mentions that too.
You are looking for a directory lock, and empty files working as a directory lock (to signal other processes and threads not to touch it) have worked quite well for me, provided due care is taken to write the logic for checking file existence, lock file cleanup and orphaned locks.
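A minimal sketch of such a directory lock with cleanup and an orphan check. DirectoryLock, the ".lock" file name and the age-based notion of "orphaned" are assumptions made for illustration. The sketch inherits the limits discussed above: it does not make file creation reliable on a network file system.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper, not part of any library.
final class DirectoryLock implements AutoCloseable {
    private final Path lockFile;

    private DirectoryLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns a lock handle if this caller created the lock file, else empty. */
    static Optional<DirectoryLock> tryAcquire(Path dir) throws IOException {
        Path lockFile = dir.resolve(".lock"); // assumed lock file name
        try {
            Files.createFile(lockFile);
            return Optional.of(new DirectoryLock(lockFile));
        } catch (FileAlreadyExistsException e) {
            return Optional.empty();
        }
    }

    /** Reports whether an existing lock file is older than maxAge. */
    static boolean looksOrphaned(Path dir, Duration maxAge) throws IOException {
        try {
            Instant modified = Files.getLastModifiedTime(dir.resolve(".lock")).toInstant();
            return modified.plus(maxAge).isBefore(Instant.now());
        } catch (NoSuchFileException e) {
            return false; // no lock file, so nothing is orphaned
        }
    }

    /** Removes the lock file; use try-with-resources so this always runs. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(lockFile);
    }
}
```

What to do with a lock that looks orphaned (alert, or remove it) is left open here, because removing it automatically can race with a slow but still live owner.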

Java monitor folder for files

I need to monitor a certain folder for new files, which I need to process.
I have the following requirements:
The filenames of the files are sequence numbers. I need to process each file in order. (Lowest number first; there's no guarantee that each sequence number exists, e.g. 1, 2, 5, 8, 9.)
If files already exist in the folder during startup, I need to process them directly
I need a guarantee that I only process each file once
I need to avoid reading incomplete files (which are still being copied)
The service should of course be reliable...
What is the most common way to accomplish this?
I'm using Java SE7 and Spring 4.
I already had a look at the WatchService of Java 7, but it seems to have problems with processing files that already exist at startup and with avoiding incomplete files.
Assembling comments into an answer.
The easiest way to process the files in the correct order is to load the entire directory listing into an array or list and then sort it using an appropriate comparator, e.g. load the files with File.list() or File.listFiles().
This is not the most efficient methodology, but for less than 10,000 files should be adequate unless you need faster startup time performance (I can imagine a small lag before processing begins as all of the files are listed).
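The list-and-sort step might look like the sketch below. SequenceOrder and listInSequenceOrder are hypothetical names, and the sketch assumes that file names are plain decimal numbers without an extension, as the question describes. Anything else in the directory is skipped.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of any library.
final class SequenceOrder {

    /** Lists the numerically named files in dir, lowest sequence number first. */
    static List<File> listInSequenceOrder(File dir) {
        List<File> result = new ArrayList<>();
        File[] entries = dir.listFiles(); // null if dir is not a readable directory
        if (entries == null) {
            return result;
        }
        for (File f : entries) {
            // Up to 18 digits always fits in a long, so parseLong cannot overflow.
            if (f.isFile() && f.getName().matches("\\d{1,18}")) {
                result.add(f);
            }
        }
        // Compare by numeric value, so that "10" sorts after "2".
        Collections.sort(result, new Comparator<File>() {
            @Override
            public int compare(File a, File b) {
                return Long.compare(Long.parseLong(a.getName()), Long.parseLong(b.getName()));
            }
        });
        return result;
    }
}
```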
To avoid reading incomplete files you should acquire an exclusive FileLock (via a FileChannel which you can get from the FileOutputStream or FileInputStream, however you may not be able to get an exclusive lock from the FileInputStream) on the file. Assuming the OS being used supports file locking (which modern OSes do) and the application writing the file is well behaved and holding a lock (hopefully it is) then as soon as you are able to acquire the lock you know the file is complete.
If for some reason you cannot rely on file locking then you either need to have the writing program first write to a temporary file (perhaps with a different extension) and then atomically move / rename the file (atomic for most OSes if on the same file system / partition), or monitor the file for a period of time to see if further bytes are being written (not the most robust methodology).
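On the writer side, the temporary-file-then-rename option could be sketched like this. AtomicPublish, publish and the ".part" suffix are hypothetical, and the sketch assumes you control the writing program and that the reader ignores ".part" files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
final class AtomicPublish {

    /**
     * Writes content to a temporary file in the target directory and then
     * moves it to its final name, so a reader never sees a partial file
     * under the final name.
     */
    static void publish(Path dir, String finalName, byte[] content) throws IOException {
        // Same directory as the target, so the move stays on one file system.
        Path temp = Files.createTempFile(dir, finalName, ".part");
        Files.write(temp, content);
        // Throws AtomicMoveNotSupportedException if the move cannot be atomic.
        Files.move(temp, dir.resolve(finalName), StandardCopyOption.ATOMIC_MOVE);
    }
}
```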

Java - working with files atomically

I am writing a Java application which should (among other things) generate a sequence of integers, starting with a given number (such as 900, 901, 902, 903, ... - the 900 is given as a parameter).
The current sequence value should persist when the application gets shut down and then started again.
When multiple instances of the application are running at the same time, they should share the same sequence (e.g. the union of the sequences generated by all instances should be the same as the sequence generated by a single instance, when running alone).
The administrator should be able to shut down the application and reset the current sequence value manually.
If the application crashes, the file should always stay accessible for other instances so that they can continue to work.
It was decided that the application would use a plain text file which would contain just the current number. When the application starts, it checks whether the file already exists and, if not, creates it and writes the initial number into it. Every time the application is about to generate a new number, it should read the current value inside the file, use it as the current sequence value, and then increment the number in the file.
I would like to know how to do these two things atomically (with regard to other running instances of the same application):
check out if a file exists and if not, create it and write a number into it
read the current content of a file and then change it
Suggestions on how to achieve the listed goals in other ways are appreciated as well.
Using a database sequence would be a simple and solid solution but you've decided it will be a file. Then you'll need to manage the distributed synchronization yourself. There are systems offering that, like Terracotta or Hazelcast. I would definitely use one of them instead of implementing a new one based on locking a file. Why not a database?
I would create a lock file when a client writes the file and delete that lock file immediately when the write process is done.
While the lock file is present, other clients will not read or write the db file and will wait until the lock file is deleted. Simultaneous reads are allowed.
Your questions:
Shutdown hook
Is solved by using the lock file mechanism
Every client could create an ID file beside the db file, and when that file is deleted by the admin the client shuts down.
Depends: if the shutdown hook is respected this should not be a problem, but if the client is killed immediately you don't have any chance to clean up.
Problems:
If too many clients try to write the db file, you cannot make sure that the first client will be served first.
What happens if a client crashes during the write process and is not able to clean up the lock file?
What happens if two clients try to create the lock file at the same time? I think this depends on the OS file system.
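As an alternative to a separate lock file, the read-and-increment step can be sketched with an OS-level FileLock on the counter file itself. This is a different technique from the lock file described above, chosen here because the operating system drops the lock when the process exits, so a crashed client does not leave a stale lock behind. FileSequence and next are hypothetical names, and the sketch assumes all instances run on one machine with a local file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class FileSequence {

    /**
     * Returns the current sequence value and stores value + 1 in the file.
     * A missing or empty file is treated as holding the initial value.
     */
    static long next(Path file, long initial) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) { // blocks until other processes release it
            ByteBuffer in = ByteBuffer.allocate(32);
            while (in.hasRemaining() && ch.read(in) != -1) {
                // keep reading until end of file or the buffer is full
            }
            in.flip();
            String text = StandardCharsets.US_ASCII.decode(in).toString().trim();
            long current = text.isEmpty() ? initial : Long.parseLong(text);

            // Overwrite from the start, then cut off any leftover old digits.
            ByteBuffer out = ByteBuffer.wrap(
                    Long.toString(current + 1).getBytes(StandardCharsets.US_ASCII));
            int length = out.remaining();
            ch.position(0);
            while (out.hasRemaining()) {
                ch.write(out);
            }
            ch.truncate(length);
            ch.force(true); // flush to the storage device before releasing the lock
            return current;
        }
    }
}
```

Because the file is opened with CREATE and the emptiness check happens under the lock, the "create and write the initial number" step and the "read then change" step are covered by the same critical section.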
