Are the methods Files.createFile() and Files.delete() thread-safe? I have read in the documentation that createFile() is always an atomic operation but delete() is not. Should I somehow synchronize these blocks in my Java application, and how? What does an atomic operation mean for a multithreading task?
a. What does an atomic operation mean for a multithreading task?
In the context of multithreading, atomicity is the ability of a thread to execute a task in such a way that, while this thread is executing it, other threads have no apparent effect on the task's state variables.
File.createNewFile(): For this method, the state is whether the file exists when the thread is about to execute the method. Let's say the file does not exist when the thread starts executing this method, and that the method takes 5 ms to run and create the file. According to the concept of atomicity, no other thread should be able to create the same file (which did not exist before) during those 5 ms; otherwise this thread's initial assumption about the state of the file would no longer hold, and neither would its result.
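The executing thread does not need to take a Java-level lock for this; the operating system performs the existence check and the creation as one step (on POSIX systems, for example, through an exclusive create such as open() with O_CREAT | O_EXCL).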
Files.delete(): The Javadoc for this method says:
this method may not be atomic with respect to other file system operations. If the file is a symbolic link, then the symbolic link itself, not the final target of the link, is deleted.
The above statement says that this operation may not be atomic with respect to other file system operations, because the implementation may need to examine the file first (for example, to determine whether it is a directory). In addition, if this method is invoked on a symbolic link, the link is deleted and not the file it points to, which implies that the original file still exists and other threads can still perform file system operations on it.
To determine whether a file is a symbolic link, see the reference on determining symlinks; in java.nio.file, Files.isSymbolicLink(Path) does this.
b. Should I somehow synchronize these blocks in my Java application and how?
You do not need to handle any multithreading scenarios in either case.
However, you can use the method mentioned above to detect symbolic links and handle them separately, as you wish.
But no synchronization is required on your end.
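A minimal sketch of that separate handling with java.nio.file; the class DeleteHelper and the method deleteReporting are hypothetical names. It checks Files.isSymbolicLink before deleting and, instead of synchronizing, treats NoSuchFileException as "another thread or process got there first".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class DeleteHelper {
    /**
     * Deletes the given path and reports what was removed.
     * Files.delete removes a symbolic link itself, never its target,
     * so the link case is reported separately.
     */
    static String deleteReporting(Path path) throws IOException {
        // Checked before deleting; another thread could still change the
        // path in between, which is why NoSuchFileException is handled below.
        boolean isLink = Files.isSymbolicLink(path);
        try {
            Files.delete(path);
            return isLink ? "deleted link (target untouched)" : "deleted file";
        } catch (NoSuchFileException e) {
            // Another thread or process removed it first.
            return "already gone";
        }
    }
}
```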
Do you mean File.createNewFile()?
Javadoc says:
The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
In other words, between the check whether the file exists and the creation of the file, no other file operation can change whether the file exists.
If two threads want to create the same nonexistent file, only one will create the file and get true; the other thread will get false.
Usually you don't need to synchronize these operations, but you do need proper exception handling: other programs may operate on your files too.
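A small sketch of that guarantee (the class CreateRace and the method countWinners are made up for illustration): several threads race to call createNewFile() on the same path, and only one of them should see true.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CreateRace {
    /** Lets several threads race to create the same file; returns how many "won". */
    static int countWinners(File file, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                // createNewFile checks and creates in one atomic step:
                // it returns true only for the thread that actually created the file.
                tasks.add(file::createNewFile);
            }
            int winners = 0;
            for (Future<Boolean> f : pool.invokeAll(tasks)) {
                if (f.get()) {
                    winners++;
                }
            }
            return winners;
        } finally {
            pool.shutdown();
        }
    }
}
```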
Related
I have a function whose purpose is to create a directory and copy a CSV file to that directory. This same function gets run multiple times, each time by an object in a different thread. It gets called in the object's constructor, but I have logic in there to copy the file only if it does not already exist (meaning it checks that one of the other instances running in parallel has not already created it).
Now, I know that I could simply rearrange the code so that this directory is created and the file is copied before the objects are run in parallel, but that is not ideal for my use case.
I am wondering, will the following code ever fail? That is, due to one of the instances being in the middle of copying a file, while another instance attempts to start copying that same file to the same location?
private void prepareGroupDirectory() {
    new File(outputGroupFolderPath).mkdirs();
    String map = "/path/map.csv";
    File source = new File(map);
    String myFile = "/path/test_map.csv";
    File dest = new File(myFile);
    // copy file
    if (!dest.exists()) {
        try {
            Files.copy(source, dest);
        } catch (Exception e) {
            // do nothing
        }
    }
}
To sum it all up: is this function thread-safe, in the sense that different threads could all run it in parallel without it breaking? I think yes, but any thoughts would be helpful!
To be clear, I have tested this many, many times and it has worked every time. I am asking this question to make sure that, in theory, it will never fail.
EDIT: Also, this is highly simplified so that I could ask the question in an easy-to-understand format.
This is what I have now after following the comments (I still need to switch to NIO), and it is currently working:
private void prepareGroupDirectory() {
    new File(outputGroupFolderPath).mkdirs();
    logger.info("created group directory");
    String map = instance.getUploadedMapPath().toString();
    File source = new File(map);
    String myFile = FilenameUtils.getBaseName(map) + "." + FilenameUtils.getExtension(map);
    File dest = new File(outputGroupFolderPath + File.separator + "results_" + myFile);
    instance.setWritableMapForGroup(dest.getAbsolutePath());
    logger.info("instance details at time of preparing group folder: {} ", instance);
    // Note: this lock is created anew on every call, so no two threads
    // ever share it and it does not actually exclude anything.
    final ReentrantLock lock = new ReentrantLock();
    lock.lock();
    try {
        // copy file
        if (!dest.exists()) {
            String pathToWritableMap = createCopyOfMap(source, dest);
            logger.info(pathToWritableMap);
        }
    } catch (Exception e) {
        // do nothing
        // thread-safe
    } finally {
        lock.unlock();
    }
}
It isn't.
What you're looking for is the concept of rotate-into-place. The problem with file operations is that almost none of them are atomic.
Presumably you don't just want 'only one' thread to win the race to make this file; you also want that file to be either perfect or nonexistent. You would not want anybody to be able to observe the CSV file in a half-baked state, and you most certainly wouldn't want a crash halfway through generating it to leave a half-baked file behind whose mere existence prevents any later attempt to write it out properly. You can't use finally blocks or exception catching to address this issue; someone might trip over a power cable.
So, how do you solve all these problems?
You do not write to foo.csv. Instead you write to foo.csv.23498124908.tmp where that number is randomly generated. Because that just isn't the actual CSV file anybody is looking for, you can take all the time in the world to finish it properly. Once it is done, then you do the magic trick:
You rename foo.csv.23498124908.tmp to foo.csv, and do so atomically: one instant in time foo.csv does not exist, the next instant it does and has the complete contents. Be aware, though, that the rename alone does not pick a single winner: the Files.move Javadoc says that with ATOMIC_MOVE, if the target file already exists, it is implementation specific whether the existing file is replaced or the method fails with an IOException, and POSIX rename(), for one, silently replaces it. What the rename does guarantee is that nobody ever observes a half-written foo.csv.
The way to do this is to use Files.move(from, to, StandardCopyOption.ATOMIC_MOVE). ATOMIC_MOVE even flatly refuses to execute (it throws AtomicMoveNotSupportedException) if the OS/file system combination does not support atomic moves (pretty much all of them do, though).
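A minimal sketch of the write-then-rename trick with java.nio.file, assuming the target's directory is writable (AtomicPublish and publish are hypothetical names). The temporary file is created in the same directory as the target, because an atomic rename generally cannot cross file systems. This sketch does not decide a single winner: replacing an existing target is implementation specific, as noted above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicPublish {
    /**
     * Writes content to a temporary file next to the target, then renames
     * it onto the target in one atomic step, so readers see either no file
     * (or the old one) or the complete new file, never a half-written one.
     */
    static void publish(Path target, String content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory (hence same file system), which an atomic rename needs.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString() + ".", ".tmp");
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // Throws AtomicMoveNotSupportedException if the file system cannot
            // do this atomically. If the target already exists, whether it is
            // replaced or an IOException is thrown is implementation specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```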
A second advantage is that this mechanism works even if you have multiple entirely different apps running. If they all use ATOMIC_MOVE or the equivalent in their language's API, nobody ever sees a half-baked file, whether we're talking 'threads in a JVM' or 'apps on a system'.
If you instead want to avoid having multiple threads simultaneously do the work of making this CSV file, so that only one does it and the rest 'wait' until the first is done, file system locks are not the answer. You can try (make an empty file whose existence signals that some other thread is working on it), and there's even a primitive for that in Java's java.nio.file APIs: the StandardOpenOption.CREATE_NEW flag, used when creating a file, means "create it atomically, failing if the file already exists", with concurrency guarantees (if multiple processes or threads all run it simultaneously, one succeeds and all others fail, guaranteed). However, CREATE_NEW can only create atomically. It cannot write atomically; nothing can (hence the whole 'rename it into place' trick above).
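A sketch of the CREATE_NEW marker-file approach (MarkerClaim and tryClaim are hypothetical names). Exactly one caller gets true; the marker stays on disk until someone deletes it, which is exactly where the problems listed next come from.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MarkerClaim {
    /**
     * Tries to claim the work by atomically creating an empty marker file.
     * Returns true for exactly one caller; all others get false.
     */
    static boolean tryClaim(Path marker) throws IOException {
        try {
            // CREATE_NEW: create atomically, fail if the file already exists.
            Files.newOutputStream(marker, StandardOpenOption.CREATE_NEW,
                    StandardOpenOption.WRITE).close();
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // someone else already claimed it
        }
    }
}
```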
The problems with such locks are twofold:
If the JVM crashes, that file doesn't go away. Ever launched a Linux daemon process, such as PostgreSQL, and it told you that 'the pid file is still there, if there is no postgres running please delete it'? Yeah, that problem.
There's no way to know when the work is done, other than to re-check for that file's existence every few milliseconds. If you wait only a few milliseconds between checks, you're potentially thrashing the disk (hopefully your OS and disk cache algorithms do a decent job). If you wait a long time, you might be waiting around for no reason.
Hence why you shouldn't do this stuff, and should just use locks within the process: use synchronized, or a java.util.concurrent.locks.ReentrantLock shared by all the threads, or whatnot.
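Applied to the question's snippet, a sketch of the in-process approach could look like this, assuming all threads run in the same JVM (GroupDirectory and copyOnce are hypothetical names). The key detail is that the ReentrantLock is one static field shared by every caller; a lock created inside the method, as in the question's second snippet, is new on every call and excludes nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

final class GroupDirectory {
    // One lock shared by every caller in this JVM.
    private static final ReentrantLock COPY_LOCK = new ReentrantLock();

    /** Copies source to dest unless dest already exists; returns true if this call copied it. */
    static boolean copyOnce(Path source, Path dest) throws IOException {
        COPY_LOCK.lock();
        try {
            Files.createDirectories(dest.toAbsolutePath().getParent());
            if (Files.exists(dest)) {
                return false; // another thread already made the copy
            }
            // NIO copy: throws an IOException describing the failure
            // instead of returning false.
            Files.copy(source, dest);
            return true;
        } finally {
            COPY_LOCK.unlock(); // released even if the copy throws
        }
    }
}
```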
To answer about your code snippet specifically: no, it is broken. It is possible for two threads to run simultaneously and both get false from dest.exists(), so both enter the copy block, and then they fall all over each other while copying. Depending on the file system, usually one thread ends up 'winning', with its copy operation succeeding and the other thread's copy seemingly lost to the aether (most file systems are ref/node based, meaning the file was written to disk but its 'pointer' was immediately overwritten, and the file system considers it garbage, more or less).
Presumably you consider that a failing scenario, and your code does not guarantee that it can't happen.
NB: What API are you using? Files.copy(instanceOfJavaIoFile, anotherInstanceOfJavaIoFile) isn't in the JDK. There is java.nio.file.Files.copy(Path, Path), and that's the one you want. Perhaps the Files you have is from Apache Commons? I strongly suggest you don't use that stuff; those APIs are usually obsolete (Java itself has better APIs to do the same thing) and badly designed.
Ditch java.io.File; it's an outdated API. Use java.nio.file instead. The old API doesn't have ATOMIC_MOVE or CREATE_NEW, and many of its methods don't throw exceptions when things go wrong: they just return false, which is easily ignored and leaves no room to explain what went wrong. Hence why you should not use it.
One of the major issues with the Apache libraries is that they use the anti-pattern of piling a ton of static utility methods into a giant container. Unfortunately, Java's own second take on file handling (java.nio.file) is similarly boneheaded API design. I guess in the Java world, the third time will be the charm. At any rate, a bad core Java API with advanced capabilities is still better than a bad Apache utility API that wraps the older API, which simply does not expose the capabilities you need here.
I have a singleton object which holds one method, which is NOT synchronized. The singleton can be accessed by many clients at a time. What will happen if multiple clients access that object?
Actually, I want to use that method to write a log to a single file.
I guess by clients you mean threads. Assuming you have implemented the singleton correctly, all threads will be using the same instance. Since this method changes state (it writes to a file), it will in general require some sort of synchronization, although that depends on some factors. For example, if your method writes just a single line with a single call to BufferedWriter.write(), it is fine, because BufferedWriter.write() synchronizes internally. However, if you write multiple lines or make multiple calls to BufferedWriter.write(), the calls from different threads might interleave.
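A sketch of the synchronized variant, assuming all threads share one instance, as they would with a correctly implemented singleton (the singleton wiring is omitted, and FileLog is a hypothetical name). Marking log() as synchronized keeps all of the write() calls for one entry together.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileLog {
    private final BufferedWriter out;

    FileLog(Path file) throws IOException {
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /**
     * Writes one entry made of several write() calls. synchronized makes the
     * whole entry atomic with respect to other threads using this instance.
     */
    synchronized void log(String level, String message) throws IOException {
        out.write(level);
        out.write(": ");
        out.write(message);
        out.newLine();
        out.flush();
    }

    synchronized void close() throws IOException {
        out.close();
    }
}
```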
Now, if by clients you mean different processes, simple synchronization will of course not help. You can use FileLock to lock the file: file locks are held on behalf of the entire JVM, so they coordinate separate processes rather than threads within one JVM. Alternatively, you can lock using something external, such as another temporary file used as a lock; whether that works depends on the OS providing atomic file creation.
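For the multi-process case, a minimal FileLock sketch (LockedAppend and appendLine are hypothetical names). Each call holds an exclusive lock on the whole file while appending one line. Per the FileLock documentation, whether a lock actually prevents another program from accessing the file is system-dependent, so this only helps if every process cooperates by locking too.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class LockedAppend {
    /**
     * Appends one line while holding an exclusive FileLock. The lock is held
     * on behalf of the whole JVM, so it coordinates separate processes;
     * threads inside one JVM still need ordinary synchronization.
     */
    static void appendLine(Path file, String line) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = ch.lock()) { // blocks until no other process holds it
            ByteBuffer buf = ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        } // lock released first, then the channel closed
    }
}
```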
Process A writes to a file XYZ when executed. Processes B and C, when executed, read the file XYZ. So while process A is running, B and C should wait for A to complete. To provide synchronization, can I use the java.nio package, or should I use something like FileLock or sockets? Can we specify how long the second process should wait?
Edit: The file is created during the first write. In that case, can I make it a shared resource?
Using the java.nio package's file locking could be a better solution, I hope. Note, though, that while FileLock has been available since Java 1.4, java.nio was not full-fledged until NIO.2 (java.nio.file) arrived in Java 7.
http://www.withoutbook.com/DifferenceBetweenSubjects.php?subId1=7&subId2=43&d=Java%206%20vs%20Java%207
FileLock:
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileLock.html
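FileLock has no timeout parameter, so one way to bound the wait asked about in the question is to poll tryLock() until a deadline. A sketch, where TimedFileLock, lockWithin and the 50 ms poll interval are all assumptions: the writer (A) would request an exclusive lock on a writable channel, and the readers (B and C) a shared lock on a readable channel. Within a single JVM an overlapping tryLock throws OverlappingFileLockException instead of waiting, so this is meant for separate processes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.concurrent.TimeUnit;

final class TimedFileLock {
    /**
     * Polls tryLock() until it succeeds or the timeout expires.
     * Returns null if the lock could not be acquired in time.
     */
    static FileLock lockWithin(FileChannel channel, boolean shared, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            // Whole-file lock; null means another process holds a conflicting
            // lock. Shared locks allow B and C to read at the same time.
            FileLock lock = channel.tryLock(0L, Long.MAX_VALUE, shared);
            if (lock != null) {
                return lock;
            }
            if (System.nanoTime() - deadline >= 0) {
                return null; // gave up: the timeout expired
            }
            Thread.sleep(50); // poll interval (an arbitrary choice)
        }
    }
}
```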
One way could be to use a flag: just a boolean stillWriting that is readable from outside (this only works if A, B and C are threads in the same JVM, and the flag must be volatile or guarded by a lock).
As soon as process A has done its job, it sets this flag to false, and processes B and C can start working with the file.
If A wants to start editing the file again, it sets the flag back to true and blocks the other two processes.
Using locks would be a good idea. You can use Condition from the Java API (java.util.concurrent.locks.Condition).
Refer to http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html#awaitNanos(long)
While A is working, the other threads should await the condition; when A completes, it signals, so that the threads waiting to start can proceed. This pattern is very appropriate when working with a shared resource.
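A sketch combining the stillWriting flag with a Condition, assuming A, B and C are threads in one JVM (WriteGate and its methods are hypothetical names). awaitNanos() provides the bounded wait the question asks about: awaitWriter() returns false if the writer has not finished within the timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class WriteGate {
    private final Lock lock = new ReentrantLock();
    private final Condition writingDone = lock.newCondition();
    private boolean stillWriting = false; // guarded by lock

    void startWriting() {
        lock.lock();
        try {
            stillWriting = true;
        } finally {
            lock.unlock();
        }
    }

    void finishWriting() {
        lock.lock();
        try {
            stillWriting = false;
            writingDone.signalAll(); // wake every waiting reader
        } finally {
            lock.unlock();
        }
    }

    /** Waits until the writer is done or the timeout elapses; returns true if done. */
    boolean awaitWriter(long timeout, TimeUnit unit) throws InterruptedException {
        lock.lock();
        try {
            long nanos = unit.toNanos(timeout);
            while (stillWriting) { // loop guards against spurious wake-ups
                if (nanos <= 0L) {
                    return false;
                }
                nanos = writingDone.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```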
I read that new File("path") doesn't physically create a file on disk. However, the API says:
Instances of this class may or may not denote an actual file-system object such as a file or a directory. If it does denote such an object then that object resides in a partition. A partition is an operating system-specific portion of storage for a file system. A single storage device (e.g. a physical disk-drive, flash memory, CD-ROM) may contain multiple partitions.
So I'm curious whether it is safe to have such code in a multithreaded environment:
File file = new File( "myfile.zip" );
// do some operations with file (fill it with content)
file.saveSomewhere(); // just to denote that I save it after several operations
For example, thread1 gets here, creates an instance and starts doing operations. Meanwhile, thread2 preempts it, creates its own instance with the same name (myfile.zip) and does some other operations. After that, each of them saves the file in turn.
I need to be sure that they don't work with the same file, and that the last thread to save the file overwrites the previous one.
No, File does not keep a lock and is not safe for the pattern you describe. You should either lock or keep the file in some wrapper class.
If you would provide a little bit more of your code, somebody can certainly help you find a suitable pattern.
The lines you commented are certainly not thread-safe; you will have to protect them with a mutex or a monitor. The golden rule is: every time you have to write to something shared in a multithreaded context, you need to protect that region to guarantee thread safety (Bernstein conditions).
I'm not sure whether the statement you propose needs to be protected too, as I have never used that command, but I thought this could be helpful to someone else too.
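A minimal sketch of protecting the save with a monitor, assuming each thread builds its content in memory first (SharedFileSaver and save are hypothetical names). Because the writes happen one at a time inside the synchronized block, the threads never write the file concurrently, and the last thread to save overwrites the previous one, as the question wants.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SharedFileSaver {
    // One monitor shared by all threads that save to the same path.
    private static final Object SAVE_LOCK = new Object();

    /**
     * Each thread prepares its bytes privately (a File or Path object is only
     * a name, so nothing is shared until this point), then saves inside the
     * critical section. Writes never interleave; the last saver wins.
     */
    static void save(Path target, byte[] content) throws IOException {
        synchronized (SAVE_LOCK) {
            Files.write(target, content, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }
}
```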
I have three services in my Android app that are fired by two broadcast receivers. The first two write to a file and are fired by one broadcast receiver, so I can make sure they are executed one after the other (via Context.sendOrderedBroadcast()). The third one is on its own and is fired by a separate broadcast receiver, but it reads from the same file the first two write to.
Because the broadcast receivers may be fired at the same time or nearly the same time, the file might also be accessed concurrently. How can I prevent that from happening? I want to either read first and then write, or write first and then read. I'm just not sure whether this problem is similar to Java concurrency in general, because Android services, if I'm not mistaken, are an entirely different beast.
One solution would be to have your writing tasks create an empty temporary file (say .lock) before accessing the shared file and delete that same temporary file once they are done.
Your reading task can check whether the .lock file exists.
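A sketch of the .lock-file idea (LockFile, writeIfFree and looksFree are hypothetical names). It uses Files.createFile, which checks and creates atomically, so two writers cannot both take the lock. The reader-side check is only a hint, because a writer may start right after it, and, as with any lock file, a crash can leave a stale .lock behind.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LockFile {
    /**
     * Runs the write while holding a ".lock" marker file.
     * Returns false without writing if another task currently holds it.
     */
    static boolean writeIfFree(Path lockFile, Runnable write) throws IOException {
        try {
            Files.createFile(lockFile); // atomic check-and-create
        } catch (FileAlreadyExistsException e) {
            return false; // someone else is writing; the caller may retry later
        }
        try {
            write.run();
            return true;
        } finally {
            Files.delete(lockFile); // release, even if the write failed
        }
    }

    /** Reader side: only a hint, since a writer may start right after this check. */
    static boolean looksFree(Path lockFile) {
        return !Files.exists(lockFile);
    }
}
```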
Alternatively, you can use a FileLock.
http://developer.android.com/reference/android/app/Service.html
Note that services, like other application objects, run in the main thread of their hosting process. This means that, if your service is going to do any CPU intensive (such as MP3 playback) or blocking (such as networking) operations, it should spawn its own thread in which to do that work.
I suggest reading from and writing to the file in a separate thread. To do all of it on that one thread, so that only one thread at a time touches the file, see the question "Only one thread at a time!".
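A plain-Java sketch of funnelling all file access through one background thread (FileWorker is a hypothetical name; on Android, the Service would own such an instance). Because a single-thread executor runs tasks one at a time in submission order, a read submitted after a write sees that write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FileWorker {
    // All file access runs on this one background thread, so reads and
    // writes are executed one at a time, in submission order.
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private final Path file;

    FileWorker(Path file) {
        this.file = file;
    }

    Future<?> append(String line) {
        return io.submit(() -> {
            try {
                Files.write(file, (line + "\n").getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    Future<List<String>> readAll() {
        return io.submit(() -> Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    void shutdown() {
        io.shutdown();
    }
}
```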
First of all, I shouldn't have done the file I/O on the main UI thread, which is where Services run. It should be done on another thread, for example with an AsyncTask.
Secondly, the ReentrantLock approach is much easier. While one thread holds the lock, other threads accessing the same resource wait and proceed only once the lock has been released. Simply instantiate a new ReentrantLock() and share that one lock among the methods that read from or write to the file. It's as easy as calling lock() and unlock() on the ReentrantLock as you need it.
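A sketch of that approach in plain Java (SharedFileAccess is a hypothetical name): one ReentrantLock instance is shared by the read and write methods, and unlock() sits in a finally block so the lock is released even if the I/O throws.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

final class SharedFileAccess {
    // One lock instance shared by readers and writers of this file.
    private final ReentrantLock lock = new ReentrantLock();
    private final Path file;

    SharedFileAccess(Path file) {
        this.file = file;
    }

    void write(String content) throws IOException {
        lock.lock(); // other threads calling write() or read() wait here
        try {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        } finally {
            lock.unlock(); // always release, even if the write throws
        }
    }

    String read() throws IOException {
        lock.lock();
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } finally {
            lock.unlock();
        }
    }
}
```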