I'm writing a Maven plugin that deletes and renames various files using the File.delete() and File.renameTo(File) JDK methods.
Roughly every second time I run the plugin, one of these operations fails, and each time it fails it's a different file that cannot be deleted or renamed. An obvious explanation for why a file cannot be deleted is that another process is using it (I'm running on Windows), but I've no idea which process might be responsible. The fact that the problem cannot be reproduced consistently suggests a threading issue, but AFAIK, Maven plugins are run in a single thread. It's difficult to get any information about the cause of the problem, because the methods referred to above don't throw exceptions, they just return false.
Is there a way to programmatically detect a locked file and the name of the process holding the lock? Alternatively, if anyone has other suggestions for debugging a problem like this one, please send them on.
Thanks,
Don
Handle can let you find out what processes have handles on files.
Sample output (it's a command line utility):
C:\Users\Jon\Downloads\Handle>handle Test.cs
Handle v3.42
Copyright (C) 1997-2008 Mark Russinovich
Sysinternals - www.sysinternals.com
Test.exe pid: 6088 190: C:\Users\Jon\Test\Test.cs
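On the complaint that File.delete() and File.renameTo(File) only return false: a minimal sketch, assuming Java 7 or later is available, of the java.nio.file equivalents, which throw an IOException that says why the operation failed. The class and method names (VerboseFileOps, describeDeleteFailure, describeMoveFailure) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class VerboseFileOps {

    /** Tries to delete the path; returns null on success, else the exception type and message. */
    static String describeDeleteFailure(Path path) {
        try {
            Files.delete(path);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    /** Tries to rename/move the path; returns null on success, else the exception type and message. */
    static String describeMoveFailure(Path from, Path to) {
        try {
            Files.move(from, to);
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

This will not name the process holding the file, so a tool like Handle is still needed for that, but the exception type and message usually narrow down whether the file is missing, access is denied, or the file is in use.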
One thread pool downloads files from an FTP server and another thread pool reads those files.
Both thread pools run concurrently. I'll explain what happens with an example.
Let's assume I have one CSV file with 100 records.
While threadPool-1 is downloading the file and writing it to the pending folder, threadPool-2 reads the content of that same file. Assume only 10 records can be written to the file in the /pending folder in 1 second, so threadPool-2 reads only 10 records.
threadPool-2 doesn't know that the other 90 records are still being downloaded, because it cannot tell whether the whole file has been downloaded. After reading, it moves the file to another folder, so the remaining 90 records are never processed.
My question is: how can threadPool-2 wait until the whole file has been downloaded before reading its contents?
One more thing: both thread pools use the scheduleAtFixedRate method and run every 10 seconds.
Please guide me on this.
I'm a fan of Mark Rotteveel's #6 suggestion (in comments above):
use a temporary name when downloading,
rename when download is complete.
That looks like:
FTP download threads write all files with some added extension – perhaps .pending – but name it whatever you want.
When a file is downloaded – say some.pdf – the FTP download thread writes the file to some.pdf.pending
When an FTP download thread completes a file, the last step is a file rename operation – this is the mechanism for ensuring only "done" files are ready to be processed. So it downloads the file to some.pdf.pending, then at the end, renames it to some.pdf.
Reader threads look for files, ignoring anything matching *.pending
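The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete FTP client: the InputStream stands in for whatever stream your FTP library hands you, and the names PendingFiles, download and readyFiles are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class PendingFiles {
    static final String SUFFIX = ".pending";

    /** Writer side: stream into name.pending, then rename to the final name as the last step. */
    static Path download(InputStream source, Path dir, String fileName) throws IOException {
        Path temp = dir.resolve(fileName + SUFFIX);
        Path done = dir.resolve(fileName);
        Files.copy(source, temp, StandardCopyOption.REPLACE_EXISTING);
        // ATOMIC_MOVE asks the filesystem for an all-or-nothing rename; whether an
        // existing target is replaced or the call fails is implementation specific.
        return Files.move(temp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reader side: list regular files, skipping anything still named *.pending. */
    static List<Path> readyFiles(Path dir) throws IOException {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p) && !p.getFileName().toString().endsWith(SUFFIX)) {
                    ready.add(p);
                }
            }
        }
        return ready;
    }
}
```

The rename only behaves as a cheap, atomic step when the temporary and final names are on the same filesystem, which is why the sketch keeps both in the same directory.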
I've built systems using this approach and they worked out well. In contrast, I've also worked with more complicated systems that tried to coordinate across threads, and those often did not work so well.
Over time, any software system will have bugs. Edsger Dijkstra captured this so well:
"If debugging is the process of removing software bugs, then programming must be the process of putting them in."
However difficult it is to reason about program correctness now – while the program is still in design phase, and has not yet been built – it will be harder to reason about correctness when things are broken in production (which will happen, because bugs).
That is, when things are broken and you're under time pressure to find the root cause (and fix it!), even the best of us would be at a disadvantage with a complicated (vs. simple) system.
The approach of using temporary names is simple to reason about, which should minimize code complexity and thus make it easier to implement.
In turn, maintenance and bug fixes should be easier, too.
Keep it simple – let the filesystem help you out.
Can I somehow check if another program reads a specified file?
I want my program to monitor a file, and whenever it is accessed by another program, it should run some code. Is this possible?
As some people have mentioned, Java's new I/O (NIO) lets you watch a directory for the following kinds of activity:
ENTRY_CREATE – A directory entry is created.
ENTRY_DELETE – A directory entry is deleted.
ENTRY_MODIFY – A directory entry is modified.
OVERFLOW – Indicates that events might have been lost or discarded. You do not have to register for the OVERFLOW event to receive it.
However, as you can see, it does not allow you to detect whether a file has been accessed. If you really want to do that, you will have to write some native code.
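For reference, a minimal sketch of registering a directory for the event kinds listed above, using the standard java.nio.file.WatchService API. The class and method names (DirectoryEvents, watch, nextEvents) are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class DirectoryEvents {

    /** Registers dir for create, delete and modify events. The caller closes the service. */
    static WatchService watch(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_DELETE,
                StandardWatchEventKinds.ENTRY_MODIFY);
        return watcher;
    }

    /** Waits up to timeoutSeconds for the next batch of events and describes each one. */
    static List<String> nextEvents(WatchService watcher, long timeoutSeconds)
            throws InterruptedException {
        List<String> seen = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return seen; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            // For OVERFLOW the context is null; otherwise it is the relative file name.
            seen.add(event.kind().name() + " " + event.context());
        }
        key.reset(); // re-arm the key so later events are delivered
        return seen;
    }
}
```

Note that a plain read of a file produces none of these events, which is the limitation described above.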
On Windows, you can list which processes are accessing a file using Handle. I believe you could call this command from a Java program (say, every couple of minutes) and then detect from the output whether the file is in use.
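A rough sketch of that idea, under two assumptions that are not verified here: that an executable named handle is on the PATH, and that its output lines look like the sample "Test.exe pid: 6088 190: C:\Users\Jon\Test\Test.cs". The names HandleOutput, holderOf and holders are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HandleOutput {
    // Assumed line shape: "<image name> pid: <pid> <handle id>: <path>"
    private static final Pattern LINE = Pattern.compile("^(.+?)\\s+pid:\\s*(\\d+)\\b.*$");

    /** Returns "image (pid N)" for a matching output line, or null for banner or other lines. */
    static String holderOf(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? m.group(1) + " (pid " + m.group(2) + ")" : null;
    }

    /** Runs "handle <fileName>" and collects the processes it reports (Windows only). */
    static List<String> holders(String fileName) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("handle", fileName)
                .redirectErrorStream(true)
                .start();
        List<String> result = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String holder = holderOf(line);
                if (holder != null) {
                    result.add(holder);
                }
            }
        }
        process.waitFor();
        return result;
    }
}
```

Parsing another tool's console output is brittle, so treat the regular expression as something to adjust against the output of the Handle version you actually have.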
I'm pretty sure there are alternatives for other operating systems.
The BasicFileAttributes interface offers the last access time, but it won't tell you which program accessed the file. As others mentioned, WatchService has the same limitation. What you want could instead be achieved by having those programs log their accesses and then checking the logs to decide what to do next.
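A minimal sketch of reading the last access time through BasicFileAttributes. The helper names (AccessTime, lastAccess, accessedSince) are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

final class AccessTime {

    /** Reads the last access time recorded by the filesystem for this file. */
    static FileTime lastAccess(Path file) throws IOException {
        return Files.readAttributes(file, BasicFileAttributes.class).lastAccessTime();
    }

    /** True if the recorded access time is strictly later than a previously saved value. */
    static boolean accessedSince(Path file, FileTime previous) throws IOException {
        return lastAccess(file).compareTo(previous) > 0;
    }
}
```

Be aware that many systems update access times lazily or not at all (for example, Linux filesystems mounted with noatime or relatime), so a polling check like this can miss reads.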
Ok, so I have a couple of Java programs that I'm running using a cron job on a Linux server. These jobs run every ten minutes or so, take literally two minutes to run, and then exit. I need to add a way for the programs to detect, when they start up, if there is already an instance of themselves running, and if so to exit without going any further. I'm really not sure of the best way to handle this though and am hoping someone can offer some advice.
One approach I've considered is to run a shell command from the Java code that does some sort of ps command and looks through the output to see if the program is already running. This seems pretty finicky and complex, though, for something so small. Plus, I'm not all that knowledgeable about Linux and am not even sure of the best way to do that. If anyone has some better thoughts, please let me know. Or if that is the best way, I'd appreciate it if you could provide the Linux commands I'd need. Thanks.
If you have a writable /tmp directory you can use a lockfile.
When your Java program starts up, check for a file with a name unique to your application (e.g. "my-lock-file.lock") in the /tmp directory. If none exists, create one, and remove it when you're done. If one exists, just exit.
You can check the existence of a file with the .exists() method of the java.io.File class.
If your code needs to be portable, you can use System.getProperty("java.io.tmpdir") to get an appropriate temporary directory for the platform your code is running on.
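A minimal sketch of the lock-file idea. One deliberate change from the description above: instead of calling exists() and then creating the file, it uses Files.createFile, which fails if the file is already there, so two instances starting at the same moment cannot both pass the check. The class name TmpLockFile and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TmpLockFile {

    /** Builds a lock-file path inside the platform's temporary directory. */
    static Path lockPath(String name) {
        return Paths.get(System.getProperty("java.io.tmpdir"), name);
    }

    /** Returns true if this call created the lock file, false if it already existed. */
    static boolean tryAcquire(Path lock) throws IOException {
        try {
            Files.createFile(lock); // check-and-create in a single step
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Removes the lock file; call this from a finally block when the job is done. */
    static void release(Path lock) throws IOException {
        Files.deleteIfExists(lock);
    }
}
```

In a main method this would look like: if tryAcquire returns false, exit immediately; otherwise run the job inside try and call release in finally. If the JVM is killed before release runs, the file stays behind and has to be cleaned up by other means.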
You could look at JMX and the Attach API to query for running JVMs.
Or, as Andrew logvinov mentioned, by using a lock file.
If you are using Java WebStart, there's already native support for this.
Many programs solve this by creating a temporary file that points to their PID (often referred to as a "lock" file). The filename should encode all relevant information to distinguish this process from other processes that could legitimately run in parallel.
For example, if the process is bound to a user, it should contain the user name. If the process is bound to a machine, it should (also) contain the hostname (if you put it in machine-bound temp. directory, this is debatable. If you put it in a home directory, think of the case of multiple machines sharing a home via NFS).
The location of these files is typically /tmp. This is a great location, as /tmp is typically wiped during system boot, so no orphan files are left in case of a system crash. Another solution employed by some programs is to put the lock file in the user settings directory, if it is related to the settings. E.g. mozilla thunderbird has a file called /home/<username>/.thunderbird/<profilename>.default/lock.
The file should contain the PID of the process. The idea is simple: If the file contains the PID, it is easy to check whether this process is indeed still running. So if the process crashes, the file gets orphaned. The new process instance will check the PID in the file, see that it is not running any more, and ignore the file (overwrite).
Putting it all together, you could create a file like this:
/tmp/myawesomeservice-username-hostname-lock
With the content:
12345
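A minimal sketch of the PID check, assuming Java 9 or later for the ProcessHandle API. The names PidLock, heldByLiveProcess and claim are made up, and the check-then-write in claim is not atomic, so two instances starting at exactly the same instant could still both claim the lock.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PidLock {

    /** True if the lock file exists and names the PID of a process that is still alive. */
    static boolean heldByLiveProcess(Path lock) throws IOException {
        if (!Files.exists(lock)) {
            return false;
        }
        String text = new String(Files.readAllBytes(lock), StandardCharsets.UTF_8).trim();
        try {
            long pid = Long.parseLong(text);
            return pid > 0 && ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        } catch (NumberFormatException e) {
            return false; // unreadable content is treated as an orphaned lock
        }
    }

    /**
     * Writes our own PID unless a live process already holds the lock.
     * Returns false in that case, including when the live holder is this process.
     */
    static boolean claim(Path lock) throws IOException {
        if (heldByLiveProcess(lock)) {
            return false;
        }
        String pid = Long.toString(ProcessHandle.current().pid());
        Files.write(lock, pid.getBytes(StandardCharsets.UTF_8)); // overwrites an orphaned file
        return true;
    }
}
```

Operating systems reuse PIDs, so a stale file can occasionally point at an unrelated live process; comparing the process command line as well would tighten the check, at the cost of more code.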
One of our clients is using some Novel security software that sometimes locks some .class files that our software creates. This causes some nasty problems for them when it occurs, and I am trying to research a workaround that we could add to our error handling to address the problem when it comes up. I am wondering whether there are any calls in the Java API that can be used to detect if a file is locked and, if it is, unlock it.
Before attempting to write to a file, you can check if the file is writable by your java application using File.canWrite(). However, you still might run into an issue if the 3rd party app locks the file in between your File.canWrite() check and when your application actually attempts to write. For this reason, I would code your application to simply go ahead and try to write to the file and catch whatever Exception gets thrown when the file is locked. I don't believe there is a native Java way to unlock a file that has been locked by another application. You could exec a shell command as a privileged user to force things along but that seems inelegant.
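A minimal sketch of the "just try the write and catch the exception" approach. The retry loop with a pause is an addition for illustration, on the assumption that the third-party lock is short-lived; the answer itself does not prescribe it. RetryingWriter and writeWithRetry are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RetryingWriter {

    /**
     * Attempts the write up to maxAttempts times, pausing between attempts.
     * Returns the attempt number that succeeded, or rethrows the last IOException.
     */
    static int writeWithRetry(Path file, byte[] data, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.write(file, data);
                return attempt;
            } catch (IOException e) {
                last = e; // e.g. the file is locked by another program
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

If every attempt fails, the caller still gets the final exception, whose message is the place to look for what is blocking the write.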
File.canWrite() has the race condition that Asaph mentioned. You could try FileChannel.lock() and get an exclusive lock on the file. As long as the .class is on your local disk, this should work fine (file locking can be problematic on networked disks).
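A minimal sketch of probing with an exclusive lock. It uses FileChannel.tryLock() rather than the blocking lock() so the caller gets an answer immediately; LockProbe and canLockExclusively are invented names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class LockProbe {

    /** Tries to take and immediately release an exclusive lock on an existing file. */
    static boolean canLockExclusively(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // null means another program holds a lock
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false; // this JVM already holds a lock on the file
        }
    }
}
```

Whether such a lock actually stops another program from touching the file is system-dependent: on many Unix-like systems these locks are advisory, and opening the channel may itself throw an IOException if the other program has the file open exclusively.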
Alternatively, depending on how the .class name is discovered, you could create a new name for your .class each time; then if the anti-virus software locks your initial class, you can still create the new one.
I have a directory that continually fills up with "artefact" files. Many different programs dump their temporary files in this directory and it's unlikely that these programs will become self-cleaning any time soon.
Meanwhile, I would like to write a program that continually deletes files in this directory as they become stale, which I'll define as "older than 30 minutes".
A typical approach would be to have a timed mechanism that lists the files in the directory, filters on the old stuff, and deletes the old stuff. However, this approach is not very performant in my case because this directory could conceivably contain tens or hundreds of thousands of files that do not yet qualify as stale. Consequently, this approach would continually be looping over the same thousands of files to find the old ones.
What I'd really like to do is implement some kind of directory listener that was notified of any new files added to the directory. This listener would then add those files to a queue to be deleted down the road. However, there doesn't appear to be a way to implement such a solution in the languages I program in (JVM languages like Java and Scala).
So: I'm looking for the most efficient way to keep a directory "as clean as it can be" on Windows, preferably with a JVM language. Also, though I've never programmed with PowerShell, I'd consider it if it offered this kind of functionality. Finally, if there are 3rd party tools out there to do such things, I'd like to hear about them.
Thanks.
Why can't you issue a directory system command sorted by oldest first:
C:\>dir /OD
Take the results and delete all files older than your threshold or sleep if no files are old enough.
Combine that with a Timer or Executor set to a granularity of between 1 second and 1 minute, which should keep files from piling up faster than you can delete them.
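A minimal sketch of the timed purge in pure Java. It swaps the dir /OD command for a scan with the java.nio.file API, so it still walks the whole directory on each pass; the names StalePurger, purgeOlderThan and schedule are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class StalePurger {

    /** Deletes regular files last modified more than maxAge before now; returns what was deleted. */
    static List<Path> purgeOlderThan(Path dir, Duration maxAge, Instant now) throws IOException {
        Instant cutoff = now.minus(maxAge);
        List<Path> deleted = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                try {
                    if (Files.isRegularFile(p)
                            && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)) {
                        Files.delete(p);
                        deleted.add(p);
                    }
                } catch (IOException e) {
                    // File vanished or is still in use; leave it for the next pass.
                }
            }
        }
        return deleted;
    }

    /** Runs the purge at a fixed period; the caller shuts the executor down when finished. */
    static ScheduledExecutorService schedule(Path dir, Duration maxAge, long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                purgeOlderThan(dir, maxAge, Instant.now());
            } catch (IOException e) {
                e.printStackTrace(); // directory itself could not be listed
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

For the case in the question, that would be something like schedule(dir, Duration.ofMinutes(30), 60).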
If you don't want to write C++, you can use Python. Install pywin32 and you can then use the win32 API as such:
import win32file, win32con

path_to_watch = "."  # placeholder: the directory to monitor
change_handle = win32file.FindFirstChangeNotification(
    path_to_watch,
    0,  # 0 = do not watch subdirectories
    win32con.FILE_NOTIFY_CHANGE_FILE_NAME
)
Full explanation of what to do with that handle by Tim Golden here: http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html.
In Java, you can also use Apache Commons JCI FAM. It is an open-source Java library that you can use for free.
JDK 7 (released in beta currently) includes support for file notifications as well. Check out the Java NIO.2 tutorial.
Both options should work both on Windows and Linux.
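Whichever notification mechanism you pick, the "queue new files for deletion later" part of the question can be sketched with a java.util.concurrent.DelayQueue. This is only an illustration: ExpiringPath and DeletionQueue are invented names, and the code that feeds fileCreated from directory notifications is left out.

```java
import java.nio.file.Path;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** A path plus the moment at which it becomes stale. */
final class ExpiringPath implements Delayed {
    final Path path;
    private final long expiresAtNanos;

    ExpiringPath(Path path, long ttl, TimeUnit unit) {
        this.path = path;
        this.expiresAtNanos = System.nanoTime() + unit.toNanos(ttl);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class DeletionQueue {
    private final DelayQueue<ExpiringPath> queue = new DelayQueue<>();

    /** Call this from the directory listener whenever a new file shows up. */
    void fileCreated(Path path, long ttl, TimeUnit unit) {
        queue.put(new ExpiringPath(path, ttl, unit));
    }

    /** Returns the next path whose time is up, or null if nothing is stale yet. */
    Path pollExpired() {
        ExpiringPath entry = queue.poll();
        return entry == null ? null : entry.path;
    }

    /** Blocks until the next path becomes stale, then returns it. */
    Path takeExpired() throws InterruptedException {
        return queue.take().path;
    }
}
```

A deleter thread can loop on takeExpired() and delete each path it receives, so it only wakes up when a file is actually due instead of rescanning thousands of fresh files.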
http://www.cyberpro.com.au/Tips_n_Tricks/Windows_Related_Tips/Purge_a_Directory_in_Windows_automatically/
I'd go with C++ for a utility like this - lets you interface with the WIN32 API, which does indeed have directory listening facilities (FindFirstChangeNotification or ReadDirectoryChangesW). Use one thread that listens for change notifications and updates your list of files (iirc FFCN requires you to rescan the folder, whereas RDCW gives you the actual changes).
If you keep this list sorted according to modification time, it becomes easy to Sleep() just long enough for a file to go stale, instead of polling at some random fixed interval. You might want to do a WaitForSingleObject with a timeout instead of Sleep, in order to react to outside changes (ie, the file you're waiting for to become stale has been deleted externally, so you'll want to wake up and determine when the next file will become stale).
Sounds like a fun little tool to write :)
You might want to bite the bullet and code it up in C# (or VB). What you're asking for is pretty well handled by the FileSystemWatcher class. It would work basically the way you are describing. Register files as they are added into the directory. Have a periodic timer that scans the list of files for ones that are stale and deletes them if they are still there. I'd probably code it up as a Windows service running under a service id that has enough rights to read/delete files in the directory.
EDIT: A quick google turned up this FileSystemWatcher for Java. Commercial software. Never used it, so can't comment on how well it works.