Creating a temp file is incredibly slow - java

Any reason why a call to
File.createTempFile("prefix", ".suffix", new File("C:\\"));
might take 40-50 seconds to complete?
Update:
I knocked up a little test harness that benchmarks creating 100 test files on C:\ and in the default tmp folder. Specifying "C:\" is consistently ~0.9ms slower than leaving it on the default, after allowing for JVM warmup time, GC pauses etc. (No clue why this should be, but it's not a problem.)
Not a single run suffered from anything like that level of delay, which suggests the app is doing something else first which is causing the problem.
Using Sun's JVM 1.6.0_12 client.
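For anyone who wants to repeat the measurement, a harness along these lines would do. This is only a sketch of the idea described in the update, not the original test code: the class and method names are made up, and the warm-up count of 20 is an arbitrary assumption.

```java
import java.io.File;
import java.io.IOException;

class TempFileBenchmark {

    /**
     * Creates `count` temp files in `dir` (null means the default temp folder)
     * and returns the mean creation time per file in milliseconds.
     */
    static double averageCreateMillis(File dir, int count) throws IOException {
        long totalNanos = 0;
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            File f = File.createTempFile("prefix", ".suffix", dir);
            totalNanos += System.nanoTime() - start;
            f.delete(); // clean up outside the timed region
        }
        return totalNanos / 1e6 / count;
    }

    public static void main(String[] args) throws IOException {
        // Pass a directory such as C:\ as the first argument, or nothing for the default.
        File dir = args.length > 0 ? new File(args[0]) : null;
        averageCreateMillis(dir, 20); // warm-up run, result discarded
        System.out.printf("avg create time: %.3f ms%n", averageCreateMillis(dir, 100));
    }
}
```

If this runs fast on the affected machine as well, the delay is more likely caused by something the application does before the call than by createTempFile itself.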

Some time ago, while developing a Swing-based application, I came across a JVM bug that makes the file chooser open very slowly if there is a big zip file on your desktop. There is a related issue when a folder contains a large number of files.
There may be a correlation with your issue. Which version of the JDK are you using?
Please take a look at these links for some info:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4638397
http://groups.google.com/group/comp.lang.java.help/browse_thread/thread/ec8854f82f381123
Defragmenting the disk is also a good idea.

Try it:
try {
    // Create temp file in the default temp directory.
    File temp = File.createTempFile("pattern", ".suffix");
    // Delete temp file when program exits.
    temp.deleteOnExit();
    // Write to temp file.
    BufferedWriter out = new BufferedWriter(new FileWriter(temp));
    out.write("aString");
    out.close();
} catch (IOException e) {
    // Don't swallow the error silently; it may explain the delay.
    e.printStackTrace();
}

I've seen file deletions on Windows take as long as a minute, but not file creation. I'd check to make sure you've defragged recently, and also that you have a reasonable number of files in your home. Once you get past 1,000 files (including hidden ones) Windows has a real hard time.
What happens if you don't specify c:\ and allow Java to place the file in its default location?

Virus checkers can sometimes make filesystem access slow, particularly on Windows systems. They intercept all access to the filesystem and can do significant processing before allowing applications to write or read from the disk.
I'd check for and disable any virus checking software and see if that helps.

If the other suggestions don't help (disabling virus scanners and checking for spyware), then I'd suggest getting the JDK source code and running the IDE's debugger to see where it "hangs" during createTempFile().

FWIW, I ended up having to run disk cleanup.

Related

Detect java program running on linux machine

Ok, so I have a couple of Java programs that I'm running using a cron job on a linux server. These jobs run every ten minutes or so, take literally two minutes to run, and then exit. I need to add a way for the programs to detect, when they start up, if there is already an instance of themselves running, and if so to exit without going any further. I'm really not sure of the best way to handle this though and am hoping someone can offer some advice.
One approach I've considered is to run a command line argument from the java code that does some sort of PS command and looks through those to see if it's running. This seems pretty finicky and complex though for something so small. Plus, I'm not all that knowledgeable with linux and am not even sure the best way to do that. If anyone has some better thoughts, please let me know. Or if that is the best way, if you could provide the linux commands I'd need I'd appreciate it. Thanks.
If you have a writable /tmp directory you can use a lockfile.
When your Java program starts up, check for a file with a name unique to your application (e.g. "my-lock-file.lock") in the /tmp directory. If none exists, create one, and remove it when you're done. If one exists, just exit.
You can check the existence of a file with the .exists() method of the java.io.File class.
If your code needs to be portable, you can use System.getProperty("java.io.tmpdir") to get an appropriate temporary directory for the platform your code is running on.
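A minimal sketch of the lock-file check described above. The class and method names are invented for illustration. It deliberately swaps the separate exists()-then-create steps for File.createNewFile(), which does the check and the creation atomically, so two instances starting at the same moment cannot both get the lock.

```java
import java.io.File;
import java.io.IOException;

class LockFileGuard {

    /**
     * Tries to claim the lock. createNewFile() returns false if the file
     * already exists, which here means another instance holds the lock.
     */
    static boolean tryAcquire(File lockFile) throws IOException {
        if (!lockFile.createNewFile()) {
            return false;
        }
        // Removed on normal JVM exit only; a crash or kill -9 leaves the file behind.
        lockFile.deleteOnExit();
        return true;
    }

    public static void main(String[] args) throws IOException {
        File lock = new File(System.getProperty("java.io.tmpdir"), "my-lock-file.lock");
        if (!tryAcquire(lock)) {
            System.err.println("Another instance is already running; exiting.");
            return;
        }
        // ... the job's real work goes here ...
    }
}
```

The weak spot is a crashed run: the stale file then blocks every later run until someone deletes it. The createNewFile() Javadoc itself recommends java.nio.channels.FileLock for reliable locking.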
You could look at JMX and the Attach API to query for running JVMs.
Or, as Andrew logvinov mentioned, by using a lock file.
If you are using Java WebStart, there's already native support for this.
Many programs solve this by creating a temporary file that points to their PID (often referred to as a "lock" file). The filename should encode all relevant information to distinguish this process from other processes that could legitimately run in parallel.
For example, if the process is bound to a user, it should contain the user name. If the process is bound to a machine, it should (also) contain the hostname. This is debatable if you put the file in a machine-bound temp directory, but if you put it in a home directory, think of the case of multiple machines sharing a home via NFS.
The location of these files is typically /tmp. This is a great location, as /tmp is typically wiped during system boot, so no orphan files are left in case of a system crash. Another solution employed by some programs is to put the lock file in the user settings directory, if it is related to the settings. E.g. mozilla thunderbird has a file called /home/<username>/.thunderbird/<profilename>.default/lock.
The file should contain the PID of the process. The idea is simple: If the file contains the PID, it is easy to check whether this process is indeed still running. So if the process crashes, the file gets orphaned. The new process instance will check the PID in the file, see that it is not running any more, and ignore the file (overwrite).
Putting it all together, you could create a file like this:
/tmp/myawesomeservice-username-hostname-lock
With the content:
12345
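The PID check could be sketched roughly as follows. This assumes Java 9 or later for ProcessHandle; the class and method names are hypothetical, and the path is just the example name from above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PidLockFile {

    /** Returns true if the lock file names a process that is still alive. */
    static boolean isHeldByLiveProcess(Path lockFile) throws IOException {
        if (!Files.exists(lockFile)) {
            return false;
        }
        String text = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).trim();
        try {
            long pid = Long.parseLong(text);
            return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        } catch (NumberFormatException e) {
            return false; // unreadable content: treat the file as orphaned
        }
    }

    /** Writes the current PID into the file, overwriting an orphaned one. */
    static void claim(Path lockFile) throws IOException {
        String pid = Long.toString(ProcessHandle.current().pid());
        Files.write(lockFile, pid.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path lock = Paths.get("/tmp/myawesomeservice-username-hostname-lock");
        if (isHeldByLiveProcess(lock)) {
            System.err.println("Already running; exiting.");
            return;
        }
        claim(lock);
        try {
            // ... the job's real work goes here ...
        } finally {
            Files.deleteIfExists(lock);
        }
    }
}
```

Two caveats: the operating system can reuse a PID for an unrelated process, and there is a small window between the check and the write in which two instances could both proceed. For a job that starts every ten minutes that is probably acceptable.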

Any Java debugging tips for finding the cause of "Too Many Files Open"

I'm developing in a linux environment and the system is intended to run continuously over a long period of time. After an overnight test we see the FileNotFoundException with a message of "Too Many Files Open". We started logging the output of the lsof command at various times in the system to see if we can see what is happening. We noticed lots of unnamed pipes opened. So I figured these were due to File Streams not getting closed. I searched through the source for any *Stream objects used and made sure they were all getting closed in a finally{} block. Are there any other Java object types that I could search for that I might not be closing that would cause all these unnamed pipes to be opened?
Also, my ulimit is 1024 and I also searched for *Writer and made sure those were all closing too.
YourKit might be worth a look. Its probes are meant to help with this type of problem, although I've never had the occasion to try that functionality myself.
I'm assuming your ulimit is the output of ulimit -n. 1024 is a fairly small number of file descriptors to allow for in a production system. As a debugging step, rather than running lsof at random times and trying to correlate, why not catch the FileNotFoundException, run lsof via Runtime.exec(), and print the output to a log file? That gives a fairly accurate view of exactly which file descriptors were in use when the problem occurred.
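A rough sketch of that debugging step. It assumes lsof is on the PATH and Java 9+ (for ProcessHandle, to get the JVM's own PID); the class name, method names and the file name data.txt are all made up. It uses ProcessBuilder rather than Runtime.exec() so that stderr can be merged into stdout.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

class OpenFileDump {

    /** Command line that lists the open descriptors of one process. */
    static List<String> lsofCommand(long pid) {
        return Arrays.asList("lsof", "-p", Long.toString(pid));
    }

    /** Runs lsof against this JVM and returns everything it printed. */
    static String dumpOpenFiles() throws IOException, InterruptedException {
        long pid = ProcessHandle.current().pid();
        Process p = new ProcessBuilder(lsofCommand(pid)).redirectErrorStream(true).start();
        p.getOutputStream().close(); // we send no input, so release that pipe right away
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        try {
            new FileInputStream("data.txt").close(); // stand-in for the real I/O
        } catch (IOException e) {
            System.err.println("I/O failed: " + e.getMessage());
            System.err.println(dumpOpenFiles()); // in a real system, write this to the log
        }
    }
}
```

One thing to keep in mind: starting a child process needs a few descriptors of its own for the pipes, so if the JVM is already at its limit the lsof call may fail too. Logging the dump somewhat before the limit is reached may be more reliable.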
Other classes that might leak file descriptors are FileChannel and RandomAccessFile - the latter doesn't even seem to have a finalizer, so its leaks might be permanent.

Issue with FileWriter

I'm running my java application on a windows 2008 server (64-bit) in the hotspot vm.
A few months ago I created a tool to assist in the detection of deadlocking in my application. For the past month or so, the only thing that has been giving me any problems is the writing to text files.
The main thread always seems to get stuck on the following line for what I would assume to be almost 5 seconds at a time. After a few seconds the application continues to run normally and without problems:
PrintWriter writer = new PrintWriter(new FileWriter(PATH + name + ".txt"));
Not sure what causes this, but any insight into the problem would be most appreciated. The files I'm writing are small and that is unlikely the issue (unless anyone has any objections).
If you need any more information, please let me know.
Is PATH on a network drive? You could see almost any delay writing to a network file system. It's generally a very bad idea to do that with applications. They should generally write all their files locally and then post transactions to a server somehow.
When your file system gets overloaded, you can see delays with even the simplest of tasks, e.g. if I create a large file (multiple GB) and then try a simple disk access that is not cached, it can wait for seconds.
I would check your disk write cache is turned on and your disks are idle most of the time. ;)

java file move high performance

I am writing a media transcoding server in which I need to move files around the filesystem, and I am wondering whether Java's renameTo can be replaced by something that would give me better performance. I was considering using exec("mv file1 file2"), but that would be my last resort.
Has anyone had similar experiences, or can anyone help me find a solution?
First of all, renameTo is likely just wrapping a system call.
Secondly, moving a file does not involve copying any data from the file itself (at least, in unix). All that happens is that the link from the old directory is removed, and a link from the new directory is added. I don't think you're going to find any performance improvements here.
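If you are on Java 7 or later, java.nio.file.Files.move is an alternative to renameTo that reports failures as exceptions instead of a boolean. The sketch below is not a speed-up, just a way to make the "rename, no copy" behaviour explicit; the class and method names are invented.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class FileMover {

    /**
     * Moves a file, asking for an atomic rename first. If the file system
     * cannot do that (typically because source and target are on different
     * volumes), falls back to an ordinary move, which may copy the data.
     */
    static void move(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FileMover <source> <target>");
            return;
        }
        move(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the fallback branch is the one that runs in your setup, the files are crossing a volume boundary, and that copy is where the time goes rather than in the Java call.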
I don't think that using the default File methods carries a performance penalty worth mentioning, as most of these JVM-to-OS functions already wrap native calls.
The only case where an exec would be needed is if you wanted to do something with different rights than the program or use a special tool to copy/move the file. (e.g. smart-move when ntfs-junctions are involved)
If rename is a significant performance bottleneck, then you need to improve your hardware, as this is your main constraint. The software accounts for a trivial portion of the time spent and optimising it will make little difference.
What is your disk configuration? How is it optimised for writes?

What's the most efficient method of continually deleting files older than X hours on Windows?

I have a directory that continually fills up with "artefact" files. Many different programs dump their temporary files in this directory and it's unlikely that these programs will become self-cleaning any time soon.
Meanwhile, I would like to write a program that continually deletes files in this directory as they become stale, which I'll define as "older than 30 minutes".
A typical approach would be to have a timed mechanism that lists the files in the directory, filters on the old stuff, and deletes the old stuff. However, this approach is not very performant in my case because this directory could conceivably contain tens or hundreds of thousands of files that do not yet qualify as stale. Consequently, this approach would continually be looping over the same thousands of files to find the old ones.
What I'd really like to do is implement some kind of directory listener that was notified of any new files added to the directory. This listener would then add those files to a queue to be deleted down the road. However, there doesn't appear to be a way to implement such a solution in the languages I program in (JVM languages like Java and Scala).
So: I'm looking for the most efficient way to keep a directory "as clean as it can be" on Windows, preferably with a JVM language. Also, though I've never programmed with Powershell, I'd consider it if it offered this kind of functionality. Finally, if there are 3rd party tools out there to do such things, I'd like to hear about them.
Thanks.
Why can't you issue a directory system command sorted by oldest first:
c:>dir /OD
Take the results and delete all files older than your threshold or sleep if no files are old enough.
Combine that with a Timer or Executor set to a granularity between 1 second and 1 minute, which guarantees that the files don't keep piling up faster than you can delete them.
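In pure Java the timed purge might look like the sketch below. Note the difference from the dir /OD idea: java.io.File has no sorted listing, so this version scans the whole directory and filters on the modification time instead of stopping at the first young file. The class name, the one-minute interval and the 30-minute threshold are assumptions taken from the question, not a tested recommendation.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class StaleFilePurger {

    /**
     * Deletes regular files in `dir` whose last-modified time is more than
     * `maxAgeMillis` before `nowMillis`. Returns the number of files deleted.
     */
    static int purge(File dir, long maxAgeMillis, long nowMillis) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or the listing failed
        }
        long cutoff = nowMillis - maxAgeMillis;
        int deleted = 0;
        for (File f : files) {
            if (f.isFile() && f.lastModified() < cutoff && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: StaleFilePurger <directory>");
            return;
        }
        final File dir = new File(args[0]);
        final long maxAge = TimeUnit.MINUTES.toMillis(30);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Run once immediately, then again one minute after each run finishes.
        timer.scheduleWithFixedDelay(
                () -> purge(dir, maxAge, System.currentTimeMillis()), 0, 1, TimeUnit.MINUTES);
    }
}
```

Files that another program still holds open may fail to delete on Windows; delete() then returns false and the file is simply retried on the next run.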
If you don't want to write C++, you can use Python. Install pywin32 and you can then use the win32 API as such:
import win32api, win32con
change_handle = win32api.FindFirstChangeNotification(
    path_to_watch,
    0,
    win32con.FILE_NOTIFY_CHANGE_FILE_NAME
)
Full explanation of what to do with that handle by Tim Golden here: http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html.
In Java, you can also use Apache Commons JCI FAM. It is an open-source Java library that you can use for free.
JDK 7 (currently released in beta) includes support for file notifications as well. Check out the Java NIO.2 tutorial.
Both options should work both on Windows and Linux.
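With the JDK 7 API, the listener-plus-queue idea from the question could be sketched like this. It is an untested outline under a few assumptions: the class and method names are invented, a file's age is counted from when the watcher first sees it, and files already present at startup are not tracked.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

class StaleFileWatcher {

    /** A path plus the time the watcher first saw it. */
    private static final class Seen {
        final Path path;
        final long atMillis;
        Seen(Path path, long atMillis) { this.path = path; this.atMillis = atMillis; }
    }

    private final Queue<Seen> queue = new ArrayDeque<>(); // oldest entry is always at the head
    private final long maxAgeMillis;

    StaleFileWatcher(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

    void record(Path path, long nowMillis) { queue.add(new Seen(path, nowMillis)); }

    /** Removes and returns the paths that have been known for at least maxAgeMillis. */
    List<Path> drainStale(long nowMillis) {
        List<Path> stale = new ArrayList<>();
        while (!queue.isEmpty() && nowMillis - queue.peek().atMillis >= maxAgeMillis) {
            stale.add(queue.poll().path);
        }
        return stale;
    }

    /** Watches `dir` for new files and deletes each one once it has gone stale. */
    void run(Path dir) throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = ws.poll(1, TimeUnit.SECONDS); // wake up regularly even if idle
                long now = System.currentTimeMillis();
                if (key != null) {
                    for (WatchEvent<?> ev : key.pollEvents()) {
                        if (ev.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                            record(dir.resolve((Path) ev.context()), now);
                        }
                    }
                    key.reset();
                }
                for (Path p : drainStale(now)) {
                    try {
                        Files.deleteIfExists(p);
                    } catch (IOException e) {
                        System.err.println("could not delete " + p + ": " + e.getMessage());
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new StaleFileWatcher(TimeUnit.MINUTES.toMillis(30)).run(Paths.get(args[0]));
    }
}
```

Because entries are queued in arrival order, only the head of the queue is ever inspected, so the cost per wake-up does not grow with the number of files that are not yet stale.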
http://www.cyberpro.com.au/Tips_n_Tricks/Windows_Related_Tips/Purge_a_Directory_in_Windows_automatically/
I'd go with C++ for a utility like this - lets you interface with the WIN32 API, which does indeed have directory listening facilities (FindFirstChangeNotification or ReadDirectoryChangesW). Use one thread that listens for change notifications and updates your list of files (iirc FFCN requires you to rescan the folder, whereas RDCW gives you the actual changes).
If you keep this list sorted according to modification time, it becomes easy to Sleep() just long enough for a file to go stale, instead of polling at some random fixed interval. You might want to do a WaitForSingleObject with a timeout instead of Sleep, in order to react to outside changes (ie, the file you're waiting for to become stale has been deleted externally, so you'll want to wake up and determine when the next file will become stale).
Sounds like a fun little tool to write :)
You might want to bite the bullet and code it up in C# (or VB). What you're asking for is pretty well handled by the FileSystemWatcher class. It would work basically the way you are describing. Register files as they are added into the directory. Have a periodic timer that scans the list of files for ones that are stale and deletes them if they are still there. I'd probably code it up as a Windows service running under a service id that has enough rights to read/delete files in the directory.
EDIT: A quick google turned up this FileSystemWatcher for Java. Commercial software. Never used it, so can't comment on how well it works.
