I'm running my Java application on a Windows Server 2008 (64-bit) machine on the HotSpot VM.
A few months ago I created a tool to assist in detecting deadlocks in my application. For the past month or so, the only thing that has been giving me any problems is writing to text files.
The main thread always seems to get stuck on the following line for what I estimate to be almost 5 seconds at a time. After a few seconds the application continues to run normally and without problems:
PrintWriter writer = new PrintWriter(new FileWriter(PATH + name + ".txt"));
Not sure what causes this, but any insight into the problem would be most appreciated. The files I'm writing are small, so file size is unlikely to be the issue (unless anyone has any objections).
If you need any more information, please let me know.
Is PATH on a network drive? You could see almost any delay writing to a network file system. It's generally a very bad idea to do that with applications. They should generally write all their files locally and then post transactions to a server somehow.
When your file system gets overloaded, you can see delays with even the simplest of tasks. For example, if I create a large file (multiple GB) and then attempt a simple disk access that is not cached, it can wait seconds.
I would check that your disk write cache is turned on and that your disks are idle most of the time. ;)
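To confirm where the time goes, a small timing harness can help; here is a minimal sketch (the path is a placeholder standing in for your PATH value):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class OpenTimer {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\logs\\"; // placeholder for PATH
        long start = System.nanoTime();
        try (PrintWriter writer = new PrintWriter(new FileWriter(path + "probe.txt"))) {
            writer.println("probe");
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("open + write + close took " + elapsedMs + " ms");
    }
}

If the constructor alone accounts for the delay, that points at the open call and the file system underneath it rather than your application logic.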
One thread pool is downloading files from the FTP server, and another thread pool is reading those files.
Both thread pools run concurrently. Let me explain exactly what happens with an example.
Let's assume I have one CSV file with 100 records.
Thread pool 1 downloads the file and writes it into the /pending folder, and at the same time thread pool 2 reads the content from that file. But assume that in the first second only 10 records have been written to the file, so thread pool 2 reads only those 10 records.
Thread pool 2 doesn't know that 90 records are still in the process of being downloaded, and it has no way to tell whether the whole file has been downloaded or not. After reading, it moves the file to another folder, so how will my remaining 90 records ever be processed?
My question is: how do I wait until the whole file is downloaded, so that only then does thread pool 2 read the file's contents?
One more thing: both thread pools use the scheduleAtFixedRate method and run every 10 seconds.
Please guide me on this.
I'm a fan of Mark Rotteveel's suggestion #6 (in the comments above):
use a temporary name when downloading,
rename when download is complete.
That looks like:
FTP download threads write all files with some added extension – perhaps .pending – but name it whatever you want.
While a file, say some.pdf, is being downloaded, the FTP download thread writes it to some.pdf.pending
When an FTP download thread completes a file, the last step is a file rename operation – this is the mechanism for ensuring only "done" files are ready to be processed. So it downloads the file to some.pdf.pending, then at the end, renames it to some.pdf.
Reader threads look for files, ignoring anything matching *.pending
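Here is a minimal sketch of both sides of that pattern; the folder name and helper structure are assumptions, not a definitive implementation:

import java.io.IOException;
import java.nio.file.*;

public class PendingRenameSketch {
    static final Path PENDING_DIR = Paths.get("pending"); // hypothetical folder

    // Download side: write under a temporary name, rename when complete.
    static void download(String name, byte[] content) throws IOException {
        Path tmp = PENDING_DIR.resolve(name + ".pending");
        Files.write(tmp, content);                     // a partial file is only ever visible as *.pending
        Files.move(tmp, PENDING_DIR.resolve(name),     // the rename marks the file as "done"
                StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: only pick up files that no longer end in .pending.
    static void readCompleted() throws IOException {
        try (DirectoryStream<Path> done = Files.newDirectoryStream(PENDING_DIR,
                p -> !p.getFileName().toString().endsWith(".pending"))) {
            for (Path file : done) {
                // process the file, then move it out of the folder
            }
        }
    }
}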
I've built systems using this approach and they worked out well. In contrast, I've also worked with more complicated systems that tried to coordinate across threads, and those often did not work so well.
Over time, any software system will have bugs. Edsger Dijkstra captured this so well:
"If debugging is the process of removing software bugs, then programming must be the process of putting them in."
However difficult it is to reason about program correctness now, while the program is still in the design phase and has not yet been built, it will be harder to reason about correctness when things are broken in production (which will happen, because bugs). That is, when things are broken and you're under time pressure to find the root cause (and fix it!), even the best of us would be at a disadvantage with a complicated (vs. simple) system.
The approach of using temporary names is simple to reason about, which should minimize code complexity and thus make it easier to implement.
In turn, maintenance and bug fixes should be easier, too.
Keep it simple – let the filesystem help you out.
Problem statement: an FTP server is flooded with files arriving at a rate of 100 Mbps (i.e. 12.5 MB/s), and each file is approximately 100 MB. Files are deleted 30 seconds after their creation timestamp. Any process interested in reading those files must take away the complete file in less than 30 seconds. I am using Java to solve this particular problem.
I need a suggestion on which design pattern would be best suited for this kind of problem, and on how to make sure that each file is consumed before the server deletes it.
Your suggestions will be greatly appreciated. Thanks.
If the Java application runs on the same machine as the FTP service, then it could use File.renameTo(File) or equivalent to move a required file out of the FTP server directory and into another directory. Then it can process it at its leisure. It could use a WatchService to monitor the FTP directory for newly arrived files. It should watch for events on the directory, and when a file starts to appear it should wait for the writes to stop happening. (Depending on the OS, you may or may not be able to move the file while the FTP service is writing to it.)
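Here is a minimal sketch of the same-machine case; the directory names are placeholders, and the "writes have stopped" check is deliberately crude:

import java.io.IOException;
import java.nio.file.*;

public class FtpDirMover {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path ftpDir = Paths.get("/ftp/incoming"); // placeholder FTP drop directory
        Path workDir = Paths.get("/app/work");    // placeholder processing directory

        WatchService watcher = ftpDir.getFileSystem().newWatchService();
        ftpDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take(); // blocks until a new file appears
            for (WatchEvent<?> event : key.pollEvents()) {
                Path name = (Path) event.context();
                Path source = ftpDir.resolve(name);
                waitUntilStable(source);
                Files.move(source, workDir.resolve(name)); // out of reach of the 30 s deletion
            }
            key.reset();
        }
    }

    // Crude quiet-period check: wait until the file size stops changing.
    static void waitUntilStable(Path file) throws IOException, InterruptedException {
        long size = -1;
        while (size != Files.size(file)) {
            size = Files.size(file);
            Thread.sleep(500);
        }
    }
}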
There is a secondary issue of whether a Java application could keep up with the required processing rate. However, if you have multiple cores and multiple worker threads, your app could potentially process files in parallel. (It depends on how computationally and/or I/O intensive the processing is, and on the rate at which a Java thread can read a file, which will be OS and possibly hardware dependent.)
If the Java application is not running on the FTP server, it would probably need to use FTP to fetch each file. I am doubtful that you could implement something that does that consistently and reliably; i.e. without losing files occasionally.
I wrote a little program for my work (IT support at a university) to record all inquiries received and evaluate them later.
Because more than one person receives inquiries at the same time, the program must run on more than one PC simultaneously.
At the moment we are not able to have our own server (e.g. MySQL), so my solution was to write a program that saves the inquiries to a single file on a file server. The file is opened with myStatFile = new File(myFilepath + "/data/statistic.dat"); and written to with in = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(myFile, true), "UTF-8"));. It works pretty well, but as the statistic.dat file becomes larger and larger, the program needs more time to open the file, write to it, and close it.
So my question to the more experienced programmers here is: is there a more efficient way to do this, or is the only way to improve performance to get a server with a database like MySQL?
Does anyone see another way that works without a database server, using only a simple file server?
Edit: First of all I thank everyone who answered my question. I know it's a very broad question with a lot of solutions but after reading your answers I've got an idea of what is possible and what's the way to go now. Thank you!
You have set the second parameter to true in your FileOutputStream constructor. That instructs the stream to append new data to the file.
This does not slow down as the file gets larger. Both the Windows and Linux / BSD / Unix heritage operating systems handle file-appending very efficiently indeed. You should open, write, and close as quickly as you can. Don't hold the file open longer than you must.
You can arrange for your Java program to lock the file while you're writing to it. That will prevent multiple users (you and your co-workers) from trying to append simultaneously. See: How can I lock a file using java (if possible)
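Here is a minimal sketch combining both points, appending under an exclusive lock and closing promptly; the method name and record format are assumptions:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.channels.FileLock;

public class InquiryLog {
    static void appendRecord(String path, String record) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path, true); // true = append
             Writer writer = new OutputStreamWriter(out, "UTF-8")) {
            // lock() blocks until no other process holds a lock on this file
            try (FileLock lock = out.getChannel().lock()) {
                writer.write(record);
                writer.write(System.lineSeparator());
                writer.flush(); // flush while the lock is still held
            }
        } // streams are closed immediately; don't hold the file open
    }
}

Note that FileLock coordinates between processes (the separate PCs writing to the file server), not between threads inside one JVM, and lock behavior on network file systems depends on the server.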
You still should consider a database for this application, but not because appending records to files is slow.
By the way, there are several good and free (free as in speech, free as in kittens) support desk ticketing web applications available.
I would certainly insist on using a database server. Your application isn't very scalable, otherwise.
To Ollie's point, there are a few free options for support desk web apps out there. I've used osTicket, and it's amazing how versatile it is.
Any reason why a call to
File.createTempFile("prefix", ".suffix", new File("C:\\"));
might take 40-50 seconds to complete?
Update:
I knocked up a little test harness that benchmarks creating 100 test files on C:\ and in the default temp folder. Specifying "C:\" is consistently ~0.9 ms slower than just leaving it at the default, allowing for JVM warm-up time, GC pauses, etc. (No clue why this should be, but it's not a problem.)
Not a single run suffered anything like that level of delay, which suggests the app is doing something else first that is causing the problem.
Using Sun's JVM 1.6.0_12 (client).
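For reference, a harness along those lines might look like this; the iteration count and prefix are assumptions:

import java.io.File;
import java.io.IOException;

public class TempFileBench {
    public static void main(String[] args) throws IOException {
        bench("default temp folder", null); // null = default location
        bench("C:\\", new File("C:\\"));
    }

    static void bench(String label, File dir) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            File f = File.createTempFile("prefix", ".suffix", dir);
            f.delete();
        }
        long totalMs = (System.nanoTime() - start) / 1000000;
        System.out.println(label + ": " + totalMs + " ms for 100 files");
    }
}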
Some time ago, when developing a Swing-based application, I came across a bug in the JVM that caused the file-open dialog to be really slow if there was a big ZIP file on your desktop. There is also another, related issue when a big number of files exists in a folder.
There may be a correlation with your issue. Which version of the JDK are you using?
Please take a look at these links for some info:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4638397
http://groups.google.com/group/comp.lang.java.help/browse_thread/thread/ec8854f82f381123
Defragmenting the disk is also a good idea.
Try this:
try {
    // Create a temp file in the default temp directory.
    File temp = File.createTempFile("pattern", ".suffix");

    // Delete the temp file when the program exits.
    temp.deleteOnExit();

    // Write to the temp file.
    BufferedWriter out = new BufferedWriter(new FileWriter(temp));
    try {
        out.write("aString");
    } finally {
        out.close(); // close the writer even if the write fails
    }
} catch (IOException e) {
    e.printStackTrace(); // don't swallow the exception silently
}
I've seen file deletions on Windows take as long as a minute, but not file creation. I'd check that you've defragged recently, and also that you have a reasonable number of files in your home directory. Once you get past 1,000 files (including hidden ones), Windows has a real hard time.
What happens if you don't specify C:\ and allow Java to place the file in its default location?
Virus checkers can sometimes make filesystem access slow, particularly on Windows systems. They intercept all access to the filesystem and can do significant processing before allowing applications to write or read from the disk.
I'd check for and disable any virus checking software and see if that helps.
If the other suggestions don't help (disabling virus scanners and checking for spyware), then I'd suggest getting the JDK source code and running the IDE's debugger to see where it "hangs" during createTempFile().
FWIW, I ended up having to run disk cleanup.
My boss is worried that our NFS file system will not be happy with the JBoss-run Java process calling getFD().sync() on the files we are writing.
We have noticed that the timestamp on the created file is frequently minutes (sometimes as many as 15) after the log claims the file finished writing. My only guess is that NFS is holding on to the file in memory and not writing it until it feels like it. sync() should solve that problem, right?
I also noticed that close() is never called on the file. Could that have been the cause as well?
any thoughts appreciated.
If you mean that the Java code never calls close() on the stream, yes, that is a bug. Always close a stream, input or output, as soon as use is complete. Good static analysis tools will warn about code that fails to do this.
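Here is a minimal sketch of the pattern under discussion, writing, forcing the data to storage with sync(), and guaranteeing the close; the file name is hypothetical:

import java.io.FileOutputStream;
import java.io.IOException;

public class SyncedWrite {
    static void writeAndSync(byte[] data) throws IOException {
        // try-with-resources guarantees close() even if write() or sync() throws
        try (FileOutputStream out = new FileOutputStream("output.dat")) { // hypothetical name
            out.write(data);
            out.getFD().sync(); // ask the OS to push buffered data through to storage
        }
    }
}

On NFS, close() itself typically triggers a flush to the server (close-to-open consistency), so the missing close() is a plausible cause of the delayed timestamps; sync() adds a stronger guarantee at the cost of blocking until the data is committed.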