Do I need a temp file when retrieving files over FTP? - java

I am using ftpClient.retrieveFile() to download files from an FTP Server while another thread is scanning the directory constantly for files to process. I am wondering if this is dangerous? Could it be that a file is not finished downloading and is processed by the other thread? Should I be using a .temp suffix to save the temporary file and rename it after the transaction is finished?

Files are, in general, visible to other processes or threads as soon as you create them. So your second thread could see and process a file before you have completed writing to it. The correct practice is to use either a temporary extension (like the .temp you have mentioned) or a temporary directory.
In your case, the most appropriate thing to do would be to use some synchronization mechanism so that the second thread blocks when there are no files to process and the first thread notifies the second when a file finishes downloading. Java supports these operations with the wait() and notify() methods of the Object class.
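
As a rough sketch of the temporary-extension approach: the download is written under a ".temp" name and only renamed to its final name once the copy has finished, so the scanning thread (which is assumed to skip ".temp" files) never sees a half-written file. The class and method names here are made up for illustration, and the InputStream stands in for the FTP transfer; with ftpClient.retrieveFile() you would instead pass an OutputStream opened on the temp path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileDownload {
    /**
     * Copies the stream into "<target>.temp" in the same directory and
     * renames it to the final name only after the copy has completed.
     */
    public static Path saveAtomically(InputStream in, Path target) throws IOException {
        // Hypothetical naming convention: final name plus ".temp"
        Path temp = target.resolveSibling(target.getFileName() + ".temp");
        Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        // Same directory, so the rename can be done as one atomic step
        return Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Keeping the temp file in the same directory as the target matters: an atomic move is generally not possible across file systems.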

Related

What would be best practice If I am trying to constantly check if a directory exists? JAVA

I have a Java application that creates multiple threads. There is 1 producer thread which reads from a 10 GB file, parses that information, creates objects from it and puts them into multiple blocking queues (5 queues).
The other 5 threads are consumers, each reading from its own BlockingQueue. Each consumer thread writes to an individual file, so 5 files get created in total. It takes around 30 minutes to create all the files.
The problem:
The threads are writing to an external mount directory on a Linux box. We've experienced problems where other Linux mounts have gone down and applications have crashed, so I want to prevent that in this application.
What I would like to do is keep checking whether the mount (directory) exists before writing to it. I'm assuming that if the directory goes down, a FileNotFoundException will be thrown. If that is the case, I want the application to keep checking whether the directory is there for about 10-20 minutes before crashing completely. Because I don't want to have to read the 10 GB file again, I want the consumer threads to be able to pick up from where they left off.
What I'm not sure about is the best practice:
Is it best to check whether the directory exists in the main class before creating the threads? Or to check in each consumer thread?
If I keep checking whether the directory exists in each consumer thread, it seems like repeated code. I can check in the main class, but it takes 30 minutes to create these files. If the mount goes down during those 30 minutes and I am only checking in the main class, the application will crash. Or, if I'm already writing to a directory, is it impossible for an external directory to go down? Does it get locked?
thank you
We have something similar in our application, but in our case we are running a web app and if our mounted file system goes down we just throw an exception, but we want to do something more elegant, like you do...
I would recommend using a combination of the following patterns: State, CircuitBreaker (which I believe is a more specific version of the State pattern), and Observer/Observable.
These would work in the following way...
Create something that represents your file system. Maybe a class called MountedFileSystem. Make all your write calls to this particular class.
This class will catch every FileNotFoundException, and when one occurs, the CircuitBreaker gets triggered. This change works like the State pattern: one state is when things are working 'fine', the other state is when things aren't working 'fine', meaning that the mount has gone away.
Then, in the background, I would have a task that starts on a thread and checks the actual underlying file system to see if it is back. When the file system is back, change the state in the MountedFileSystem, and fire an Event (Observer/Observable) to try writing the files again to disk.
And as yuan quigfei stated, I am fairly certain you're going to have to rewrite those files. I just don't see being able to restart writing to them, but perhaps someone else has an idea.
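
A simplified sketch of the MountedFileSystem idea, under some assumptions: instead of a background task plus an event, the calling thread itself keeps retrying until the mount is back or a deadline passes, and the "state" is just a boolean flag. The constructor parameters, the append method and the flag are all invented for illustration. It also does not deal with a write that failed halfway through, which is the case where the file probably has to be rewritten.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;

public class MountedFileSystem {
    private final Path mountDir;
    private final Duration maxWait;      // e.g. 10-20 minutes in the question
    private final Duration pollInterval; // how often to re-check the mount
    private volatile boolean available = true; // the "fine" / "not fine" state

    public MountedFileSystem(Path mountDir, Duration maxWait, Duration pollInterval) {
        this.mountDir = mountDir;
        this.maxWait = maxWait;
        this.pollInterval = pollInterval;
    }

    public boolean isAvailable() {
        return available;
    }

    /** Appends data to a file on the mount, retrying until maxWait has elapsed. */
    public void append(String fileName, byte[] data) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (true) {
            try {
                Files.write(mountDir.resolve(fileName), data,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                available = true;
                return;
            } catch (IOException e) {
                available = false; // breaker is "open": the mount looks gone
                if (System.nanoTime() >= deadline) {
                    throw e; // give up after the configured wait
                }
                Thread.sleep(pollInterval.toMillis());
            }
        }
    }
}
```

Because every consumer goes through this one class, the retry logic lives in a single place rather than being repeated in each consumer thread.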
1. Write a method to detect whether the folder exists.
2. Call this method before the actual write.
3. Create the 5 threads based on step 2. Once you detect that the file no longer exists, you seem to have no choice but to rewrite it. Of course, you don't need to re-read the input if all your content is already in memory (which requires a lot of memory).

Multithreaded access to files in Java

I'm working on a multithreaded server in Java.
The server monitors a directory of files. Clients can ask the server:
to download a file from the server directory
to upload a new version of an already existing file to the server, overwriting the old version in the server directory.
To do the transfers, I'm planning to use FileChannels and SocketChannels, using the methods transferFrom and transferTo. According to the documentation, these two methods are thread safe.
The thing is that a single call to these two functions may not be sufficient to read/write the file entirely.
The problem arises if there is more than one request on the same file at the same time. In this scenario, multiple threads could be doing read/write operations on the same file. The individual calls to transferFrom/transferTo are thread safe, according to the Java documentation, but since one call may not transfer the whole file, a transfer can span several calls. If thread A is replying to a download request and thread B is replying to an upload request referring to the same file, it could happen that:
Thread A starts reading from the file
In thread A, for some reason the read call returns before the EOF
Thread B overwrites the entire file with a single write call
Thread A continues reading from the file
In this case, the downloading client receives a portion of the old version and a portion of the new version.
To solve this I think I should be using some sort of locking, but I'm not sure how to do it in an efficient way. I could create two synchronized methods for reading and writing, but that creates obviously too much contention.
The best solution I have in mind is to use lock striping. Before doing any read/write operation, a hash based on the filename is calculated. Then, the lock in position lockArr[hash % numOfLocks] is acquired.
I think also that I should be using ReadWriteLocks, since multiple simultaneous reads should be allowed.
Now, this is my analysis of the problem and I could be completely wrong. Is there any better solution to this?
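
For illustration, the lock-striping scheme described in the question could look roughly like this. The class name and lockFor method are hypothetical; Math.floorMod is used instead of a plain % because String.hashCode() can be negative.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StripedFileLocks {
    private final ReadWriteLock[] locks;

    public StripedFileLocks(int numOfLocks) {
        locks = new ReadWriteLock[numOfLocks];
        for (int i = 0; i < numOfLocks; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    /** Returns the stripe for a file name; the same name always maps to the same lock. */
    public ReadWriteLock lockFor(String fileName) {
        // floorMod keeps the index non-negative even for negative hash codes
        return locks[Math.floorMod(fileName.hashCode(), locks.length)];
    }
}
```

A download would hold lockFor(name).readLock() for the whole transfer loop and an upload would hold the writeLock(), so unrelated files only contend when they happen to share a stripe.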
Locking means that somebody has to wait for somebody else -- not the best solution.
When the client uploads a file, you should write it out to a temp file on the same disk (usually in the same directory), and then when the file upload is done:
Rename the old version to a temporary name. Any current readers should be forced to close the old one, re-open the temp version, and seek to the correct position.
Rename the uploaded file to the target file name.
Delete the temp version of the old file when any readers are done with it.
In a typical implementation, you'd need a centralized class (let's call it ConcurrentFileAccessor) to manage the interactions between threads.
Readers would need to register with this class, and synchronize on some object during the actual read operation. When an upload completes, the writer would have to claim all those locks to block reads, close all the read files, rename the old version, reopen, seek, and then release them to allow the readers to continue.

Apache POI gets java.io.IOException on tmp dir in multithread

I have a Java application that gets a request to create an XLSX file.
This application is multi-threaded, which means that 5 users can run a report simultaneously.
My issue is that when the report is huge and 5 users create reports together, I get this message: java.io.IOException: Could not create temporary directory '
This is probably caused by one of the 5 threads deleting the java.io.tmpdir directory, so the other 4 threads fail.
How do I resolve that?
One of my suggested solutions is to give each thread a different java.io.tmpdir. Is that something that can be done?
One solution is for each thread to add a unique prefix when it creates its temp directory, so that there is no concurrent modification of the same folder.
In the implementation you have to consider how many requests can be processed simultaneously. You cannot create an unlimited number of directories.
Another solution is to use a thread pool and a queue to hold requests when more arrive than you can process.
Or, if the reports share similar content, you can create a template and change some of the data dynamically, so that only a clone is needed.
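
A small sketch of the unique-directory suggestion, using only the JDK: Files.createTempDirectory picks a fresh name on every call, so two report threads never share a working folder. The class name and the "report-" prefix are made up, and how the XLSX writer is pointed at this directory is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReportTempDirs {
    /** Creates a new, uniquely named directory under the default temp location. */
    public static Path newReportDir() throws IOException {
        return Files.createTempDirectory("report-");
    }
}
```

Each directory should be deleted once its report is finished, otherwise they accumulate.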
I would first check whether the methods you use to write those .xlsx files are thread safe.
Your threads may also be racing to write the same files concurrently.

Is file creation process safe among different processes at os level (ubuntu)?

I have two Java applications that coordinate through a file-existence check: one application waits until a file is deleted and then creates a file, in order to manage concurrency. If these operations are not process safe, my application fails.
The pseudocode:
if file exists:
    do something with it
It's not concurrent safe as nothing ensures the file does not get deleted between the first and the second line.
The safest way would be to use a FileLock. If you are planning to react to file creation/deletion events on Linux, I'd recommend to use some inotify based solution.
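
A minimal sketch of the FileLock suggestion, with hypothetical names (ExclusiveFileAccess, withLock, Action). It tries to take an exclusive lock and runs the action only if the lock was obtained. Note that these locks are held on behalf of the whole JVM and are meant for coordination between processes, not between threads of the same process.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ExclusiveFileAccess {
    /** Work to perform while the lock is held. */
    public interface Action {
        void run(FileChannel channel) throws IOException;
    }

    /** Runs the action under an exclusive lock; returns false if another process holds it. */
    public static boolean withLock(Path file, Action action) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                     StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            if (lock == null) {
                return false; // lock is held by another process
            }
            action.run(channel);
            return true;
        } // the lock is released first, then the channel is closed
    }
}
```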

How can I safely read Log4j logs while another thread is logging at the same time?

If I have multiple threads that use log4j to write to a single log file, and I want another thread to read it back out, is there a way to safely read(line by line) those logs such that I always read a full line?
EDIT:
Reason for this is I need to upload all logs to a central location and it might be logs that are days old or those that are just being written
You should use a read write lock.
Read locks can be held by multiple users if there is no one writing to the file, but a write lock can only be held by 1 thread at a time no matter what.
Just make sure that when your writing thread is done writing, it releases the write lock to allow the reading threads to read. Likewise, always release the read lock when the readers are done reading so log4j can continue to write.
Check out
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReadWriteLock.html
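
The locking discipline described here can be sketched as follows, assuming both the writer and the reader are your own code. The GuardedLog class is hypothetical and keeps lines in memory to stay self-contained; log4j's own appenders know nothing about such a lock, so this only helps if every write goes through it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedLog {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> lines = new ArrayList<>();

    /** Writers take the exclusive write lock, so readers never see a partial update. */
    public void append(String line) {
        lock.writeLock().lock();
        try {
            lines.add(line);
        } finally {
            lock.writeLock().unlock(); // always release, even on failure
        }
    }

    /** Any number of readers may hold the read lock at the same time. */
    public List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(lines);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```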
However, coming to think of it, what is your purpose for this? If you simply want to monitor your logs, you should use a different solution rather than having a monitor thread within the same application. Seems to not make sense. If the data is available within the application / service, why pass it off to a file and read it right back in?
It is going to be a pain if you need to implement what you describe, especially since you have to deal with file rolling.
For your specific requirement, there are better choices:
If the location you are backing up to can be written to directly (i.e. it is mounted in your file system), it is better to simply configure your file rolling to write to that backup directory; or
Make use of log management tools like Splunk to monitor and manage your log files (so that you don't even need to copy to that backup directory); or
Even if you need to do the backup all by yourself, you don't need to (and have no reason to) do it in a separate thread. Try writing a shell script that monitors your log directory and uses tools like rsync, or similar logic of your own, to upload only the files that differ between the local and remote locations.
