What would be a good design to achieve this synchronization across machines? - java

I have 2 machines, each running one process. A shell process on machine A will scp files to machine B, and a Java process on B will use these files. Both processes run as crontab tasks.
How do I achieve synchronization/atomicity? How do I signal that the whole file has been written,
so that the process on B always has access to the latest and complete files and its file handle doesn't go stale?

Assuming you're using a filesystem with atomic moves, you can do exactly that. Or use symbolic links.
A copies the file to a temp location on B. When the upload is complete, it relocates the file, with a move or symlink, into the expected location. B can then only ever see completely uploaded files.
If your process on A cannot SSH into B to make the final move, it can instead upload another, zero-byte marker file which indicates the upload is complete.
A uploads FOO.txt; when that upload is complete, it creates the FOO.txt.done file. B then scans the directory for *.done files and uses the associated data files. Plus cleanup, of course.
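To make the consumer side concrete, here is a minimal Java sketch, assuming a /data/inbox upload directory and a placeholder process() method (both are illustrations, not part of the original setup):

    import java.io.IOException;
    import java.nio.file.*;

    // Hypothetical consumer-side sketch: scan the inbox for *.done markers,
    // process the matching data file, then clean up both files.
    public class DoneFileScanner {
        public static void main(String[] args) throws IOException {
            Path inbox = Paths.get("/data/inbox"); // assumed upload directory

            try (DirectoryStream<Path> markers = Files.newDirectoryStream(inbox, "*.done")) {
                for (Path marker : markers) {
                    String markerName = marker.getFileName().toString();
                    // FOO.txt.done -> FOO.txt
                    Path dataFile = inbox.resolve(
                            markerName.substring(0, markerName.length() - ".done".length()));
                    if (Files.exists(dataFile)) {
                        process(dataFile);      // your processing logic
                        Files.delete(dataFile); // cleanup: data file first,
                        Files.delete(marker);   // marker last
                    }
                }
            }
        }

        private static void process(Path file) {
            System.out.println("Processing " + file);
        }
    }

Deleting the marker last means a crash between the two deletes leaves only a stale .done file, which the Files.exists check skips on the next run.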

A quick-and-dirty workaround: use ownership and permissions. When copying the file (using scp from A), set the ownership (and chmod appropriately) so that the process on B cannot read it; when the copy is finished, change the ownership/permissions so that the process on B can access the file.
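As a consumer-side sketch of this permissions handshake in Java (the inbox path is an assumption), B simply skips files it cannot yet read:

    import java.io.IOException;
    import java.nio.file.*;

    // Files still being copied are unreadable to this process, so a readability
    // check tells us to skip them until the sender relaxes the permissions.
    public class PermissionGatedScan {
        public static void main(String[] args) throws IOException {
            Path inbox = Paths.get("/data/inbox"); // assumed upload directory
            try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox)) {
                for (Path file : files) {
                    if (Files.isReadable(file)) {
                        System.out.println("Ready: " + file);
                    } else {
                        System.out.println("Still uploading (no read access): " + file);
                    }
                }
            }
        }
    }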

A few hacks that I can suggest. You can either:
create a temp file (like a lock file) at the start of the write, whose presence means that the file being streamed is still being written to, and delete the temp file once the writing is complete,
OR
remove the crontab on the consumer process and have the scp process send a signal (JMS/unicast). The Java process could just be a listener on a queue/socket, or be invoked upon receiving a unicast (see the sketch below).
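As a sketch of the second option, B could replace its cron job with a small socket listener; the port number and the netcat hand-off shown in the comment are assumptions:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Hypothetical listener on B: the scp script on A connects and sends the
    // name of the file it just finished uploading, e.g.:
    //   scp FOO.txt b:/data/inbox/ && echo FOO.txt | nc b 9000
    public class UploadListener {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(9000)) { // assumed port
                while (true) {
                    try (Socket client = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(client.getInputStream()))) {
                        String fileName = in.readLine(); // one file name per connection
                        if (fileName != null) {
                            System.out.println("Upload complete: " + fileName);
                            // process /data/inbox/<fileName> here
                        }
                    }
                }
            }
        }
    }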

Related

Identify the least used icons/files/folders

If we want one system-level daemon program running continuously to listen for the creation/modification/access of files and save this information in a log, is there a concept for implementing this?
The creation/modification/access times you need are file attributes stored with the file itself. So you can simply run a program that iterates through all the files in a given directory and reads the required attributes. This program can be scheduled as a cron job or a Windows scheduled task so that it runs every 5 minutes, 10 minutes, etc.
http://www.codeproject.com/Questions/172608/Getting-file-details-using-java shows how to read those attributes.
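A minimal sketch of such a scan, assuming a hypothetical /data/watched directory:

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;

    // Walk a directory tree and log creation/modification/access times.
    // Schedule this as a cron job or Windows scheduled task.
    public class FileTimesLogger {
        public static void main(String[] args) throws IOException {
            Path root = Paths.get("/data/watched"); // assumed directory

            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    System.out.printf("%s created=%s modified=%s accessed=%s%n",
                            file, attrs.creationTime(), attrs.lastModifiedTime(),
                            attrs.lastAccessTime());
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }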

Check when all file handles are released in Java

I want to monitor a directory, and when a file appears there open it, process it and then move it to another directory. The problem is how to check that the other program is done writing it. With Java 7, I can use a WatchService from FileSystem, but I can only check when the files are created. What I want is to know when all file handles are released.
My first thought was that I could obtain an exclusive lock, but it turned out that it was possible to kick out another application while it was actually updating the file.
What is the preferred way to do this in Java? Thanks!
The Watcher APIs currently allow you to see events when a file system object is created, modified or deleted in a watched directory. They don't tell you about other inotify events (on Linux). In fact, I don't think there is a way to do this in pure Java.
I was looking for a way to do this myself a few weeks ago and I came across a mail thread which suggested that you could write a custom implementation of the FileSystem API that provided a file watcher supporting other file system events. I decided not to pursue it because I had an alternative solution ... based on knowledge of how the files I am watching are being produced.
In my case, the files are produced by instruments that save image files to a shared drive. The solution is to watch the stream of "modified" events for a newly created file. When it stops and no more events have been forthcoming for a couple of seconds (the "settling time"), the file can be processed.
If this solution proves to be unreliable, the fallback is to implement the watching and initial processing (taking a snapshot of the file) in C / C++ using the inotify calls directly. This will allow me to directly observe the file close event.
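A sketch of the settling-time idea using the plain WatchService, where the directory, the two-second settling time, and the process() hook are all assumptions:

    import java.nio.file.*;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    import static java.nio.file.StandardWatchEventKinds.*;

    // Track the time of the last event per file and only process a file once
    // no events have arrived for SETTLE_MILLIS.
    public class SettlingTimeWatcher {
        private static final long SETTLE_MILLIS = 2000;

        public static void main(String[] args) throws Exception {
            Path dir = Paths.get("/data/incoming"); // assumed directory
            Map<Path, Long> lastEvent = new HashMap<>();

            try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
                dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY);

                while (true) {
                    // Poll with a timeout so we can check for settled files regularly.
                    WatchKey key = watcher.poll(500, TimeUnit.MILLISECONDS);
                    if (key != null) {
                        for (WatchEvent<?> event : key.pollEvents()) {
                            Path name = (Path) event.context();
                            lastEvent.put(dir.resolve(name), System.currentTimeMillis());
                        }
                        key.reset();
                    }

                    final long now = System.currentTimeMillis();
                    lastEvent.entrySet().removeIf(e -> {
                        if (now - e.getValue() > SETTLE_MILLIS) {
                            process(e.getKey()); // quiet for a while: assume complete
                            return true;
                        }
                        return false;
                    });
                }
            }
        }

        private static void process(Path file) {
            System.out.println("Settled, processing: " + file);
        }
    }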
The simplest way for a file-based interface is:
The sender writes the file under a modified filename (e.g. "example.xml_")
When the sender has finished writing the file, it renames it (e.g. "example.xml_" to "example.xml")
The receiver scans only for "*.xml" (a sender-side sketch of the rename follows)
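On the sender side, the rename can be done atomically from Java as well; this sketch uses illustrative paths and content:

    import java.io.IOException;
    import java.nio.file.*;

    // Write to a temporary name, then rename atomically so the receiver's
    // "*.xml" scan only ever sees complete files.
    public class AtomicPublish {
        public static void main(String[] args) throws IOException {
            Path temp = Paths.get("/data/out/example.xml_");      // assumed paths
            Path finalName = Paths.get("/data/out/example.xml");

            Files.write(temp, "<example/>".getBytes()); // write the full content first
            Files.move(temp, finalName, StandardCopyOption.ATOMIC_MOVE);
        }
    }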

Knowing whether a file is complete or not, before getting the Java File object

I am polling the file system for new files, which are uploaded by someone through a web interface.
Now I have to process every new file, but before that I want to ensure that the file I am processing is complete (I mean to say it has been completely transferred through the web interface).
How do I verify that a file has been completely uploaded before processing it?
Renaming a file is an atomic action on most (if not all) filesystems. You can make use of this by uploading the file under a recognizable temporary name and renaming it as soon as the upload is complete.
This way you will "see" only those files that have been uploaded completely and are safe to process.
rsp's answer is very good. If, by any chance, it does not work for you, and if your polling code runs in a process different from the process of the web server which is saving the file, you might want to try the following:
Usually, when a file is being saved, the sharing options are "allow anyone to read" and "allow no-one to write" (exclusive write). Therefore, you can attempt to open the file with exclusive write access as well: if this fails, you know that the web server is still holding the file open and writing to it. If it succeeds, you know that the web server is done. Of course, be sure to test this, because I cannot guarantee that this is precisely how the web server chooses to lock the file.
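A sketch of this probe in Java, using a file lock as a stand-in for exclusive write access (as noted above, whether this reflects the web server's locking behaviour is something you must verify):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.*;

    // Try to take an exclusive lock on the file; if that fails, assume the
    // uploader is still writing.
    public class UploadCompleteProbe {
        static boolean looksComplete(Path file) {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE);
                 FileLock lock = channel.tryLock()) {
                return lock != null; // got an exclusive lock: writer is likely done
            } catch (IOException | java.nio.channels.OverlappingFileLockException e) {
                return false;
            }
        }

        public static void main(String[] args) {
            // hypothetical upload path
            System.out.println(looksComplete(Paths.get("/uploads/incoming.dat")));
        }
    }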

Java monitor file system when java is not running

I recently implemented Java 7's WatchService and it works perfectly. Now I wonder whether there is a way to get all the events which occurred since the last run of my program. For example:
I run my program, create some files, edit some files, and I get all the corresponding events
I close my program
I create a file named foo.txt
I start my program, and the first event I get is an ENTRY_CREATE for foo.txt
I thought about saving the lastModifiedDate and searching for files and directories newer than the last execution of my program. Is there another (and better) way to do this?
There is no better way to do this if your program is meant to scan for all file changes (apart from storing the files in a content/source control repository, but that would be external to your program).
Java 7's WatchService is only a more performant alternative to continuously looping and comparing file dates / folder contents, so you need to implement your own logic to solve your own problem.
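A minimal sketch of the timestamp approach the question describes, assuming a lastrun.txt state file and a /data/watched directory (note that deletions go undetected this way):

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;

    // Persist the time of the last run, then on startup report every file
    // modified since then. This catches changes made while the program was down.
    public class OfflineChangeScan {
        public static void main(String[] args) throws IOException {
            Path state = Paths.get("lastrun.txt");   // assumed state file
            Path root = Paths.get("/data/watched");  // assumed directory

            final long lastRun = Files.exists(state)
                    ? Long.parseLong(new String(Files.readAllBytes(state)).trim())
                    : 0L;

            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.lastModifiedTime().toMillis() > lastRun) {
                        System.out.println("Changed while offline: " + file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });

            Files.write(state, Long.toString(System.currentTimeMillis()).getBytes());
        }
    }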
There is no way to do this in Java, or in any other programming language.
The operating system doesn't (and can't) buffer file system events on the off-chance that someone might start a program to process them. The event monitor / delivery system captures the events for a running application that is listening for them. When nothing is listening, the events are not captured.
You could write a small daemon (a system service on Windows) which runs continuously and listens for file system changes. It could write these to a file. When your application runs, rather than listening for changes itself, it could just read the file. As events happen while it runs, the daemon will continue to receive them and pass them through the file to the application.
You would need to ensure that the file is organised in such a way that it can be written to and read from safely at the same time, and that it does not grow indefinitely.
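A sketch of the daemon half, with the watched directory and journal path as assumptions (journal rotation and safe concurrent reading are left out for brevity):

    import java.nio.file.*;

    import static java.nio.file.StandardWatchEventKinds.*;

    // Runs continuously and appends one line per event to a journal file that
    // the main application reads on startup.
    public class EventJournalDaemon {
        public static void main(String[] args) throws Exception {
            Path watched = Paths.get("/data/watched");    // assumed directory
            Path journal = Paths.get("/data/events.log"); // assumed journal file

            try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
                watched.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);

                while (true) {
                    WatchKey key = watcher.take(); // blocks until events arrive
                    for (WatchEvent<?> event : key.pollEvents()) {
                        String line = System.currentTimeMillis() + " "
                                + event.kind().name() + " " + event.context() + "\n";
                        Files.write(journal, line.getBytes(),
                                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    }
                    key.reset();
                }
            }
        }
    }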

PHP synchronization

I'm unsure of the best solution for this but this is what I've done.
I'm using PHP to look into a directory that contains zip files.
These zip files contain text files that are to be loaded into an Oracle database through SqlLoader (sqlldr).
I want to be able to start more than one PHP process via the command line to load these zip files into the db.
If other 'php loader' processes are running, they shouldn't overlap and try to load the same zip file. I know I could start one process and let it process each zip file but I'd rather start up a new process for incoming zip files so I can load concurrently.
Right now, I've created a class that will 'lock' a zip file, a directory, or a generic text file by creating a file called 'filename.ext.lock'. Other processes that start up will check to see if a file has been 'locked' in this way; if it has, they will skip that file and move on to another file for processing.
I've made a class that uses a directory and creates 'process id' files so that each PHP process has an id it can use for logging purposes and for identifying which PHP process has locked the file.
I'm on a Windows machine and it isn't in the plan to make this an Ubuntu machine, for those of you who might suggest pcntl.
What other solutions do you see? I know that this isn't truly synchronized, because a lock file might be about to be created when a context switch occurs and another PHP process 'locks' the file before the first one can create its lock file.
Can you please provide me with some ideas about how I can make this solution better? A Java implementation? Erlang?
Also, I forgot to mention: the PHP process connects to the DB to fetch metadata about the files that it is going to load via SqlLoader. I don't think that is important, but just in case.
Quick note: I'm aware that sqlldr locks the table it is loading, and that if multiple processes try to load into the same table it will become a bottleneck. To alleviate this problem I plan on making a directory that will contain files named after the tables that are currently being loaded. After a table has finished loading, the respective file will be deleted, and other processes will check whether it is safe to load that table.
Extra information : I'm using 7zip to unzip the files and php's exec to perform these commands.
I'm using exec to call sqlldr as well.
The zip files can be huge (1 GB) and loading one table can take up to an hour.
Rather than creating a .lock file, you can just rename the zip file when a loader starts to process it, e.g. to "foobar.zip.bar"; the rename should be faster than creating a new file on disk.
But it doesn't ensure that your next loader will be launched after the file rename; you should at least have some control over launching new loaders in another script.
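Since the question asks about a Java implementation: a rename is atomic, so a worker that successfully renames a zip owns it, and a concurrent worker's rename of the same file simply fails, which also sidesteps the lock-file race mentioned in the question. The inbox path and suffix scheme here are illustrative:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.UUID;

    // Claim zip files by renaming them to "<name>.zip.<workerId>"; only one
    // worker's rename of a given file can succeed.
    public class ZipClaimer {
        public static void main(String[] args) throws IOException {
            Path inbox = Paths.get("C:/loader/inbox"); // assumed inbox directory
            String workerId = UUID.randomUUID().toString();

            try (DirectoryStream<Path> zips = Files.newDirectoryStream(inbox, "*.zip")) {
                for (Path zip : zips) {
                    Path claimed = zip.resolveSibling(zip.getFileName() + "." + workerId);
                    try {
                        Files.move(zip, claimed); // atomic claim: only one worker wins
                    } catch (IOException alreadyClaimed) {
                        continue; // another worker got it first
                    }
                    System.out.println("Claimed " + claimed);
                    // unzip and run sqlldr against the claimed file here
                }
            }
        }
    }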
Also, just as a side suggestion, it's possible to emulate threading in PHP using cURL; you might want to try it out.
https://web.archive.org/web/20091014034235/http://www.ibuildings.co.uk/blog/archives/811-Multithreading-in-PHP-with-CURL.html
I do not know if I understand correctly, but here is a suggestion: give the lock files a priority prefix.
Example:
10-script.php starts
20-script.php starts (and enters a loop waiting for 10-foobar.ext.lock)
while 10-foobar.ext.lock has not been generated by 10-script.php, it keeps waiting
30-script.php will have to wait for 10-foobar.ext.lock and 20-example.ext.lock
I tried to find pcntl_fork with Cygwin, but found nothing that works.
