I'm unsure of the best solution for this but this is what I've done.
I'm using PHP to look into a directory that contains zip files.
These zip files contain text files that are to be loaded into an Oracle database through SQL*Loader (sqlldr).
I want to be able to start more than one PHP process via the command line to load these zip files into the db.
If other 'php loader' processes are running, they shouldn't overlap and try to load the same zip file. I know I could start one process and let it process each zip file but I'd rather start up a new process for incoming zip files so I can load concurrently.
Right now, I've created a class that will 'lock' a zip file, a directory, or a generic text file by creating a file called 'filename.ext.lock'. Other processes that start up check whether a file has been 'locked' in this way; if it has, they skip that file and move on to another file for processing.
I've made a class that uses a directory and creates 'process id' files so that each PHP process has an id it can use for logging purposes and for identifying which PHP process has locked the file.
I'm on a Windows machine, and moving to Ubuntu isn't in the plan, for those of you who might suggest pcntl.
What other solutions do you see? I know that this isn't truly synchronized: one process may be about to create a lock file when a context switch occurs, and another PHP process then 'locks' the file before the first one can create its lock file.
Can you please provide me with some ideas about how I can make this solution better? A Java implementation? Erlang?
Also forgot to mention, the PHP process connects to the DB to fetch metadata about the files that it is going to load via SqlLoader. I don't think that is important but just in case.
Quick note: I'm aware that sqlldr locks the table it is loading and that if multiple processes try to load into the same table it will become a bottleneck. To alleviate this problem I plan on making a directory that will contain files named after the tables that are currently being loaded. After a table has finished loading, the respective file will be deleted, and other processes will check that it is safe to load that table.
Extra information: I'm using 7zip to unzip the files and PHP's exec to perform these commands.
I'm using exec to call sqlldr as well.
The zip files can be huge (1 GB), and loading one table can take up to an hour.
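The race described above (two processes both seeing "no lock file" and both creating one) goes away if checking and creating are a single operation. Below is a minimal sketch of that idea, written in Python rather than PHP; the helper names `try_lock` and `unlock` are hypothetical. It relies on the operating system's exclusive-create flag. In PHP, opening the lock file with `fopen()` in mode `'x'` gives the same "fail if it already exists" behaviour.

```python
import os

def try_lock(path):
    """Atomically create path + '.lock'. Return True if this process won the lock."""
    lock_path = path + ".lock"
    try:
        # O_CREAT | O_EXCL makes "check and create" one step:
        # the call fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    # Record the owner's process id, for logging and debugging.
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True

def unlock(path):
    """Release the lock by deleting the lock file."""
    os.remove(path + ".lock")
```

A loader would call `try_lock` on each zip file (or table marker) and skip any for which it returns `False`. A lock left behind by a crashed process is not handled here and would need separate cleanup.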
Rather than creating a .lock file, you can just rename the zip file when a loader starts to process it, e.g. to "foobar.zip.bar". That should be faster than creating a new file on disk.
But it doesn't guarantee that your next loader starts only after the file has been renamed. You should at least have
another script that controls when new loaders are started.
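A minimal sketch of claiming a file by renaming it, in Python rather than PHP. The function name `claim` and the idea of appending a per-worker id are assumptions for illustration. Because each worker renames to its own target name, only the first rename finds the source file; later attempts fail because the original name no longer exists.

```python
import os

def claim(zip_path, worker_id):
    """Rename zip_path to a worker-specific name. Return the new path, or None if lost."""
    claimed = f"{zip_path}.{worker_id}"
    try:
        os.rename(zip_path, claimed)
    except FileNotFoundError:
        # Another worker already renamed (claimed) this file.
        return None
    return claimed
```

A worker that gets `None` back simply moves on to the next zip file. This assumes worker ids are unique, for example the process id files mentioned in the question.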
Also, just a side suggestion: it's possible to emulate threading in PHP using cURL, so you might want to try it out.
https://web.archive.org/web/20091014034235/http://www.ibuildings.co.uk/blog/archives/811-Multithreading-in-PHP-with-CURL.html
I do not know if I understand correctly, but I have a suggestion: give the lock files a priority prefix.
Example:
10-script.php started
20-script.php started (enters a loop waiting for a 10-foobar.ext.lock)
while 10-foobar.ext.lock is not generated by 10-script.php, still waiting
30-script.php will have to wait for 10-foobar.ext.lock and 20-example.ext.lock
I tried to find a pcntl_fork that works under Cygwin, but found nothing.
I have an app that reads words from CSV text files. Since they usually do not change, I have placed them inside a .jar file and read them using a .getResourceAsStream call. I really like this approach since I do not have to place a bunch of files onto a user's computer - I just have one .jar file.
The problem is that I wanted to allow "admin" to add or delete the words within the application and then send the new version of the app to other users. This would happen very rarely (99.9% only read operations and 0.1% write). However, I found out that it is not possible to write to text files inside the .jar file. Is there any solution that would be appropriate for what I want and if so please explain it in detail as I'm still new to Java.
It is not possible because you can't change any content of a jar that is currently in use by a JVM.
Better to choose an alternative solution, like keeping your jar file and text file in the same folder.
Briefing: I'm developing a system that must provide access to all the files of a project. It has to be an interface to open, upload and modify those files. I decided to store all of the files in an archive (zip). To improve response time, I decided not to unzip and then rezip all of its contents, but to modify the zip as is, using Java's Zip FileSystem Provider. I'm running into a lot of trouble because of the lack of information. When the user needs a specific file, I decompress only that file so the user can work on it. I monitor the file for changes, and if I detect any, I upload (replace) the file in the zip.
The problem is:
Since I'm using threads while saving the files into the archive (to prevent the GUI from freezing), the user can open other files and modify them even while another file is being saved. I want the archive to be as up to date as possible to prevent loss of information in case of a blackout, but there is no method like FileSystem.commit() or FileSystem.flush(), so the changes are written to the archive only when I close the file system. Opening another file system takes too long, which leaves my system vulnerable to loss of information while the new file system is being initialized... Any idea how to commit changes, or how to always have a way to save?
Opening two file systems (one as a backup to do the operations while the other is being instantiated) does not work either, because they change the name of the file for a brief time, and during that time the other instance may try to be created and fail because it cannot find the file under its name...
Greetings...
I am currently working on a program to make seating charts for my teacher's classroom. I have put all of the data files in the jar. These are read in and put into a table. After running the main function of the program, it updates the files to match the table's values. I know I need to explode the jar and then rejar it during execution in order to edit the files, but I can't find any explanation of how to rejar during execution. Does anyone have any ideas?
Short answer:
Put data files outside of the binary and ship together with JAR in a separate folder.
Long one:
It seems like you are approaching the problem from the wrong direction. A JAR file is something like an executable (.exe) on the Windows platform: a read-only binary containing code.
You can (although it is bad practice) put some resources like data files, multimedia, etc. inside a JAR (like you can inside an .exe). But a better solution would be to place these resources outside of the binary so you can switch them without recompiling/rebuilding.
If you need to modify the resources on-the-fly while the application is running, you basically have no choice. The data files have to be outside the binary. Once again, you'll never see a Windows .exe file modifying itself while running.
Tomasz is right that the following is bad practice, but it is possible.
The contents of the classpath are read into memory during bootstrapping. The files themselves can still be modified, but changes made after initialisation will not be reflected in the running application. I would recommend putting the data into another file, separate from your class files, but if you insist on keeping them together, you could look at:
JarInputStream or ZipInputStream to read the contents of the JAR file
Get the JarEntry for the appropriate file
Read and modify the contents as you desire
JarOutputStream or ZipOutputStream to write the contents back out
Make sure you're not reading the resource through the classpath and that it's coming from a file on disk / network.
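The read, modify and write-back steps listed above can be sketched as follows. This uses Python's `zipfile` module rather than Java's `JarInputStream`/`JarOutputStream` (a JAR is a zip archive, so the approach is the same); the function name `replace_entry` is hypothetical. It copies every entry into a new archive, swaps in the new contents for one entry, then replaces the original file. The final replace can fail on Windows if another process still has the archive open.

```python
import os
import tempfile
import zipfile

def replace_entry(archive_path, entry_name, new_data):
    """Rewrite the archive with entry_name replaced by new_data; other entries are copied."""
    # Build the new archive next to the old one so the final replace stays on one volume.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=os.path.dirname(archive_path) or ".")
    os.close(fd)
    with zipfile.ZipFile(archive_path) as src, \
         zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == entry_name:
                data = new_data
            else:
                data = src.read(info.filename)
            dst.writestr(info.filename, data)
    # Swap the rewritten archive into place only after both archives are closed.
    os.replace(tmp_path, archive_path)
```

For example, `replace_entry("app.jar", "words.csv", "a,b,c\n")` would rewrite a hypothetical `app.jar` with an updated `words.csv`. Entry metadata such as timestamps is not preserved in this sketch.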
I am polling the file system for new files, which are uploaded by someone through a web interface.
Now I have to process every new file, but before that I want to ensure that the file I am processing is complete (that is, it has been completely transferred through the web interface).
How do I verify that a file has been completely uploaded before processing it?
Renaming a filename is an atomic action in most (if not all) filesystems. You can make use of this by uploading the file to a recognizable temporary name and renaming it as soon as the upload is complete.
This way you will "see" only those files that have been uploaded completely and are safe for processing.
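A minimal sketch of the upload-then-rename pattern, in Python. The `.part` suffix and the function names `save_upload` and `ready_files` are assumptions for illustration: the writer streams into a temporary name and renames only once everything is written, and the poller ignores anything that still carries the temporary suffix.

```python
import os

def save_upload(directory, name, chunks):
    """Write the upload under a temporary name, then rename it into place."""
    tmp_path = os.path.join(directory, name + ".part")
    final_path = os.path.join(directory, name)
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # The final name appears only after the file is completely written and closed.
    os.replace(tmp_path, final_path)
    return final_path

def ready_files(directory):
    """Return the names of files that are complete, skipping in-progress uploads."""
    return sorted(n for n in os.listdir(directory) if not n.endswith(".part"))
```

The temporary and final names live in the same directory here so that the rename does not cross file systems.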
rsp's answer is very good. If, by any chance, it does not work for you, and if your polling code is running within a process different from the process of the web server which is saving the file, you might want to try the following:
Usually, when a file is being saved, the sharing options are "allow anyone to read" and "allow no one to write" (exclusive write). Therefore, you can attempt to open the file with exclusive write access as well: if this fails, then you know that the web server is still holding the file open and writing to it. If it succeeds, then you know that the web server is done. Of course, be sure to test it, because I cannot guarantee that this is precisely how the web server chooses to lock the file.
I have 2 machines, each running one process. A shell process on machine A will scp something to machine B, and a Java process on B will use these files. Both processes run as crontab tasks.
How do I achieve synchronization/atomicity etc.? How do I signal that the whole file has been written,
so that the process on B always has access to the latest, complete files and its handle doesn't go stale?
Assuming you're using a filesystem with atomic moves you can do that. Or use symbolic links.
A copies the file to a temp location on B. When the upload is complete, it relocates, with a move or symlink, the file into the expected location. B can then only ever see completely uploaded files.
If your process on A cannot SSH into B to make the final move, it could instead add another, zero-byte marker file which indicates that the upload is complete.
A uploads FOO.txt, when that upload is complete, it creates the FOO.txt.done file. B then scans the directory for *.done files, and uses the associated data file. Plus cleanup of course.
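The consumer side of the marker-file approach might look like the sketch below, in Python rather than Java. The function names `completed_uploads` and `finish` are hypothetical; the `.done` suffix follows the example above. A data file is reported only when its marker exists, and cleanup removes both.

```python
import os

def completed_uploads(directory):
    """Return data file names that have a matching '.done' marker."""
    done = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".done"):
            data_name = name[:-len(".done")]
            # Ignore stray markers whose data file is missing.
            if os.path.exists(os.path.join(directory, data_name)):
                done.append(data_name)
    return done

def finish(directory, data_name):
    """Clean up after processing: remove the marker, then the data file."""
    os.remove(os.path.join(directory, data_name + ".done"))
    os.remove(os.path.join(directory, data_name))
```

With `FOO.txt`, `FOO.txt.done` and a still-uploading `BAR.txt` in the directory, `completed_uploads` reports only `FOO.txt`.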
A quick-and-dirty workaround would be to use ownership and permissions: while copying the file (using scp from A), set the ownership (and chmod appropriately) so that the process on B cannot access it; when the copy has finished, change the ownership/permissions so that the process on B is able to access the file.
Here are a few hacks I can suggest. You can either:
Create a temp file (like a lock file) when writing starts, whose presence means that the file being streamed is still being written to, and delete the temp file once the writing is complete.
OR
Remove crontab on the consumer process and have the scp process send a signal (JMS/ unicast). The java process could just be a listener on a queue/ socket or be invoked upon receiving a unicast.