I have a function whose purpose is to create a directory and copy a CSV file to that directory. This same function gets run multiple times, each time by an object in a different thread. It gets called in the object's constructor, but I have logic in there to only copy the file if it does not already exist (meaning, it checks that one of the other instances running in parallel has not already created it).
Now, I know that I could simply rearrange the code so that this directory is created and the file is copied before the objects are run in parallel, but that is not ideal for my use case.
I am wondering: will the following code ever fail? That is, could it fail because one instance is in the middle of copying the file while another instance attempts to start copying that same file to the same location?
private void prepareGroupDirectory() {
    new File(outputGroupFolderPath).mkdirs();
    String map = "/path/map.csv";
    File source = new File(map);
    String myFile = "/path/test_map.csv";
    File dest = new File(myFile);
    // copy file
    if (!dest.exists()) {
        try {
            Files.copy(source, dest);
        } catch (Exception e) {
            // do nothing
        }
    }
}
To sum it all up: is this function thread-safe in the sense that different threads could all run it in parallel without it breaking? I think yes, but any thoughts would be helpful!
To be clear, I have tested this many, many times and it has worked every time. I am asking this question to make sure that, in theory, it will never fail.
EDIT: Also, this is highly simplified so that I could ask the question in an easy to understand format.
This is what I have now after following the comments (I still need to switch to nio), but it is currently working:
private void prepareGroupDirectory() {
    new File(outputGroupFolderPath).mkdirs();
    logger.info("created group directory");
    String map = instance.getUploadedMapPath().toString();
    File source = new File(map);
    String myFile = FilenameUtils.getBaseName(map) + "." + FilenameUtils.getExtension(map);
    File dest = new File(outputGroupFolderPath + File.separator + "results_" + myFile);
    instance.setWritableMapForGroup(dest.getAbsolutePath());
    logger.info("instance details at time of preparing group folder: {} ", instance);
    final ReentrantLock lock = new ReentrantLock();
    lock.lock();
    try {
        // copy file
        if (!dest.exists()) {
            String pathToWritableMap = createCopyOfMap(source, dest);
            logger.info(pathToWritableMap);
        }
    } catch (Exception e) {
        // do nothing
        // thread-safe
    } finally {
        lock.unlock();
    }
}
It isn't.
What you're looking for is the concept of rotate-into-place. The problem with file operations is that almost none of them are atomic.
Presumably you don't just want 'only one' thread to win the race for making this file, you also want that file to either be perfect or not exist at all: you would not want anybody to be able to observe the CSV file in a half-baked state, and you most certainly wouldn't want a crash halfway through generating it to mean that the file is there, half-baked, and its mere existence prevents any attempt to write it out properly. You can't use finally blocks or exception catching to address this issue; someone might trip over a power cable.
So, how do you solve all these problems?
You do not write to foo.csv. Instead you write to foo.csv.23498124908.tmp where that number is randomly generated. Because that just isn't the actual CSV file anybody is looking for, you can take all the time in the world to finish it properly. Once it is done, then you do the magic trick:
You rename foo.csv.23498124908.tmp into foo.csv, and do so atomically - one instant in time foo.csv does not exist, the next instant in time it does and it has the complete contents. Also, that rename will only succeed if the file didn't exist before: It is impossible for two separate threads to both rename their foo.csv.23481498.tmp file into foo.csv simultaneously. If you were to try it and get the timing just perfect, one of them (arbitrary which one) 'wins', the other one gets an IOException and doesn't rename anything.
The way to do this is Files.move(from, to, StandardCopyOption.ATOMIC_MOVE). ATOMIC_MOVE is even kind enough to flat-out refuse to execute if somehow the OS/filesystem combination simply does not support it (they pretty much all do, though).
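For illustration, a minimal sketch of that pattern (method and variable names are made up; needs java.nio.file.* and java.util.concurrent.ThreadLocalRandom):

static void publishCsvAtomically(Path source, Path target) throws IOException {
    // Write/copy into a uniquely named temp file next to the target first.
    Path tmp = target.resolveSibling(target.getFileName() + "." + ThreadLocalRandom.current().nextLong() + ".tmp");
    Files.copy(source, tmp);
    try {
        // Rotate into place atomically; only one caller ends up owning target.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException lost) {
        // Another thread/process presumably got there first; discard our temp file.
        Files.deleteIfExists(tmp);
    }
}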
The second advantage is that this locking mechanism works even if you have multiple entirely different apps running. If they all use ATOMIC_MOVE or the equivalent of this in that language's API, only one can win, whether we're talking 'threads in a JVM' or 'apps on a system'.
If you instead want to avoid multiple threads all simultaneously doing the work to make this CSV file even though only one should do so while the rest 'wait' until the first thread is done, file system locks are not the answer. You can try the trick of making an empty file whose existence is a sign that some other thread is working on it, and there's even a primitive for that in java's java.nio.file APIs: the CREATE_NEW flag can be used when creating a file, which means: atomically create it, failing if the file already exists, with concurrency guarantees (if multiple processes/threads all run that simultaneously, one succeeds and all others fail, guaranteed). However, CREATE_NEW can only atomically create. It cannot atomically write; nothing can (hence the whole 'rename it into place' trick above).
The problem with such locks is twofold:
If the JVM crashes, that file doesn't go away. Ever launched a Linux daemon process, such as postgres, and had it tell you that 'the pid file is still there; if there is no postgres running, please delete it'? Yeah, that problem.
There's no way to know when it is done, other than to just re-check for that file's existence every few milliseconds. If you re-check very frequently, you're potentially thrashing the disk (hopefully your OS and disk cache algorithms do a decent job). If you wait a long time between checks, you might be waiting around for no reason for a long time.
Hence, you shouldn't do this stuff; just use locks within the process. Use synchronized, or make a java.util.concurrent.locks.ReentrantLock, or whatnot.
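For example, roughly (the field name is made up; the point is that the lock object must be shared by all threads, not created inside the method):

private static final Object COPY_LOCK = new Object();

private void prepareGroupDirectory() {
    synchronized (COPY_LOCK) {
        // existence check + copy (or, better, the temp-file + ATOMIC_MOVE dance above)
    }
}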
To answer your code snippet specifically: no, that is broken. It is possible for two threads to run simultaneously and both get false from dest.exists(), thus both entering the copy block, and then they fall all over each other when copying. Depending on the file system, usually one thread ends up 'winning', with its copy operation succeeding and the other thread's seemingly lost to the aether (most file systems are ref/node based, meaning the file was written to disk but its 'pointer' was immediately overwritten, and the filesystem considers it garbage, more or less).
Presumably you consider that a failing scenario, and your code does not guarantee that it can't happen.
NB: What API are you using? Files.copy(instanceOfJavaIoFile, anotherInstanceOfJavaIoFile) isn't java. There is java.nio.file.Files.copy(instanceOfjnfPath, anotherInstanceOfjnfPath) - that's the one you want. Perhaps this Files you have is from apache commons? I strongly suggest you don't use that stuff; those APIs are usually obsolete (java itself has better APIs to do the same thing), and badly designed. Ditch java.io.File, it's an outdated API. Use java.nio.file instead. The old API doesn't have ATOMIC_MOVE or CREATE_NEW, and doesn't throw exceptions when things go wrong - it just returns false, which is easily ignored and has no room to explain what went wrong. Hence why you should not use it. One of the major issues with the apache libraries is that they use the anti-pattern of piling a ton of static utility methods into a giant container. Unfortunately, the second take on file stuff in java itself (java.nio.file) is similarly boneheaded API design. I guess in the java world, third time will be the charm. At any rate, a bad core java API with advanced capabilities is still better than a bad apache utility API that wraps around the older API which simply does not expose the kinds of capabilities you need here.
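For reference, a sketch of what the copy would look like with java.nio.file (reusing the question's simplified paths; IOException handling omitted):

Path source = Paths.get("/path/map.csv");
Path dest = Paths.get("/path/test_map.csv");
java.nio.file.Files.copy(source, dest);   // throws if dest already exists, instead of failing silently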
Related
From what I have searched so far, I have found 2 solutions.
The first is to create or reuse a file, then try to lock the file.
File file = new File(lockFile);
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "rw");
FileLock fileLock = randomAccessFile.getChannel().tryLock();
if (fileLock == null) {
    // someone else has obtained the lock, therefore an application is assumed to be running already
    System.exit(0);
}
Advantage of this approach:
If the application shuts down abnormally, the OS will release the lock for us. We do not need to manually delete the file for things to work again.
Source: How to implement a single instance Java application?
The second is to create a file, without any lock, and just use the presence of the file to determine whether an application is running or not.
Disadvantage of this approach:
If the application shuts down abnormally, we need to manually remove the file for things to work again.
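For reference, this is roughly what I mean by 2 (file name made up; createNewFile() can also throw IOException, which I have left out):

File marker = new File("app.lock");
if (!marker.createNewFile()) {      // atomic: returns false if the file already exists
    // another instance is assumed to be running
    System.exit(0);
}
marker.deleteOnExit();              // only runs on a clean shutdown - hence the disadvantage above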
Though I personally think this is the worse option compared to 1, it seems libraries use this approach more often. For example, Play Framework (RUNNING_PID file).
So can someone explain why frameworks seem to prefer 2 over 1? What are the advantages of that approach?
In addition, does the choice depend on performance and ease of use? For example, should a client-side application choose 1 and a server-side application choose 2?
I have the following scenario. There are 2 applications that share a database, and both can be used to alter the underlying data. For example, Customer 1 can be modified from both systems. I want to make sure that when someone performs an action on, say, customer 1 in application 1, a persistent lock is taken on that customer so that nobody from application 2 can perform any action on the same customer. Even if any of these applications goes down, the lock should still be held. What would be the right approach for solving such an issue?
As #Turing85's comment hints at, this is extremely dangerous territory: if someone trips over a power cable, your app is out of the running and cannot be started again. Permanently. At least, until someone goes in and manually addresses the problem. This is rarely what you want.
The normal solution is to do the locking at the DB level: If it is a 'single file is the database' model, such as H2 or SQLite, then let the DB engine lock the file for writing, and let the OS-level file lock serve as your gating mechanism. This has the considerable advantage that if app A falls out of the air for any reason (power outage, hard crash, who knows), the lock is relinquished.
If the DB is a separate running process (psql, mysql, mssql, etc), those have locking features you can use.
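For example, with a regular SQL database, a row-level lock over JDBC could look roughly like this (table and column names, dataSource and customerId are made up; SQLException handling omitted). The lock lives only as long as the transaction, so the database releases it automatically if the application or connection dies:

try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false);
    try (PreparedStatement ps = con.prepareStatement(
            "SELECT * FROM customer WHERE id = ? FOR UPDATE")) {
        ps.setLong(1, customerId);
        ps.executeQuery();        // row is now locked for this transaction
        // ... perform the modifications on customer 1 here ...
    }
    con.commit();                 // releases the lock
}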
If none of those options are available, you can handroll it: You can make files with the new file API that are guaranteed atomic/unique:
int pid = 0; // see below
Path p = Paths.get("/absolute/path/to/agreed/upon/lockfile/location/lockfile.pid");
Files.write(p, String.valueOf(pid).getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
The CREATE_NEW open option is asking java to ensure atomicity: Either [A] the file did not exist before, exists now, and it is this process that made it, or [B] this will throw.
There is no [C] this process created it, but another process unluckily was doing the same thing at the same time and also created it and one of these processes is now overwriting the efforts of the other - that is what CREATE_NEW means and guarantees: That this won't happen. (vs the CREATE option which will overwrite what's there and makes no atomicity guarantees).
You can now use the file system as your global lock: to acquire the lock, you make that file. If you can, great, you got it. If you can't, then you have to wait (you'll need to use the watcher API or a polling loop if you care about acquiring it as soon as possible; not a great option, as that is very expensive compared to in-process locking!). To relinquish the lock, simply delete the file.
To guard against a hard crash leaving the file there, stuck, permanently, preventing your app from ever running again, it can help to register the 'pid' (process id) inside it. This gives you some debugging help if you are manually fixing matters, and you can use it to automatically check ('hey, OS, is there even still a process running with id 123981? No? Okay, then it must have hard-crashed and left the lock file in place!'). Unfortunately, working with pids in older Java versions is convoluted, as java is more or less designed around the notion that you shouldn't rely too much on the underlying OS, and java does not really assume that 'process IDs' are a thing the underlying OS does. Google around for how to obtain it; you CAN do this.
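On a Java 9+ JVM this has become straightforward via ProcessHandle; a rough sketch of both halves, reusing the p path from the snippet above (exception handling omitted):

long myPid = ProcessHandle.current().pid();
Files.write(p, String.valueOf(myPid).getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);

// Later, if the lock file turns out to already exist:
long recordedPid = Long.parseLong(new String(Files.readAllBytes(p), StandardCharsets.UTF_8).trim());
boolean stillRunning = ProcessHandle.of(recordedPid).map(ProcessHandle::isAlive).orElse(false);
if (!stillRunning) {
    Files.delete(p);   // stale lock left behind by a hard crash; safe to remove and retry
}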
Which gets us to the final point, your evident fear of inconsistency: after all, you actually appear to desire the clearly insane notion that the app should be permanently disabled when there's a hard crash (a process crashes and the lock is not explicitly relinquished). I assume you want this because you are afraid that the database is left in an inconsistent state and you don't want anything to ever touch it again until you manually look at it.
Oookay, well, the lock file business is precisely how you get that. However, this is rather user hostile, and not needed: You can design databases and process flows (using transactions, append-only tables, and journal systems) so that they will always cleanly survive hard crashes.
For example, consider file systems. In ye olde, sepia-toned past, when you stumbled over your power cord, then on bootup you'd get a nasty thing where the system would do a 'full disk check', and it might well find a bunch of errors.
But on modern systems this is no longer the case. Trip over that power cord all day long. You won't get corrupted files (unless processes are badly designed, in which case the corruption is the fault of the app, not the file system), and no extensive disk checks are needed.
This works primarily by a concept known as 'journalling'.
Say you want to replace a file that reads "Hello, World!" with the text "Goodbye, now!". You could just start writing bytes. Let's say you get to "Goodb, World!" and then someone trips over a cable.
You're now hosed. The data is inconsistent and who knows what was happening.
But imagine a different system:
Journalling
The system first makes a file called '.jobrecord', writes in it: I'm going to open this file, and overwrite the data at the start with 'Goodbye, now!'.
Then, it actually goes ahead and does that.
Then, it retires the job record in an atomic way (for example by updating a single byte to mark it "done", or by deleting it).
Now, on bootup, the system can check if that file is there, and if it is, check that the job was actually done, or finish it if need be. Voila, now you can never have an inconsistent system.
You can write such tools too.
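A toy sketch of that idea (file names are made up, and 'done' is signalled by deleting the record rather than flipping a byte; needs java.nio.file.* and java.nio.charset.StandardCharsets):

Path journal = Paths.get("work.jobrecord");
Path target = Paths.get("greeting.txt");

// 1. Record the intent first. CREATE_NEW makes the record's creation atomic.
Files.write(journal, "overwrite greeting.txt with 'Goodbye, now!'".getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW);

// 2. Do the actual work.
Files.write(target, "Goodbye, now!".getBytes(StandardCharsets.UTF_8));

// 3. Retire the job record.
Files.delete(journal);

// On startup: if work.jobrecord still exists, the previous run crashed mid-job;
// inspect it and redo/finish the recorded work before doing anything else.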
Alternative: append-only
Another way to roll is that data is only ever added and has a validity marker. So, you never overwrite any files, you only make new ones, and you 'rotate them into place'. For example, instead of writing over the file, you make a new file called 'target.new', copy over the data, write "Goodbye, now!" at the start, and then atomically rename the file over the original 'target'. This guarantees that the original is never harmed: at one moment in time the 'view' of the target file is the old one, and at the next atomic moment it is the new one, with never a point in between that is halfway between the two.
A similar concept in databases is to never UPDATE, only INSERT, keeping a monotonically increasing counter, and knowing that 'the current state' is always the row with the highest counter number.
The point is: There are ways to build robust systems that do not ever result in data being inconsistent unless an external force corrupts your data stores.
I have a multi-threaded Java 7 program (a jar file) which uses JDBC to perform work (it uses a fixed thread pool).
The program works fine and it logs things as it progresses to the command shell console window (System.out.printf()) from multiple concurrent threads.
In addition to the console output I also need to add the ability for this program to write to a single plain ASCII text log file - from multiple threads.
The volume of output is low; the file will be relatively small as it's a log file, not a data file.
Can you please suggest a good and relatively simple design/approach to get this done using Java 7 features (I don't have Java 8 yet)?
Any code samples would also be appreciated.
thank you very much
EDIT:
I forgot to add: in Java 7, the Files.newOutputStream() static factory method is stated to be thread-safe, according to the official Java documentation. Is this the simplest option for writing a single shared text log file from multiple threads?
If you want to log output, why not use a logging library, e.g. log4j2? This will allow you to tailor your log to your specific needs, and it can log without synchronizing your threads on stdout (you know that calling System.out.print involves locking on System.out, right?)
Edit: For the latter, if the things you log are thread-safe, and you are OK with adding LMAX' disruptor.jar to your build, you can configure async loggers (just add "async") that will have a logging thread take care of the whole message formatting and writing (and keeping your log messages in order) while allowing your threads to run on without a hitch.
Given that you've said the volume of output is low, the simplest option would probably be to just write a thread-safe writer which uses synchronization to make sure that only one thread can actually write to the file at a time.
If you don't want threads to block each other, you could have a single thread dedicated to the writing, using a BlockingQueue - threads add write jobs (in whatever form they need to - probably just as strings) to the queue, and the single thread takes the values off the queue and writes them to the file.
Either way, it would be worth abstracting out the details behind a class dedicated for this purpose (ideally implementing an interface for testability and flexibility reasons). That way you can change the actual underlying implementation later on - for example, starting off with the synchronized approach and moving to the producer/consumer queue later if you need to.
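For example, a minimal sketch of the synchronized variant using only Java 7 APIs (class and method names are made up); the queue-based version could later be swapped in behind the same small interface:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class FileLog {
    private final BufferedWriter writer;

    public FileLog(Path logFile) throws IOException {
        // CREATE + APPEND: create the file if missing, never truncate it
        this.writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public synchronized void log(String line) {
        try {
            writer.write(line);
            writer.newLine();
            writer.flush();          // volume is low, so flushing per line is fine
        } catch (IOException e) {
            e.printStackTrace();     // or rethrow/collect, depending on your needs
        }
    }

    public synchronized void close() throws IOException {
        writer.close();
    }
}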
Keep a common PrintStream reference to write to (instead of System.out), and set it to System.out or channel it through to a FileOutputStream, depending on what you want.
Your code won't change much (barely at all) and PrintStream is already synchronized too.
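For example, roughly (field and file names are made up; java.io.PrintStream and java.io.FileOutputStream):

// one shared PrintStream, chosen once at startup
private static final PrintStream LOG = createLog();

private static PrintStream createLog() {
    try {
        // append = true on the FileOutputStream, autoFlush = true on the PrintStream
        return new PrintStream(new FileOutputStream("app.log", true), true);
    } catch (FileNotFoundException e) {
        return System.out;   // fall back to the console
    }
}

// callers then use LOG.printf(...) exactly as they used System.out.printf(...)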
I read that new File("path") doesn't physically create a file on disk. However, the API documentation says:
Instances of this class may or may not denote an actual file-system object such as a file or a directory. If it does denote such an object then that object resides in a partition. A partition is an operating system-specific portion of storage for a file system. A single storage device (e.g. a physical disk-drive, flash memory, CD-ROM) may contain multiple partitions.
So I'm curious if it is safe to have such code in multithreaded environment:
File file = new File( "myfile.zip" );
// do some operations with file (fill it with content)
file.saveSomewhere(); // just to denote that I save it after several operations
For example, thread1 comes here, creates an instance and starts doing operations. Meanwhile thread2 interrupts it, creates its own instance with the same name (myfile.zip) and does some other operations. After that, each of them saves the file in turn.
I need to be sure whether they end up working with the same file, and whether the last thread to save overwrites the previous one's file.
No, File does not keep a lock and is not safe for the pattern you describe. You should either lock or keep the file in some wrapper class.
If you would provide a little bit more of your code, somebody can certainly help you find a suitable pattern.
Certainly the lines you commented will not be thread-safe; you will have to protect them with a mutex or a monitor. The golden rule is: every time you have to write to something in a multithreaded context, you need to protect that region to guarantee thread-safety (Bernstein conditions).
I'm not sure if the statement you propose needs to be protected too, as I have never used that API, but I thought this could be helpful to someone else as well.
I've written a program to find the password of a zip file using zip4j. I've used the brute-force method, so each password has to be checked against the file using the code below:
while (true) {
    try {
        zipFile.setPassword(passstr);
        zipFile.extractAll(deststr);
        break;
    } catch (Exception ex2) {
        // passstr = next password string to be checked
    }
}
but this is very slow because of the I/O-heavy work repeated in the loop each time!
Is there any other way to check the password of a zip file? Do I need to move the file into memory somehow to make it faster? Or is there any other solution to speed it up?
thanks
Brute force is an embarrassingly parallel problem. You should try to use one thread per core to solve it.
In addition, you should (as you said) avoid I/O. For this, find a way to load the file into memory and test the password without using I/O. (You should also try to minimize the duration of each test; according to Wikipedia and this spec, it appears that zip files may contain an encrypted header against which testing the password would be faster, if such a test is possible.)
Combining parallel and memory solutions, it is obvious that you should use one copy of the file in memory for each thread.
Finally, try to use the fastest decryption implementation you can find (do it in C, maybe?).
Try the following logic:
Start the program.
The path of the folder should be shared with the Java program.
First, make multiple copies of the zip file in the same folder, as many as you need.
Next, start a separate thread for each copy you have created.
Execute the password-cracking process in each thread, each with a different slice of the brute-force dictionary.
Keep a common flag in the class, initially false, which every thread checks while executing (see the sketch after this list).
Lastly, when any thread finds the password, it sets the flag and all other threads stop processing.
The number of threads can be set as per need or as per performance.
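A rough sketch of that flag idea, assuming the zip4j calls shown in the question (ZipFile, setPassword, extractAll) and that the copies of the archive already exist; threadCount, dictionarySlices and destDir are placeholders:

final AtomicBoolean found = new AtomicBoolean(false);
ExecutorService pool = Executors.newFixedThreadPool(threadCount);

for (int i = 0; i < threadCount; i++) {
    final List<String> candidates = dictionarySlices.get(i);   // this thread's share of passwords
    final String archiveCopy = "copy" + i + ".zip";            // this thread's copy of the archive
    pool.submit(new Runnable() {
        public void run() {
            for (String pass : candidates) {
                if (found.get()) return;                        // another thread already cracked it
                try {
                    ZipFile zip = new ZipFile(archiveCopy);
                    zip.setPassword(pass);
                    zip.extractAll(destDir);
                    found.set(true);                            // success: signal the other threads
                    System.out.println("password: " + pass);
                    return;
                } catch (Exception wrongPassword) {
                    // wrong guess, try the next candidate
                }
            }
        }
    });
}
pool.shutdown();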
As Vineet has already mentioned, applying threading will improve performance somewhat. However, a more sophisticated approach would be to:
Determine what encryption algorithm is being used by the file
Read the file once into memory
Fire off several threads to decrypt the raw data with your generated password
Depending on the library / decryption mechanism, you will either get an exception or a random jumble of bits as a result; in the latter case, check the first few bytes for the file signature of a zip file.
Be prepared to wait for a few thousand years if the person who protected the file knows how to generate a secure password though!