Java: Assert that file serialization is complete?

I am dealing with an object in Java that is very expensive to compute and several megabytes in size. In order to preserve it across application restarts, I want to serialize it into a File, and re-load that file on startup (if present).
The problem is that most file systems are not transactional. The file writing process can be interrupted by exceptions, JVM termination and/or power failure. What I absolutely need to assert is that if the file is used, the information within is complete. I can throw away the information and recalculate if needed, but reading and relying on incomplete data must be avoided.
My attempt would be to serialize and write a "seal" object at the end of the file, like a checksum for example. The presence of this object during deserialization guarantees that the serialization process was complete. If the seal object is absent during deserialization, I know that I cannot trust the data in the file as it might be incomplete. I am looking for an OS-independent solution, and I do not need to consider "attacks" that maliciously modify the contents of the serialized file.
My question is: Is the seal object approach outlined above safe, or are there still some corner cases where I can end up reading an incomplete file without noticing it?

Just write the file under a different, temporary name. Once the file is complete, delete any previous version of the file and rename the new file to the real name.
If the program dies during the write, you're just left with an incomplete temp file. The real file is still as it was before (or missing), so you'll never see an incomplete file to load.
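A minimal sketch of that temp-file-then-rename idea, assuming Java 7+ NIO. The class and method names (AtomicSave, save, load) are made up for illustration, and the fallback to a plain replacing move is my own assumption for file systems that refuse an atomic move. It does not force the data to disk, so it says nothing about power failure.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
final class AtomicSave {

    // Serializes the value into a temp file in the target's directory,
    // then renames the finished temp file onto the real name.
    static void save(Serializable value, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                out.writeObject(value);
            }
            try {
                // Whether an existing target is replaced is implementation specific.
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Assumed fallback: a non-atomic move that replaces the old file.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up after a failed write.
            Files.deleteIfExists(tmp);
        }
    }

    // Reads back the single object stored by save().
    static Object load(Path target) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(target))) {
            return in.readObject();
        }
    }
}
```

On startup you would call load() only if the real file exists. Leftover *.tmp files from a crashed run can simply be deleted.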

Related

How to safely create a file if it doesn't exist from concurrently-running processes without using locking?

Suppose two (or more) concurrently-running Java processes need to check for the existence of a file, create it if it doesn't exist, and then potentially read from that file over the course of their runs. We want to protect ourselves against the possibility of multiple writer processes clobbering each other and/or reader processes reading an incomplete or inconsistent version of the file.
What we're currently doing to arbitrate this situation is to use Java NIO FileLocks. One process manages to acquire an exclusive lock on the file to be created using FileChannel.tryLock() and creates it, while the other concurrently-running processes fail to acquire a lock and fall back to using an in-memory version of the file for their runs.
Locking is causing various problems for us on our compute farm, however, so we're exploring alternatives. So my question to you is: is there a way to do this safely without using file locks?
Could, e.g., the processes write to independent temporary files when they find that the file doesn't exist, and then more or less "atomically" move the temp file(s) into place after they've been written? In this scenario, we might end up with multiple writer processes, but that would be OK provided that any process reading from the file always reads one version or another, and not a mix of two or more versions. However, I don't think all operating systems guarantee that if you have a file open for reading, you'll continue reading from the original version of the file even if it's overwritten midway through the read.
Any suggestions would be much appreciated!
"Suppose two (or more) concurrently-running Java processes need to check for the existence of a file, create it if it doesn't exist, and then potentially read from that file over the course of their runs."
I don't quite understand the create and read part of the question. If you are looking to make sure that you have a unique file then you could use new File(...).createNewFile() and check to make sure that it returns true. To quote from the Javadocs:
Atomically creates a new, empty file named by this abstract pathname if
and only if a file with this name does not yet exist. The check for the
existence of the file and the creation of the file if it does not exist
are a single operation that is atomic with respect to all other
filesystem activities that might affect the file.
This would give you a unique file that only that process (or thread) would then "own". I'm not sure how you were planning on letting the writer know which file to write to however.
If you are talking about creating a unique file that you write to and then move into a write directory to be consumed, then the above should work. You would need to create a unique name in the write directory once you were done as well.
You could use something like the following:
// requires java.io.File and java.io.IOException
private File getUniqueFile(File dir, String prefix) throws IOException {
    long suffix = System.currentTimeMillis();
    while (true) {
        File file = new File(dir, prefix + suffix);
        // try creating this file; if it returns true then the name is unique
        if (file.createNewFile()) {
            return file;
        }
        // someone already has that suffix, so increment and try again
        suffix++;
    }
}
As an alternative, you could also generate a unique filename with UUID.randomUUID() or something similar.
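A rough sketch of the UUID variant, assuming Java 7+ NIO. UniqueFiles and createUniqueFile are hypothetical names. Files.createFile fails if the file already exists, so the retry loop only matters in the very unlikely case of a collision.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper, not part of any library.
final class UniqueFiles {

    // Creates a new, empty file named prefix + random UUID inside dir.
    static Path createUniqueFile(Path dir, String prefix) throws IOException {
        while (true) {
            Path candidate = dir.resolve(prefix + UUID.randomUUID());
            try {
                // The existence check and the creation are a single atomic step.
                return Files.createFile(candidate);
            } catch (FileAlreadyExistsException e) {
                // Name already taken (extremely unlikely); try another UUID.
            }
        }
    }
}
```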

Java Serialization - Recovering serialized file after process crash

I have the following use case.
A process serializes certain objects to a file using BufferedOutputStream.
After writing each object, the process invokes flush().
If the process crashes while writing an object, I want to recover the file up to the previous object that was written successfully.
How can I deserialize such a file? How will Java behave while deserializing it?
Will it successfully deserialize up to the last object that was written successfully before the crash?
What will the behavior be when reading the last, partially written object? How can I detect that?
Update1 -
I have tried to simulate a process crash by manually killing the process while objects are being written. I have tried this around 10-15 times. Each time I was able to deserialize the file, and the file did not contain any partial object.
I am not sure if my test is exhaustive enough and therefore need further advice.
Update2 - Adam pointed out a way to simulate this test by truncating the file at random positions.
The following behavior was observed over around 100 iterations -
From the truncated file (which should be equivalent to the state of the file when a process crashes), Java can read up to the last complete object successfully.
Upon reaching the last, partially written object, Java does not throw a StreamCorruptedException or any other IOException. It simply throws an EOFException indicating EOF and ignores the partial object.
Each object is either deserialized or not before the next one is read. It is not affected by a later object that failed to be written or that will fail to deserialize.
I suspect you are misusing Java serialization - it's not intended to be a reliable and recoverable means of permanent storage. Use a database for that. If you must, you can use a database to store the serialized form of Java objects, but that would be pretty inefficient.
Yeah, testing such a scenario manually (by killing the process) may be difficult. I would suggest writing a test case where you:
1. Serialize a set of objects and write them to a file.
2. Open the file and truncate it at a random position.
3. Try to load and deserialize it (and see what happens).
4. Repeat 1. to 3. with several other truncate positions.
This way you can be sure that you are loading a broken file and that your code handles it properly.
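The truncation test could look roughly like the sketch below. To keep it short it truncates an in-memory byte array instead of a real file (that substitution is my assumption, the idea is the same), and it tries every cut position rather than random ones. TruncationCheck and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical test helper, not part of any library.
final class TruncationCheck {

    // Step 1: serialize all values into one stream, flushing after each object.
    static byte[] serializeAll(List<String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (String value : values) {
                out.writeObject(value);
                out.flush();
            }
        }
        return bytes.toByteArray();
    }

    // Steps 2 and 3: keep only the first 'length' bytes, then read objects
    // until the stream fails. Returns the objects that were read completely.
    static List<Object> readUntilFailure(byte[] data, int length) {
        List<Object> recovered = new ArrayList<>();
        byte[] truncated = Arrays.copyOf(data, length);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(truncated))) {
            while (true) {
                recovered.add(in.readObject());
            }
        } catch (IOException | ClassNotFoundException e) {
            // EOFException (an IOException) ends the loop, also for intact data.
        }
        return recovered;
    }

    // Step 4: repeat for every possible truncate position.
    public static void main(String[] args) throws IOException {
        byte[] data = serializeAll(Arrays.asList("alpha", "beta", "gamma"));
        for (int cut = 0; cut <= data.length; cut++) {
            int complete = readUntilFailure(data, cut).size();
            System.out.println(cut + " bytes kept, " + complete + " complete objects");
        }
    }
}
```

A real test would then assert that whatever comes back is a prefix of what was written, and that the calling code treats a short result as "recompute".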
Have you tried appending to an ObjectOutputStream? You can find the solution HERE; just find the post that explains how to create an ObjectOutputStream with append.

How to open file in shared mode in Java

How can I open a file in shared mode in Java, so that other users can read and modify the file?
Thanks
In case you're asking about the Windows platform, where files are locked at filesystem level, here's how to do it with Java NIO:
Files.newInputStream(path, StandardOpenOption.READ)
And a demonstration that it actually works:
File file = new File("<some existing file>");
try (InputStream in = Files.newInputStream(file.toPath(), StandardOpenOption.READ)) {
    System.out.println(file.renameTo(new File("<some other name>")));
}
This will print true, because a file open in shared-read mode may be moved.
For more details refer to java.nio.file.StandardOpenOption.
I'm not entirely sure I know what you mean, but if you mean concurrent modification of the file, that is not a simple process. It's actually pretty involved, and there's no simple way to do it. Off the top of my head, you'd have to:
Decide what happens when someone else modifies the file, e.g. whether it gets refreshed for every user, losing all their changes;
Handle diffing & merging, if necessary;
Handle synchronization for concurrent writing to the same file, so that when two users write to it, the content doesn't end up as gibberish. E.g., if one user wants to write "foo" and another wants to write "bar", the content might end up being "fbaroo" without synchronization.
If you just want to open a file in read-only mode, all you gotta do is open it via FileInputStream or something similar, an object that only permits reading operations.

How to write lines to a file so that each line is an atomic operation?

I want to write some lines to a file, and I need the writing of each line to be an atomic operation.
For example, I have 3 lines:
111111111111111111111111
222222222222222222222222
333333333333333333333333
When I write them into a file line by line, the program may exit with an error, so the saved data may be:
11111111111111111111111
222222
This is not what I expected. I want each line to be a transaction, an atomic operation.
How should I do this?
Currently I use Java to do this.
There isn't a 100% reliable way to guarantee this.
I think the closest you can get is by calling flush() on the output stream and then sync() on the underlying file descriptor. Again, there are failure modes where this won't help.
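For illustration, flush() followed by sync() might look like this. It is only a sketch: DurableAppend and appendLine are hypothetical names, and as said above it narrows the window for partial lines rather than closing it.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class DurableAppend {

    // Appends one line, then asks the OS to push it to the storage device.
    static void appendLine(Path file, String line) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), true);
             Writer writer = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
            writer.write(line);
            writer.write('\n');
            writer.flush();      // move buffered characters into the file stream
            fos.getFD().sync();  // block until the OS reports the data as written
        }
    }
}
```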
If you really need atomic writing of new lines to a file, I guess the only way is to create a copy under a new name, write the new line and rename the new file to the original name. The rename operation is atomic, at least under POSIX. On Windows you would need to remove the original file before renaming, which bears the problem of not being able to restore the file if a problem occurs during that process.
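A sketch of the copy, append and rename sequence, assuming Java 7+ NIO. AtomicLineAppend is a hypothetical name. Files.move with ATOMIC_MOVE throws AtomicMoveNotSupportedException where the file system cannot do it, and whether an existing target is replaced is implementation specific, so treat this as a starting point only. Copying the whole file for every line is obviously expensive for large files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class AtomicLineAppend {

    // Copies the file to a temp file, appends the line to the copy,
    // then renames the copy over the original.
    static void appendLine(Path file, String line) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "append", ".tmp");
        try {
            if (Files.exists(file)) {
                Files.copy(file, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
            Files.write(tmp, bytes, StandardOpenOption.APPEND);
            // Readers see either the old file or the new one, never a partial line.
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```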
You can use flush/sync as @aix suggests. Otherwise, a better option (99.999% reliable) is to use some sort of environment (such as a database) that includes transaction support, and use commit.

How to know whether a file was processed before

How can I be sure if a file was processed before? There is a remote storage location which is a file source for my application. My program gets files from this location and processes them in a scheduled way. How can I be sure that the next time I fetch only non-processed files? I'm thinking about using file attributes. The archive and modified date can be a solution. But I learned that two bits of file attributes are not used. How can I use these fields in Java? By the way I don't want to use a database.
A common strategy is to use some form of hash function to create a checksum. Record the checksum of the file, and compare the list of processed files identified by checksum against the file in question. If the checksum of the file in question is in the list, you have already processed it.
Protect your list of processed file checksums. If you lose it, or it becomes corrupted, it might be a long, bad day.
To prevent unnecessary network traffic, you might consider preparing 'check' files on the remote repository that contain a checksum that corresponds to a potential input file.
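The checksum idea can be sketched like this, assuming SHA-256 as the hash function (my choice, any decent hash would do). ProcessedFiles is a hypothetical class, and it keeps the list of checksums in memory only. In practice you would have to persist and protect that list, as noted above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class ProcessedFiles {

    // Checksums of every file seen so far (in memory only in this sketch).
    private final Set<String> seen = new HashSet<>();

    // Computes the SHA-256 checksum of the file content as a hex string.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns true the first time this content is seen, false afterwards.
    boolean markIfNew(Path file) throws IOException, NoSuchAlgorithmException {
        return seen.add(sha256(file));
    }
}
```

Note that this identifies files by content, so a renamed copy of an already processed file counts as processed too.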
EDIT:
Upon further comment, it is possible to interact with file system attributes directly. Java 7 (java.nio.file) introduced file-system-specific attribute views for this purpose. The view you would be interested in is 'DosFileAttributeView'.
Basic use might be something similar to this ('input' is a java.nio.file.Path pointing at the file; add necessary exception handling):
// import java.nio.file.Files, java.nio.file.attribute.DosFileAttributeView
// and java.nio.file.attribute.DosFileAttributes
DosFileAttributeView view = Files.getFileAttributeView(input, DosFileAttributeView.class);
// Check if the system supports this view (null means it does not)
if (view != null)
{
    DosFileAttributes attributes = view.readAttributes();
    // skip any file already marked as an archive
    if (!attributes.isArchive())
    {
        myObject.process(input);
        // the attribute is changed through the view, not the attributes snapshot
        view.setArchive(true);
    }
}
Can you rename the file (e.g. "filename.archive")? or into an "archive" subdirectory?
