How to open file in shared mode in Java

How do I open a file in shared mode in Java, so that other users can read and modify the file?
Thanks

In case you're asking about the Windows platform, where files are locked at filesystem level, here's how to do it with Java NIO:
Files.newInputStream(path, StandardOpenOption.READ)
And a demonstration that it actually works:
File file = new File("<some existing file>");
try (InputStream in = Files.newInputStream(file.toPath(), StandardOpenOption.READ)) {
    // rename the file while the stream is still open
    System.out.println(file.renameTo(new File("<some other name>")));
}
This prints true, because a file opened in shared-read mode may still be moved.
For more details refer to java.nio.file.StandardOpenOption.

I'm not entirely sure I know what you mean, but if you mean concurrent modification of the file, that is not a simple process. It's pretty involved and there's no simple way to do it. Off the top of my head, you'd have to:
Decide whether the file gets refreshed for every user when someone else modifies it (losing all of their changes), or what else to do in that case;
Handle diffing & merging, if necessary;
Handle synchronization for concurrent writing to the same file, so that when two users write to it at once the content doesn't end up garbled, e.g., if one user wants to write "foo" and another one wants to write "bar", the content might end up being "fbaroo" without synchronization.
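For the synchronization point, one possible sketch (not the only way) is to take an exclusive java.nio FileLock around each write. The class and method names here are made up for illustration. Keep in mind that such locks are advisory on many platforms, so they only help between processes that all use them, and that they are held per JVM, so threads inside one process still need ordinary Java synchronization.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class LockedAppend {
    // Appends 'text' to 'file' while holding an exclusive lock on the whole file.
    static void append(Path file, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) { // blocks until the lock is granted
            ByteBuffer data = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            // a single write() call may not drain the buffer, so loop
            while (data.hasRemaining()) {
                channel.write(data);
            }
        } // the lock is released first, then the channel is closed
    }
}
```

With this, two cooperating processes calling LockedAppend.append(path, "foo") and LockedAppend.append(path, "bar") should end up with "foobar" or "barfoo", never an interleaving.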
If you just want to open a file in read-only mode, all you have to do is open it via FileInputStream or a similar object that only permits read operations.

Related

Java: Assert that file serialization is complete?

I am dealing with an object in Java that is very expensive to compute and several megabytes in size. In order to preserve it across application restarts, I want to serialize it into a File, and re-load that file on startup (if present).
The problem is that most file systems are not transactional. The file writing process can be interrupted by exceptions, JVM termination and/or power failure. What I absolutely need to assert is that if the file is used, the information within is complete. I can throw away the information and recalculate if needed, but reading and relying on incomplete data must be avoided.
My attempt would be to serialize and write a "seal" object at the end of the file, like a checksum for example. The presence of this object during deserialization guarantees that the serialization process was complete. If the seal object is absent during deserialization, I know that I cannot trust the data in the file as it might be incomplete. I am looking for an OS-independent solution, and I do not need to consider "attacks" that maliciously modify the contents of the serialized file.
My question is: Is the seal object approach outlined above safe, or are there still some corner cases where I can end up reading an incomplete file without noticing it?

Just write the file under a different, temporary name. Once the file is complete, delete any previous version of the file and rename the new file to the real name.
If the program dies during the write, you're just left with an incomplete temp file. The real file is still as before (or missing), so you'll never see an incomplete file to load.
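A rough sketch of that idea with java.nio.file follows. The class name and the temp file prefix are invented for the example, and it uses a single Files.move with ATOMIC_MOVE instead of a separate delete plus rename. Whether an existing target is replaced by an atomic move is implementation specific according to the Javadoc, so treat this as a starting point and check it on your platforms. It also does not force the data to disk, so it guards against a crashed JVM rather than a power failure.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
final class AtomicSave {
    // Serializes 'value' into a temp file next to 'target', then renames it into place.
    static void save(Serializable value, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // same directory as the target, so the rename stays on one file system
        Path tmp = Files.createTempFile(dir, "state", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp);
                 ObjectOutputStream oos = new ObjectOutputStream(out)) {
                oos.writeObject(value);
            }
            // throws AtomicMoveNotSupportedException if the file system cannot do it
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // no-op after a successful move; cleans up the temp file on failure
            Files.deleteIfExists(tmp);
        }
    }
}
```

On startup you would then only ever look at the real file name and ignore any leftover *.tmp files.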

How to safely create a file if it doesn't exist from concurrently-running processes without using locking?

Suppose two (or more) concurrently-running Java processes need to check for the existence of a file, create it if it doesn't exist, and then potentially read from that file over the course of their runs. We want to protect ourselves against the possibility of multiple writer processes clobbering each other and/or reader processes reading an incomplete or inconsistent version of the file.
What we're currently doing to arbitrate this situation is to use Java NIO FileLocks. One process manages to acquire an exclusive lock on the file to be created using FileChannel.tryLock() and creates it, while the other concurrently-running processes fail to acquire a lock and fall back to using an in-memory version of the file for their runs.
Locking is causing various problems for us on our compute farm, however, so we're exploring alternatives. So my question to you is: is there a way to do this safely without using file locks?
Could, e.g., the processes write to independent temporary files when they find a file doesn't exist, and then more or less "atomically" move the temp file(s) into place after they've been written? In this scenario, we might end up with multiple writer processes, but that would be ok provided that any processes reading from the file always read one version or another, and not a mix of two or more versions. However, I don't think all operating systems guarantee that if you have a file open for reading, you'll continue reading from the original version of the file even if it's overwritten midway through the read.
Any suggestions would be much appreciated!

Suppose two (or more) concurrently-running Java processes need to check for the existence of a file, create it if it doesn't exist, and then potentially read from that file over the course of their runs.
I don't quite understand the create and read part of the question. If you are looking to make sure that you have a unique file then you could use new File(...).createNewFile() and check to make sure that it returns true. To quote from the Javadocs:
Atomically creates a new, empty file named by this abstract pathname if
and only if a file with this name does not yet exist. The check for the
existence of the file and the creation of the file if it does not exist
are a single operation that is atomic with respect to all other
filesystem activities that might affect the file.
This would give you a unique file that only that process (or thread) would then "own". I'm not sure how you were planning on letting the writer know which file to write to however.
If you are talking about creating a unique file that you write to and then move into a write directory to be consumed, then the above should work. You would need to create a unique name in the write directory once you were done as well.
You could use something like the following:
private File getUniqueFile(File dir, String prefix) throws IOException {
    long suffix = System.currentTimeMillis();
    while (true) {
        File file = new File(dir, prefix + suffix);
        // try creating this file, if true then it is unique
        if (file.createNewFile()) {
            return file;
        }
        // someone already has that suffix so ++ and try again
        suffix++;
    }
}
As an alternative, you could also generate a unique filename using UUID.randomUUID() or something similar.
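A minimal sketch of the UUID variant, keeping the createNewFile() check so a (very unlikely) name collision is still handled. The class and method names are only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.UUID;

// Hypothetical helper, not part of any library.
final class UniqueFiles {
    // Creates an empty file named prefix + random UUID in 'dir' and returns it.
    static File createUnique(File dir, String prefix) throws IOException {
        while (true) {
            File file = new File(dir, prefix + UUID.randomUUID());
            // createNewFile() is atomic: true means this call created the file
            if (file.createNewFile()) {
                return file;
            }
            // name already taken (extremely unlikely), so try another UUID
        }
    }
}
```

java.nio.file.Files.createTempFile(dir, prefix, suffix) is a ready-made alternative if you don't care what the random part of the name looks like.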

Storing state in Java

Broad discussion question.
Are there any existing libraries which allow me to store the state of execution of my application in Java?
E.g. I have an application which processes files, and the application may be forced to shut down suddenly at some point. I want to store which files have been processed and which have not, and what stage the processing was at for the ongoing ones.
Are there already any libraries which abstract this functionality, or would I have to implement it from scratch?

It seems like what you are looking for is serialization which can be performed with the Java Serialization API.
You can write even less code if you decide to use known libraries such as Apache Commons Lang, and its SerializationUtils class, which itself is built on top of the Java Serialization API.
Using the latter, serializing/deserializing your application state into a file is done in a few lines.
The only thing you have to do is create a class holding your application state, let's call it... ApplicationState :-) It can look like that:
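import java.io.Serializable;
import java.util.List;

// must implement Serializable, otherwise serialization fails at runtime
class ApplicationState implements Serializable {
    enum ProcessState {
        READ_DONE,
        PROCESSING_STARTED,
        PROCESSING_ENDED,
        ANOTHER_STATE;
    }
    private List<String> filesDone, filesToDo;
    private String currentlyProcessingFile;
    private ProcessState currentProcessState;
}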
With such a structure, and using SerializationUtils, serializing is done the following way:
try {
    ApplicationState state = new ApplicationState();
    ...
    // File to serialize the object to
    String fileName = "applicationState.ser";
    // Serialize the state; try-with-resources closes the stream
    try (FileOutputStream fos = new FileOutputStream(fileName)) {
        SerializationUtils.serialize(state, fos);
    }
    // Deserialize and cast back to ApplicationState
    try (FileInputStream fis = new FileInputStream(fileName)) {
        ApplicationState restored = (ApplicationState) SerializationUtils.deserialize(fis);
        System.out.println(restored);
    }
} catch (Exception e) {
    e.printStackTrace();
}

It sounds like the Java Preferences API might be a good option for you. This can store user/system settings with minimal effort on your part and you can update/retrieve at any time.
https://docs.oracle.com/javase/8/docs/technotes/guides/preferences/index.html

It's pretty simple to make from scratch. You could follow this:
Have a DB (or just a file) that stores the information of processing progress. Something like:
Id|fileName|status|metadata
As soon as you start processing a file, make an entry in this table and mark its status as PROCESSING. Then you can store intermediate states, and finally, when you're done, you can set the status to DONE. This way, on restart, you would know which files were processed and which files were in progress when the process shut down or crashed. And (obviously) where to start.
In large enterprise environments where applications are loosely coupled (and an application may be unavailable or may crash), we use a message queue to do something similar and keep the architecture reliable.
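If you go with the plain file rather than a DB, a rough sketch could be an append-only log where the last line for a file name wins. The class name and the simplified "fileName|status" line format are assumptions made for this example (the Id and metadata columns are left out).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class ProgressLog {
    private final Path log;

    ProgressLog(Path log) {
        this.log = log;
    }

    // Appends one "fileName|status" line; earlier lines are never rewritten.
    void mark(String fileName, String status) throws IOException {
        String line = fileName + "|" + status + System.lineSeparator();
        Files.write(log, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Replays the log; the last status recorded for a file wins.
    Map<String, String> load() throws IOException {
        Map<String, String> latest = new LinkedHashMap<>();
        if (Files.exists(log)) {
            for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                int sep = line.lastIndexOf('|');
                if (sep > 0) { // skips blank or truncated lines
                    latest.put(line.substring(0, sep), line.substring(sep + 1));
                }
            }
        }
        return latest;
    }
}
```

On restart, anything that load() still reports as PROCESSING was interrupted and needs to be redone or resumed.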

There are almost too many ways to mention. I would choose the option you believe is simplest.
You can use:
a file to record what is done (and what is to be done)
a persistent queue on JMS (which supports multiple processes, even on different machines)
an embedded or remote database.
An approach I rave about is using memory mapped files. A nice feature is that information is not lost if the application dies or is killed (provided the OS doesn't crash) which means you don't have to flush it, nor worry about losing data if you don't.
This works because the data is partly managed by the OS which means it uses little heap (even for TB of data) and the OS deals with loading and flushing to disk making it much faster (and making sizes much larger than your main memory practical).
BTW: This approach works even with a kill -9 as the OS flushes the data to disk. To test this I use Unsafe.getByte(0) which crashes the application with a SEG fault immediately after making a change (as in the next machine code instruction) and it still writes the change to disk.
This won't work if you pull the power, but you have to be really quick. You can use memory mapped files to force the data to disk before continuing, but I don't know how you can test this really works. ;)
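To give an idea of what this looks like with plain java.nio (without the library mentioned below), here is a small sketch that keeps a single long counter in a memory mapped file. The class name and the one-counter layout are made up for the example. The force() call is the "force the data to disk" step; you can leave it out if surviving a JVM crash is enough.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of any library.
final class MappedCounter {
    // Increments the long stored in the first 8 bytes of 'file' and returns the new value.
    static long increment(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // mapping READ_WRITE grows a shorter file to 8 bytes; new bytes read as zero
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            long next = buffer.getLong(0) + 1;
            buffer.putLong(0, next); // lands in the OS page cache, not on the Java heap
            buffer.force();          // optional: ask the OS to write the page to disk now
            return next;
        }
    }
}
```

Calling MappedCounter.increment(path) once per processed file gives you a counter that is still there after a restart.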
I have a library which could make memory mapped files easier to use:
https://github.com/peter-lawrey/Java-Chronicle
It's not a long read and you can use it as an example.

Apache Commons Configuration API: http://commons.apache.org/proper/commons-configuration/userguide/howto_filebased.html#File-based_Configurations

How to know whether a file was processed before

How can I be sure if a file was processed before? There is a remote storage location which is a file source for my application. My program gets files from this location and processes them in a scheduled way. How can I be sure that the next time I fetch only non-processed files? I'm thinking about using file attributes. The archive and modified date can be a solution. But I learned that two bits of file attributes are not used. How can I use these fields in Java? By the way I don't want to use a database.

A common strategy is to use some form of hash function to create a checksum. Record the checksum of the file, and compare the list of processed files identified by checksum against the file in question. If the checksum of the file in question is in the list, you have already processed it.
Protect your list of processed file checksums. If you lose it, or it becomes corrupted, it might be a long, bad day.
To prevent unnecessary network traffic, you might consider preparing 'check' files on the remote repository that contain a checksum that corresponds to a potential input file.
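As a sketch of the checksum step, here is one way to compute a SHA-256 hex digest of a file with the JDK's MessageDigest. The class name is made up, and SHA-256 is just one reasonable choice of hash function.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any library.
final class FileChecksum {
    // Returns the SHA-256 digest of the file's content as a lowercase hex string.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            // stream the file so large inputs don't have to fit in memory
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

You would keep the returned strings in a Set (persisted to your list file) and skip any input whose checksum is already in it.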
EDIT:
Upon further comment, it is possible to interact directly with file system attributes. Java 7 introduced file-system-specific attribute views for this purpose. The view you would be interested in is 'DosFileAttributeView'.
Basic use might be something similar to this ('input' is a java.nio.file.Path; add necessary exception handling):
// import as necessary from java.nio.file and java.nio.file.attribute
DosFileAttributeView view = Files.getFileAttributeView(input, DosFileAttributeView.class);
// Check if the system supports this view
if (view != null)
{
    DosFileAttributes attributes = view.readAttributes();
    // skip any file already marked as an archive
    if (!attributes.isArchive())
    {
        myObject.process(input);
        // attributes is a read-only snapshot; the flag is set through the view
        view.setArchive(true);
    }
}

Can you rename the file (e.g. to "filename.archive"), or move it into an "archive" subdirectory?
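Something along these lines, assuming you are allowed to create an "archive" directory next to the input files (the class name is made up for the example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class Archiver {
    // Moves a processed file into an "archive" directory beside it and returns the new path.
    static Path archive(Path file) throws IOException {
        Path archiveDir = file.toAbsolutePath().getParent().resolve("archive");
        Files.createDirectories(archiveDir); // does nothing if it already exists
        // without REPLACE_EXISTING this fails if a file of that name was archived before
        return Files.move(file, archiveDir.resolve(file.getFileName()));
    }
}
```

The next scheduled run then only lists the top-level directory, so anything already in "archive" is never picked up again.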

Is it possible to prepend data to a file without rewriting?

I deal with very large binary files (several GB to multiple TB per file). These files exist in a legacy format and upgrading requires writing a header to the FRONT of the file. I can create a new file and rewrite the data but sometimes this can take a long time. I'm wondering if there is any faster way to accomplish this upgrade. The platform is limited to Linux and I'm willing to use low-level functions (ASM, C, C++) / file system tricks to make this happen. The primary library is Java and JNI is completely acceptable.

There's no general way to do this natively.
Maybe some file-systems provide some functions to do this (cannot give any hint about this), but your code will then be file-system dependent.
A solution could be that of simulating a file-system: you could store your data on a set of several files, and then provide some functions to open, read and write data as if it was a single file.
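On the Java side, a very small version of that idea for sequential reads is to keep the header separate (in memory here, but it could be its own small file) and present both as one stream with SequenceInputStream. The class name is invented for the example, and this does not help code that needs random access to the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class HeaderedFile {
    // Returns a stream that yields 'header' first and then the untouched content of 'data'.
    static InputStream open(byte[] header, Path data) throws IOException {
        return new SequenceInputStream(
                new ByteArrayInputStream(header),
                Files.newInputStream(data));
    }
}
```

Readers go through HeaderedFile.open(...) instead of opening the legacy file directly, and the multi-GB file on disk is never rewritten.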

Sounds crazy, but you can store the file data in reverse order, if it is possible to change function that reads data from file. In that case you can append data (in reverse order) at the end of the file. It is just a general idea, so I can't recommend anything particular.
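The code for reversing the current file might look like this:
// needs <algorithm>, <fstream>, <iterator>, <string>
std::string records;
std::ofstream out;
// walk the characters back to front and write each one to the stream
std::copy(records.rbegin(), records.rend(), std::ostream_iterator<char>(out));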

It depends on what you mean by "filesystem tricks". If you're willing to get down-and-dirty with the filesystem's on-disk format, and the size of the header you want to add is a multiple of the filesystem block size, then you could write a program to directly manipulate the filesystem's on-disk structures (with the filesystem unmounted).
This enterprise is about as hairy as it sounds though - it'd likely only be worth it if you had hundreds of these giant files to process.

I would just use the standard Linux tools to do it.
Writing another application to do it seems like it would be sub-optimal.
cat headerFile oldFile > tmpFile && mv tmpFile oldFile

I know this is an old question, but I hope this helps someone in the future. Similar to simulating a filesystem, you could simply use a named pipe:
mkfifo /path/to/file_to_be_read
{ echo "HEADER"; cat /path/to/source_file; } > /path/to/file_to_be_read
Then, you run your legacy program against /path/to/file_to_be_read, and the input would be:
HEADER
contents of /path/to/source_file
...
This will work as long as the program reads the file sequentially and doesn't do mmap() or rewind() past the buffer.
