Moving files after failed validation (Java)

We are validating XML files and, depending on the result of the validation, we have to move each file into a different folder.
When the XML is valid, the validator returns normally and we can move the file without a problem. The same happens when the XML is not valid according to the schema.
If, however, the XML is not well formed, the validator throws an exception, and when we then try to move the file, it fails. We believe an open handle somewhere in memory is still keeping hold of the file. We tried putting System.gc() before moving the file and that sorted the problem, but we can't have System.gc() as a solution.
The code looks like this. We have a File object from which we create a StreamSource. The StreamSource is then passed to the validator. When the XML is not well formed it throws a SAXException. In the exception handling we use the .renameTo() method to move the file.
sc = new StreamSource(xmlFile);
validator.validate(sc);
In the catch we tried
validator.reset();
validator=null;
sc=null;
but still .renameTo() is not able to move the file. If we put System.gc() in the catch, the move will succeed.
Can someone enlighten me on how to sort this without System.gc()?
We use JAXP and saxon-9.1.0.8 as the parser.
Many thanks

Try creating a FileInputStream and passing that into the StreamSource, then close the FileInputStream when you're done. By passing in a File you lose control of how and when the file handle is closed.
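A sketch of that approach (okDir and failedDir are made-up destination folders for this example):
boolean wellFormed = true;
FileInputStream in = new FileInputStream(xmlFile);
try {
    validator.validate(new StreamSource(in));
} catch (SAXException e) {
    wellFormed = false; // not well formed; remember it, move the file later
} finally {
    in.close(); // the OS handle is released here, so renameTo() can succeed
}
xmlFile.renameTo(new File(wellFormed ? okDir : failedDir, xmlFile.getName()));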

When you set sc = null, you are indicating to the garbage collector that the StreamSource is no longer being used, and that it can be collected. Streams close themselves in their finalize() method, so if they are garbage collected, they will be closed, and the file can therefore be moved on a Windows system (you will not have this problem on a Unix system).
To solve the problem without manually invoking the GC, call sc.getInputStream().close() before sc = null (note this only works if the StreamSource was created from a stream; when created from a File, getInputStream() returns null). This is good practice anyway.
A common pattern is to wrap any file handle usage in a try .. finally block, e.g.
sc = new StreamSource(new FileInputStream(xmlFile));
try {
    validator.validate(sc); // check stuff
} finally {
    sc.getInputStream().close(); // works because we passed in the stream ourselves
}
// move to the appropriate place
In Java 7, you can instead use the new try-with-resources block.
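For example, a sketch with the same xmlFile and validator as above:
try (FileInputStream in = new FileInputStream(xmlFile)) {
    validator.validate(new StreamSource(in));
} catch (SAXException e) {
    // not well formed; the stream is closed either way, so the move below works
}
// move xmlFile to the appropriate place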

Try sc.getInputStream().close() in the catch

All three answers already given are right: you must close the underlying stream, either with a direct call on the StreamSource, or by getting the stream and closing it, or by creating the stream yourself and closing it.
However, I've seen this happen under Windows for at least three years: even if you close the stream, really every stream, trying to move or delete the file will throw an exception ... unless ... you explicitly call System.gc().
However, since System.gc() does not oblige the JVM to actually run a garbage collection cycle, and since even if it did the JVM is not required to collect every unreachable object, you have no real way of being sure that the file can be deleted "now".
I don't have a clear explanation; I can only imagine that the Windows implementation of java.io somehow caches the file handle and does not close it until the handle gets garbage collected.
It has been reported, though I haven't confirmed it, that java.nio is not subject to this behavior, because it has lower-level control over file descriptors.
A solution I've used in the past, though it is quite a hack, was to (a sketch follows below):
Put files to delete on a "list"
Have a background thread check that list periodically, call System.gc(), and try to delete those files.
Remove from the list the files you managed to delete, and keep those that could not be deleted yet.
Usually the "lag" is on the order of a few milliseconds, with the occasional file surviving a bit longer.
It could be a good idea to also call deleteOnExit on those files, so that if the JVM terminates before your thread finishes cleaning them up, the JVM will try to delete them. However, deleteOnExit had its own bug at the time, preventing exactly that removal, so I didn't. Maybe it's resolved today and you can trust deleteOnExit.
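A rough sketch of that workaround, minus the deleteOnExit part (all names are made up; it trades determinism for eventually getting the files deleted):
import java.io.File;
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

class LazyDeleter implements Runnable {
    private final ConcurrentLinkedQueue<File> pending = new ConcurrentLinkedQueue<File>();

    void deleteLater(File f) {
        pending.add(f);
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            System.gc(); // nudge the collector so stale handles get finalized
            for (Iterator<File> it = pending.iterator(); it.hasNext();) {
                if (it.next().delete()) {
                    it.remove(); // deleted; the rest stay for the next pass
                }
            }
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}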
This is the JRE bug I find most annoying and stupid, and I cannot believe it still exists; unfortunately I hit it just a month ago on Windows Vista with the latest JRE installed.

Pretty old, but some people may still find this question.
I was using Oracle Java 1.8.0_77.
The problem occurs on Windows, not on Linux.
A StreamSource instantiated with a File seems to allocate and release the file resource automatically when processed by a validator or transformer (getInputStream() returns null).
On Windows, however, moving a file into the place of the source file (deleting the source file) after processing is not possible.
Solution/Workaround: Move the file using
import static java.nio.file.StandardCopyOption.*;
Files.move(from.toPath(), to.toPath(), REPLACE_EXISTING, ATOMIC_MOVE);
The use of ATOMIC_MOVE here is the critical point. Whatever the reason is, it has something to do with the annoying way Windows locks files.
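Combining this with the stream-handling advice above, the whole validate-then-move flow might look like this sketch (okDir and badDir are hypothetical Path objects):
Path target;
try (FileInputStream in = new FileInputStream(xmlFile)) {
    validator.validate(new StreamSource(in));
    target = okDir.resolve(xmlFile.getName()); // valid
} catch (SAXException e) {
    target = badDir.resolve(xmlFile.getName()); // invalid or not well formed
}
Files.move(xmlFile.toPath(), target, REPLACE_EXISTING, ATOMIC_MOVE);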

Related

Java: Assert that file serialization is complete?

I am dealing with an object in Java that is very expensive to compute and several megabytes in size. In order to preserve it across application restarts, I want to serialize it into a File, and re-load that file on startup (if present).
The problem is that most file systems are not transactional. The file writing process can be interrupted by exceptions, JVM termination and/or power failure. What I absolutely need to assert is that if the file is used, the information within is complete. I can throw away the information and recalculate if needed, but reading and relying on incomplete data must be avoided.
My attempt would be to serialize and write a "seal" object at the end of the file, like a checksum for example. The presence of this object during deserialization guarantees that the serialization process was complete. If the seal object is absent during deserialization, I know that I cannot trust the data in the file as it might be incomplete. I am looking for an OS-independent solution, and I do not need to consider "attacks" that maliciously modify the contents of the serialized file.
My question is: Is the seal object approach outlined above safe, or are there still some corner cases where I can end up reading an incomplete file without noticing it?
Just write the file under a different, temporary name. Once the file is complete, delete any previous version of the file and rename the new file to the real name.
If the program dies during the write, you're just left with an incomplete temp file. The real file is still as before (or missing), so you'll never see an incomplete file to load.
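A sketch of that pattern with NIO (Java 7+; expensiveObject stands in for whatever you serialize, and the temp file sits in the same directory so the atomic move stays on one file store):
// import static java.nio.file.StandardCopyOption.*;
Path real = Paths.get("cache.ser");
Path tmp = real.resolveSibling("cache.ser.tmp");
try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
    out.writeObject(expensiveObject); // a crash here leaves only a stray .tmp file
}
Files.move(tmp, real, REPLACE_EXISTING, ATOMIC_MOVE);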

What is the right way to close and/or delete a memory mapped file?

From what I've read, it's a bit tricky closing a memory mapped file in Java.
By default, they're closed only by a mechanism akin to (but more efficient than) finalization.
I already know it's possible to close them explicitly in an implementation-specific way (but one common to both OpenJDK and the Oracle JDK) by using reflection, using the following code (inspired by this answer to a related question):
// buffer is the MappedByteBuffer to unmap; Method is java.lang.reflect.Method
try {
    Method cleanerMethod = buffer.getClass().getMethod("cleaner");
    cleanerMethod.setAccessible(true);
    Object cleaner = cleanerMethod.invoke(buffer); // sun.misc.Cleaner on these JDKs
    Method cleanMethod = cleaner.getClass().getMethod("clean");
    cleanMethod.setAccessible(true);
    cleanMethod.invoke(cleaner); // unmaps the buffer immediately
} catch (Exception ex) { /* log exception */ }
I gather from the discussion in that question that it's not reliably possible to delete the backing file for a MappedByteBuffer without closing the buffer in this manner.
However, there are also other, related resources that must be closed: the RandomAccessFile and the FileChannel that were used to create the MappedByteBuffer.
Does the order in which these resources are closed matter? Are there any differences between the order in which they must be closed on Mac/Windows/Linux?
Ultimately, what I want to know how to do safely comes down to these two questions:
What is the correct way to close a MappedByteBuffer (and related resources) and ensure the backing file is saved?
Is there a way to close a MappedByteBuffer (and related resources) without accidentally causing it to write uncommitted changes to the disk, when the goal is to quickly delete the backing file?

Is there a way to tell if a classpath resource is a file or a directory?

For example, this snippet throws a NullPointerException(!) on the stream.read() line, assuming the com.google package exists in a JAR somewhere (Guava, for example).
ClassLoader classLoader = getClass().getClassLoader();
URL resource = classLoader.getResource("com/google");
InputStream stream = resource.openStream();
System.out.println(stream.toString()); // Fine -- stream is not null
stream.read(); // NPE inside FilterInputStream.read()!
If com/google is swapped with a package that's in the file system rather than a JAR, then the snippet doesn't crash at all. In fact, it seems to read the files in that directory, separated by newlines, though I can't imagine that behaviour is specified anywhere.
Is there a way to test if the resource path "com/google" points to a "normal" resource file or to a directory?
This is a bit of a mess due to some unspecified behaviour for the protocol handlers involved in loading these resources. In this particular situation, there are two: sun.net.www.protocol.file.Handler and sun.net.www.protocol.jar.Handler, and they each handle the directory case a bit differently. Based on some experiments, here's what they each do:
sun.net.www.protocol.file.Handler:
What this Handler does is open a FileURLConnection, which does exactly what you discovered it did when confronted with a directory. You can check if it's a directory just with:
if (resource.getProtocol().equals("file")) {
    return new File(resource.getPath()).isDirectory();
}
sun.net.www.protocol.jar.Handler:
This Handler, on the other hand, opens a JarURLConnection which eventually makes its way to a ZipCoder. If you take a look at that code, you'll notice something interesting: jzentry will come back null from the native JNI call because the JAR zip file does not, in fact, contain a file called com/google, and so it returns null to the stream that wraps it.
However, there is a solution. Although the ZipCoder won't find com/google, it will find com/google/ (this is how most ZIP interfaces work, for some reason). In that case, the jzentry will be found, and it'll just return a null byte.
So, cutting through all these random implementation-specific behaviours, you can probably figure out if it's a directory by first trying to access the resource with a trailing / (which is what URLClassLoaders expect for directories anyway). If ClassLoader.getResource() returns non-null, then it's a directory. If it doesn't, try without the trailing slash. If it returns non-null, it's a file. If it still returns null, then it's not even an existing resource.
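In code, that probing looks something like this sketch (note the "directory" case relies on the JAR actually containing directory entries, which most do):
static String classify(ClassLoader cl, String path) {
    // path must not already end with a slash, e.g. "com/google"
    if (cl.getResource(path + "/") != null) {
        return "directory";
    }
    return cl.getResource(path) != null ? "file" : "missing";
}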
Kinda hacky, but I don't think there's anything better. I hope this helps!
There is no safe and generic way to detect this. When you use ClassLoader.getResource(), the ClassLoader can return practically anything in the URL, in principle even something you have never seen before if the ClassLoader implements its own URL scheme (and protocol).
Your only option is to analyze the URL returned by getResource(), the protocol should hint at what it is (e.g. "file://"). But beware, depending on environment it may return things you did not plan for.
But to just access a resource, you don't care where it comes from (you may care if you're debugging a configuration issue, but your code should not care).
In general you should not make assumptions about the returned InputStream's capabilities, i.e. do not rely on it supporting mark/reset etc. The only safe operation would be simply reading the Stream. If an IOException occurs during read it indicates a problem with access to the resource (network connection lost etc.).
EDIT: getResource() should IMO only return resources (e.g. files or zip file entries), but never directories (since they are not resources). However, I wouldn't count on every possible ClassLoader to do so, and I'm not sure what the correct behavior is (if it's even specified anywhere).
I think that there are two solutions.
A naive solution based on analysis of the path itself: if it ends with .jar, .zip, .war or .ear it is a file; otherwise it is a directory. I think this approach will work in 99.99% of cases, unless somebody tries to make you fail on purpose, for example by defining a soft link that looks like a directory but is a file, or vice versa.
Try to mimic the JVM logic that interprets classpath entries relative to the current working directory. So, retrieve the current working directory with new File("."), then take the classpath, split it, and for each element use new File(".", classPathElement) unless it is defined as an absolute path. A sketch follows below.
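A sketch of the second approach (it only classifies the classpath entries themselves, which is usually what you can rely on):
String cp = System.getProperty("java.class.path");
for (String element : cp.split(File.pathSeparator)) {
    File f = new File(element);
    if (!f.isAbsolute()) {
        f = new File(".", element); // resolve relative entries against the working directory
    }
    System.out.println(element + " -> " + (f.isDirectory() ? "directory" : "file"));
}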
Good luck with this.

java: check if a file can be moved under windows

I need to rename a file (keeping it in the same directory);
I can't seem to find a way to see if my program has the required permissions:
Files.isWritable(directory) && Files.isWritable(oldFile);
always returns true, whether or not the running user has permission to write the file (I guess they only check if the file is read-only, even though this violates the contract stated in the javadoc);
I'm not running under a security manager so I can't call
System.getSecurityManager().checkDelete(oldFile.toString());
I need to check if the renaming of several files will (probably) succeed so I can't just try and catch the exception.
Is there a way out? Obviously a portable solution would be lovely, but I would settle for a Windows-specific one...
Well, you can't check Windows ACLs that way. The "native" solution is fairly easy, since Windows supports transactions on the file system. Just call DeleteFileTransacted, and roll back the transaction if any one deletion fails.
If you're not using transactions, then the second option is to first open handles with explicit DELETE desired access (DELETE is one of the standard WinNT access rights), denying any sharing. If and only if this succeeds for all files, delete them all with SetFileInformationByHandle(handle, FileDispositionInfo, &fdiObj, sizeof(fdiObj));
(The latter is not a transaction and may have Isolation issues as a result, which in turn affect Atomicity).
Try new SecurityManager().checkDelete(oldFile.toString()).
Just try to move it! If the move failed, you didn't have permissions, or something else went wrong.
This is a general principle. Don't try to foretell the future by guessing whether an impending operation will succeed: try the operation. Otherwise you introduce all sorts of extra problems (see the sketch after this list):
You might make the wrong test.
The condition might change between the test and the operation.
The operation usually returns an error or throws an exception anyway, which you have to write code to handle: why write it all twice?
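For instance, a minimal attempt-and-handle sketch using NIO (Java 7+; oldFile is from the question, newFile is a hypothetical target):
// needs java.nio.file.Files, java.nio.file.AccessDeniedException, java.io.IOException
try {
    Files.move(oldFile.toPath(), newFile.toPath());
} catch (AccessDeniedException e) {
    // no permission: report it here, once
} catch (IOException e) {
    // locked, target already exists, disk error, ...
}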

Is it possible to delete a file without a new File instance in Java?

I have a simple function used to delete a file: it checks the file size and, if it is smaller than a specific value, deletes the file. However, this function will be called thousands of times, and every call creates a new File instance. I think that object creation could be expensive. Is there another way to fix this?
public void checkFile(String filePath) {
    File file = new File(filePath); // this is expensive
    if (file.length() < 500) {
        file.delete();
    }
}
The effect on performance of new File() compared to checking the file size on disk is minuscule. Don't worry about it.
If you really really think that it will make a difference, measure it and then optimise it.
IMHO "thinking" isn't good enough; have you actually identified that File object creation is a bottleneck in your application? Anyway, I don't think you can delete a file without creating a File object, unless you plan on writing your own "native" method that unlinks the file given just the path as a string.
Why would the code be expensive? Creating temporary objects in Java is not expensive anymore, due to generational GC. And a File is just an object encapsulating a path to the file system. It's not expensive to create one.
The standard Java API does not allow this. And thousands of calls are almost nothing for a modern computer. Creating a java.io.File instance takes less time than the deletion itself, so do not worry. If you do see a problem with this code, you can create a cache as a Map<String, File> and get the File instances from there; a sketch follows.
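A minimal sketch of that cache idea (note it grows unbounded if the paths are all distinct):
// needs java.util.concurrent.ConcurrentHashMap
private final ConcurrentHashMap<String, File> fileCache = new ConcurrentHashMap<>();

public void checkFile(String filePath) {
    File file = fileCache.computeIfAbsent(filePath, File::new);
    if (file.length() < 500) {
        file.delete();
    }
}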
But again, do not do this unless you see that this is your problem. No premature optimization!
There is no way to delete a file in pure Java that doesn't entail creating a File object. The impure alternatives are:
using JNI or JNA to call native code that will call unlink or the Windows equivalent,
running the rm or del command as an external process.
The first is at best only marginally faster than new File().delete(). The second is significantly slower.
I'd say that 90+% of the cost of new File().delete() is in the system call and the operating system's file system layers.
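For completeness: since Java 7, java.nio.file can address the file by path string without a java.io.File, though a Path object is allocated instead, so the (already negligible) allocation cost is about the same. A sketch:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public void checkFile(String filePath) throws IOException {
    Path p = Paths.get(filePath); // a Path is allocated instead of a File
    if (Files.exists(p) && Files.size(p) < 500) {
        Files.deleteIfExists(p);
    }
}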
