Java in-memory file structure? - java

I need to do a lot of things with resources on the fly: parsing XSD/XML docs, building and compiling Java classes, packaging them into JARs and WARs, persisting them in a DB, deploying them as OSGi bundles, etc.
Most of the libraries and APIs I use allow me to perform all these intermediate tasks in memory, but some "special" libraries operate on java.io.File only. That leaves me no choice but to use real temporary files and directories, which is not good in a Java EE environment.
I believe there must be a library or solution that provides an in-memory file structure whose nodes extend java.io.File (as I see it). Please drop in a link to known or similar libraries. Any comments are welcome.
Thanks!

I do not believe you are going to find what you are looking for. The java.io.File API was not written with the intention of providing a file system abstraction that can be implemented in a variety of ways. While it does expose methods for some file system operations (such as delete and mkdir), it doesn't handle basic read/write I/O. That is left to other classes, such as FileInputStream. This means that, from an API standpoint, a File object is no more than a path. Nothing is abstracted. You are stuck.

One option is to use a RAM disk. Your program will think it's using the disk through java.io.File, but it will really be using main memory.

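There is a fine alternative available: https://github.com/google/jimfs
Jimfs is an in-memory file system for Java 7+ and is very easy to use. Note that it implements the java.nio.file API (Path, Files), not java.io.File, so it helps only with libraries that accept a Path.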

Related

java.io.File vs java.nio.Files which is the preferred in new code?

While I was writing answers around SO, a user pointed out that java.io.File should not be used in new code; he argued that the newer java.nio.file.Files class should be used instead, and he linked to this article.
I have been developing in Java for several years and had not heard this argument before. Since reading his post I have been searching, and I have not found many other sources that confirm it. Personally, I feel that many of the points argued in the article are weak, and that the errors thrown by the File class will generally tell you exactly what the issue is if you know how to read them.
As I am continually developing new code my question is this:
Is this an active argument in the Java community? Is Files preferred over File for new code? What are the major advantages / disadvantages between the two?
The documentation that you linked gives the answer:
The java.nio.file package defines interfaces and classes for the Java
virtual machine to access files, file attributes, and file systems.
This API may be used to overcome many of the limitations of the
java.io.File class. The toPath method may be used to obtain a Path
that uses the abstract path represented by a File object to locate a
file. The resulting Path may be used with the Files class to provide
more efficient and extensive access to additional file operations,
file attributes, and I/O exceptions to help diagnose errors when an
operation on a file fails.
File has a newer counterpart: Path, obtained with Paths.get("..."). Files also has many nice utility functions with better implementations (for example, move instead of the sometimes-failing File.renameTo).
A Path keeps a reference to its file system. Hence you can copy a path out of a zip file system ("jar:file:..... .zip") to another file system and vice versa.
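To illustrate the zip file system point, here is a minimal sketch using only the JDK's built-in zip provider. The class name ZipFileSystemSketch, the method roundTrip, and the file names archive.zip and hello.txt are made up for the example; it writes one entry into a fresh archive in a temporary directory and then copies it back out to the default file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of any library.
final class ZipFileSystemSketch {

    // Writes text into a new zip archive, copies the entry back out, and returns what was read.
    static String roundTrip(String text) {
        try {
            Path dir = Files.createTempDirectory("zipfs-sketch");
            Path zipFile = dir.resolve("archive.zip");
            // The zip provider is addressed with a "jar:" URI wrapping the file URI.
            URI zipUri = URI.create("jar:" + zipFile.toUri());

            // "create" = "true" asks the provider to create the archive if it does not exist.
            Map<String, String> createEnv = new HashMap<>();
            createEnv.put("create", "true");
            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, createEnv)) {
                Files.write(zipFs.getPath("/hello.txt"), text.getBytes(StandardCharsets.UTF_8));
            }

            // Reopen the archive and copy the entry to the default file system.
            Path extracted = dir.resolve("hello.txt");
            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, new HashMap<String, String>())) {
                Files.copy(zipFs.getPath("/hello.txt"), extracted, StandardCopyOption.REPLACE_EXISTING);
            }
            return new String(Files.readAllBytes(extracted), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same Files.copy call works in the other direction, because each Path carries its own file system.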
File.toPath() can help with an incremental transition.
The utilities in Files alone make a move to the newer classes worthwhile.
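As a rough sketch of such an incremental transition, the snippet below starts from a legacy java.io.File, bridges to Path with toPath(), and uses Files.move where older code would call File.renameTo. The class and method names (MoveSketch, moveAndRead) and the ".moved" suffix are invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical example class; not part of any library.
final class MoveSketch {

    // Writes text to a temp file, moves it to a sibling name, and returns the text read back.
    static String moveAndRead(String text) {
        try {
            File legacy = File.createTempFile("move-sketch", ".txt");
            Path source = legacy.toPath(); // bridge from the old API to the new one
            Files.write(source, text.getBytes(StandardCharsets.UTF_8));

            Path target = source.resolveSibling(legacy.getName() + ".moved");
            // File.renameTo only returns false on failure; Files.move throws an
            // IOException that describes what went wrong.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);

            String result = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            Files.delete(target);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```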

Is there a way to pass the contents of class/jar files to a JVM without saving them explicitly on disk?

Suppose that I want to prevent trivial disassembly of jar/class files.
A JVM is started from a C++ application that can descramble the jar/class files that are stored within its own executable. Is there a way of somehow streaming the contents of such files to a JVM without saving them on disk?
I'm looking for a solution on both windows and unix platforms.
You can create a ClassLoader which gets its class data from anywhere. You could even have it call native methods to obtain the bytecode for a class. Have a look at URLClassLoader, which is widely used; it can obtain its classes from files on disk, from the network, or from any supported URL.
I think part of what you're after is supplied by the JarInputStream class (see the docs).
You'd need some custom class-loading behavior as well. If you go that route, you may need to write a ClassLoader implementation that loads your classes. Depending on your circumstances, it might be simpler to use URLClassLoader.
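A minimal sketch of a class loader that defines classes from byte arrays held in memory might look like the following. Everything here is hypothetical (InMemoryClassLoaderSketch, ByteArrayClassLoader, Greeter); in the scenario from the question the bytes would come from the native host after descrambling, whereas this example simply reads the bytes of one of its own compiled classes from the classpath to have something to load.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of any library.
final class InMemoryClassLoaderSketch {

    // Stand-in for a class whose bytes arrive from somewhere other than disk.
    public static final class Greeter {
        public Greeter() {}
        @Override public String toString() { return "hello from bytes"; }
    }

    // Defines classes from byte arrays kept in a map (class name -> class file bytes).
    static final class ByteArrayClassLoader extends ClassLoader {
        private final Map<String, byte[]> classBytes;

        ByteArrayClassLoader(Map<String, byte[]> classBytes, ClassLoader parent) {
            super(parent);
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] bytes = classBytes.get(name);
            if (bytes == null) throw new ClassNotFoundException(name);
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Only for the demo: obtains the bytes of an already compiled class from the classpath.
    static byte[] readClassBytes(Class<?> type) {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) throw new IllegalStateException("class file not found: " + resource);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) out.write(buffer, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Class<?> loadGreeterFromBytes() {
        Map<String, byte[]> classes = new HashMap<>();
        classes.put(Greeter.class.getName(), readClassBytes(Greeter.class));
        // A null parent means only bootstrap classes are delegated, so findClass is used for Greeter.
        ClassLoader loader = new ByteArrayClassLoader(classes, null);
        try {
            return loader.loadClass(Greeter.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        try {
            Object instance = loadGreeterFromBytes().getDeclaredConstructor().newInstance();
            return instance.toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The class returned by loadGreeterFromBytes is a distinct Class object from the Greeter on the classpath, because it was defined by a different loader.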

Can I close file handles opened by code I don't own?

I'm using a third-party commercial library which seems to be leaking file handles (I verified this on Linux using lsof). Eventually the server (Tomcat) starts getting the infamous "Too many open files" error, and I have to restart the JVM.
I've already contacted the vendor. In the meantime, however, I would like to find a workaround for this. I do not have access to their source code. Is there any way, in Java, to clean up file handles without having access to the original File object (or FileWriter, FileOutputStream, etc.)?
A fun way would be to write a shared library and use LD_PRELOAD to load it for the Java instance you are launching. This library could override the underlying open(2) system call (or use some other logic) to close existing file descriptors of the process before passing the call on to the libc implementation (or the kernel). You need to do some serious accounting and possibly deal with threads, but it can be done, especially if you take hints from /proc/pid/fd/ to figure out whether or not a close is appropriate for the target fd.
You could, on startup, open a bunch of files and use FileInputStream.getFD() or FileOutputStream.getFD() to obtain a bunch of java.io.FileDescriptor objects, then close the streams but hold onto the descriptors. Later you might be able to create streams from those stored FileDescriptors and close them.
I have not tested this, so would not be surprised if it did not work on some platforms.

Java content APIs for a large number of files

Does anyone know of any open-source Java libraries that provide features for handling a large number of files (reading and writing) on disk? I am talking about 2-4 million files (most of them PDF and MS docs). It is not a good idea to store all the files in a single directory. Rather than reinventing the wheel, I am hoping this has already been done by many people.
Features I am looking for:
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide version/audit (optional)
I was looking at the JCR API and it looks promising, but it starts with a workspace, and I am not sure what the performance will be when there are many nodes.
Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use case.
If you're running your system on Windows and noticed a horrible n^2 performance hit at some point, you're probably running up against the performance hit incurred by automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
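The directory-per-letter layout and the content-hash filename are both easy to sketch with the JDK alone. The names below (ShardedStoreSketch, shardedPath, sha256Hex) are invented for the example, and stopping at the first non-alphanumeric character is an assumption added here so that a dot in a short name never becomes a directory level.

```java
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; not part of any library.
final class ShardedStoreSketch {

    // Builds root/d/o/c/u/m/document.pdf for fileName "document.pdf" and depth 5.
    // Stops early at the first character that is not a letter or digit.
    static Path shardedPath(Path root, String fileName, int depth) {
        Path dir = root;
        int levels = Math.min(depth, fileName.length());
        for (int i = 0; i < levels; i++) {
            char c = fileName.charAt(i);
            if (!Character.isLetterOrDigit(c)) break;
            dir = dir.resolve(String.valueOf(c));
        }
        return dir.resolve(fileName);
    }

    // Hex-encoded SHA-256 of the content, usable as a uniformly distributed file name.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Feeding the output of sha256Hex into shardedPath spreads files evenly across directories; Files.createDirectories on the parent of the resulting path would create the levels on demand. As noted above, a byte-for-byte comparison is still needed before treating two files with the same hash as duplicates.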
Depending on the sizes of the files, you might also consider storing the files themselves in a database--if you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames because you could reference them using an auto-generated primary key.
Combine the functionality in the java.io package with your own custom solution.
The java.io package can write and read files from disk and create arbitrary directories or sub-directories for new files. There is no external API required.
The versioning or auditing would have to be provided with your own custom solution. There are many ways to handle this, and you probably have a specific need that needs to be filled. Especially if you're concerned about the performance of an open-source API, it's likely that you will get the best result by simply coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and form an index of everything that's available. Based on the method used for sharing and indexing these files, it can rescan the files every so often or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new file, your module will know exactly how it is organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.

Can you access items inside a jar using File

I have some files inside a jar which I would like to access in Java using a File object rather than as a stream. Is it possible to do this?
Look at JarFile.
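For reference, a minimal sketch of reading an entry through JarFile is shown below. It still yields a stream rather than a File, and the names (JarEntrySketch, readEntry, demo, config/settings.txt) are made up for the example; demo builds a small throwaway jar so the snippet is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical example class; not part of any library.
final class JarEntrySketch {

    // Reads one entry of a jar as UTF-8 text.
    static String readEntry(File jar, String entryName) {
        try (JarFile jarFile = new JarFile(jar)) {
            JarEntry entry = jarFile.getJarEntry(entryName);
            if (entry == null) throw new IllegalArgumentException("no such entry: " + entryName);
            try (InputStream in = jarFile.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) out.write(buffer, 0, n);
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary jar with a single text entry and reads it back.
    static String demo() {
        try {
            File jar = File.createTempFile("jar-sketch", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar.toPath()))) {
                out.putNextEntry(new JarEntry("config/settings.txt"));
                out.write("key=value".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return readEntry(jar, "config/settings.txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```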
java.io.File is an abstraction over OS-specific file handling. If you use java.io.File in your code, the code should run on all Java platforms.
A jar is not an OS file system, so it makes no sense to apply java.io.File from the Java core classes to its contents.
I don't want to say it is not possible. Maybe it makes sense for certain applications, and there is a library for that kind of abstraction.
You can also access it as a URL with a "jar:" prefix, but that's not a File object either, so I guess that doesn't meet the restriction.
Why do you have to access it as a File? This seems like asking, "Is there any way I can add two numbers without using the plus operator?" Maybe you can, but why do you not want to do it the easy way?
