I have a command-line executable which I need to run from Java on Windows XP. It uses files for input and output, but I want to avoid the overhead of file I/O, so I thought of using an in-memory (RAM) file system.
NetBSD, for example, has mount_mfs.
Could you recommend the most convenient way of doing this?
You should also consider whether you really need this (premature optimization, yadda, yadda). On all modern operating systems, filesystem I/O is cached anyway, so frequently-used files are essentially as fast as a RAM disk.
Related question (with many good answers):
RAM drive for compiling - is there such a thing?
Commons VFS provides handy interfaces to virtual filesystems, including an in-memory file system.
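A minimal sketch of what that looks like with Commons VFS 2 (the path and contents are made up); note that a ram:// file lives inside the JVM heap, so an external executable can't open it by a normal path:

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

public class RamFsExample {
    public static void main(String[] args) throws Exception {
        FileSystemManager manager = VFS.getManager();

        // "ram" is the scheme of the in-memory provider; the path below is arbitrary
        FileObject file = manager.resolveFile("ram://work/input.txt");

        // Writing to the content's output stream creates the file in memory
        try (OutputStream out = file.getContent().getOutputStream()) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
        }

        System.out.println("bytes held in RAM: " + file.getContent().getSize());
        file.delete();
    }
}
```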
I have the following requirement:
I need to unpack a zip or tar.gz file. The target platform where this code will run is an AEM 6.1 environment. The system had some performance issues in the past; in particular, memory usage was much too high, so I have to save memory. The zip/tar.gz file contains some text files, SVG, PNG, and EPS files, as well as other files I don't need. The archive file will be downloaded and be available as a ByteArrayInputStream.
I did some research and tried to figure out which is the best way to do that. Both Apache Commons and the JDK provide libraries to unpack archives, but I could not figure out which implementation uses less memory.
I think it would be best if I could open the archive while it is still compressed and read and unpack the contained files one at a time. Then I would only have the compressed archive and one of its contained files in memory at any point.
But I'm not sure which implementation offers this, or whether there is a better way to do it.
Does somebody have good advice?
Thank you and best regards.
ZipInputStream from the JDK does just what you need: https://docs.oracle.com/javase/8/docs/api/java/util/zip/ZipInputStream.html
You can find the entry you need via getNextEntry().getName(), and read the bytes just for that entry. The ZipInputStream.read method lets you implement a buffered read, so you can easily limit memory consumption if you don't need the whole decompressed entry in memory (i.e. if you write the entry to the output as you read it).
In this case you can minimize the footprint of your application as well, since you won't need any extra libraries.
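For illustration, a minimal sketch of that approach (the name filter and the output destination are made up for the example):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipStreamExample {

    /** Streams the wanted entries to some output, one 8 KB buffer at a time. */
    static void extract(InputStream archive, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !wanted(entry.getName())) {
                    continue; // skip entries we don't need without holding them in memory
                }
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read); // at most 8 KB of decompressed data at a time
                }
            }
        }
    }

    // Hypothetical filter for the file types mentioned in the question
    static boolean wanted(String name) {
        return name.endsWith(".svg") || name.endsWith(".png") || name.endsWith(".eps");
    }
}
```

The InputStream passed in can be the ByteArrayInputStream you already have; the decompressed data never needs to be held in memory all at once.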
How can I measure write/read disk speed using Java? I know about the SIGAR library, but I can't find such methods in it. Maybe someone knows a solution?
The problem is that I need to determine the rate at which data is currently being written to disk, and the speed at which it is being read. Furthermore, ideally, the data should be obtained for specific directories. But I would be very grateful if you could at least tell me how to measure it for the entire disk.
How you obtain this information is entirely dependent on the operating system.
On Linux the simplest approach is to use iostat, which will show you the amount read and written, in blocks and in request counts, for each device.
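If you need those numbers inside a Java program, one simple (if crude) option is to run iostat yourself and parse its output; a rough sketch, and note that the iostat flags can differ between platforms:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class IostatExample {
    public static void main(String[] args) throws Exception {
        // Device report, 1-second interval, 2 reports (the second reflects current activity)
        Process p = new ProcessBuilder("iostat", "-d", "1", "2").start();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // parsing the per-device columns is up to you
            }
        }
        p.waitFor();
    }
}
```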
Performance measurements on a per directory basis are not very meaningful as file systems are not implemented that way. You write to files which can be anywhere on disk and these files might appear in one or more directories. The files are not physically arranged by directory.
Just wondering if it is generally a good idea to compress JAR files that will be shipped with a desktop application (no network access to the JARs), or if the decompression will have a bigger impact than the file I/O.
EDIT: Thanks for the answers so far, and sorry for being a bit unclear here. I was not speaking about shipping the JARs to the customer, but about the optimal format for the JAR files on disk when the app starts up. I know that JAR files are ZIP files and can be stored with different compression levels (or no compression at all), and I was specifically wondering how compression would affect startup performance, not only on my dev box (which has a fast SSD in it), but also on slower disks.
I expect that the answer depends on your application. However, it should be easy to determine experimentally if compressed JARs give faster or slower startup for your application. Just build your application JAR file with compression on and compression off, and compare the application startup times. (Try it on different machines; e.g. with slow discs, fast discs, SSD, and with different amounts of RAM. Bear in mind that some OSes cache files aggressively, and take this into account in your timing measurements.)
While you are at it, you should also investigate the impact of different compression levels (via the jar command options) and of using pack200.
Having said that, my gut feeling is that the difference between compressed and uncompressed for locally installed JARs will be small enough that the user will hardly notice the difference.
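If you want a rough feel for the raw read cost before wiring up a full startup benchmark, a quick sketch along these lines might help; the JAR paths are hypothetical, this only measures reading the archive (not class loading), and OS file caching will skew repeated runs, as noted above:

```java
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarReadTimer {

    /** Returns how long it takes to read every entry of the given JAR, in milliseconds. */
    static long timeFullRead(String jarPath) throws Exception {
        byte[] buffer = new byte[8192];
        long start = System.nanoTime();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = jar.getInputStream(entries.nextElement())) {
                    while (in.read(buffer) != -1) {
                        // just drain the entry
                    }
                }
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths: the same JAR built with "jar cf" (compressed) and "jar cf0" (stored)
        System.out.println("compressed:   " + timeFullRead("app-compressed.jar") + " ms");
        System.out.println("uncompressed: " + timeFullRead("app-stored.jar") + " ms");
    }
}
```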
In almost any reasonable desktop situation, the cost of disk IO is way higher than the cost of compression. It'll almost certainly be a win to compress files.
That said, a JAR file is already compressed. Doubly compressing things is generally not worth the effort. So I'd say no, don't compress your JAR files as they are already compressed.
When mounting an NFS filesystem, all data handling goes through the nfs client. How can I write my own handlers to use something other than NFS?
An alternative would be a localhost NFS server, but that seems awfully inefficient.
Edit
Example of what should happen
Normally with a filesystem you get: the app reads/writes the filesystem, Solaris sees where it is mounted, and if it is a disk then it reads/writes the disk. If it is a software mirror, it reads and writes to the mirror software. If it is NFS, it reads and writes to a remote NFS server. I want it to read and write to our custom storage software instead of any of the above options.
Our storage software is for storing files that applications use, it is geared towards large or frequently replaced chunks of data that are not stored in a database. It also includes certain flexibility specific to our company.
Old/existing applications don't know about our new software. All they know to do is read/write a directory. We could tell Solaris that the directory was hosted on NFS and then the NFS server translates and connects to the storage software. We would prefer to tell Solaris about our new program which Solaris has never heard of and then teach Solaris how to talk to our program.
To me this sounds like you'd have to create a pseudo filesystem. Solaris uses VFS (Virtual File System), under which different filesystems can be presented as one uniform structure to userspace. Whether you mount a UFS or NFS or whatever filesystem, users and applications can use filesystem-agnostic tools to interact with VFS.
That means you need to create a pseudo filesystem: a filesystem that handles the vnode and vfs operations (the VFS syscall interface), such as read(), write(), etc., and ties them (deciding what to do when someone opens a particular file, and so on) to the storage backend of your choice.
Read more:
http://developers.sun.com/solaris/articles/solaris_internals_ch14_file_system_framework.pdf
Sounds like a big task...
Regards,
jgr
You might want to look at some CIFS servers. Alfresco has JCIFS, which is a CIFS server library in Java. It lets you present resources as files, as if they're on a Windows system. That means programs can "mount" these CIFS servers, and you can publish data from your database via that mechanism.
I have not used it, but that sounds like what you want to do, and perhaps something you may want to look into.
There's also FUSE, which lets you create custom file systems in "user mode" rather than having to hack the kernel. It works on Unix and Mac OS; there may be a Windows version as well. This can, in theory, do anything.
For example, there are FUSE filesystems that let you mount a remote system over SSH. These tend to be written in C/C++.
NFS isn't about mounting a directory on software but mounting a remote share on a directory. Whether the storage device is remote or not doesn't matter that much; access still goes through layers of kernel software. Solaris uses VFS to provide the first layer; you would have to implement the underlying one. That would be quite a difficult task even for someone already familiar with VFS. As you obviously are not familiar with writing kernel code, I would be very pessimistic about your project ...
What I would suggest instead is a simpler and less risky approach: implement an interposition library that intercepts the application's I/O calls (open, read, write, close, and the like, or perhaps the libc fopen/fwrite family; you have to figure out the best place to interpose) and calls your storage software instead.
Here is a simple example of the process:
http://developers.sun.com/solaris/articles/lib_interposers.html
I need to perform a simple grep and other manipulations on large files in Java. I am not that familiar with the Java NIO utilities, but I am assuming that is what I need to use. What resources or helpful tips do you have for reading/writing large files? Also, I am working on an SWT application and need to display parts of that data within a text area in the GUI.
java.io.RandomAccessFile uses a long for the file-pointer offset, so it should be able to cope. However, you should read a chunk at a time, otherwise the overhead will be high. FileInputStream works similarly.
Java NIO shouldn't be too difficult. You don't need to mess around with Selectors or the like; in fact, prior to JDK 7 you can't select on files. However, avoid mapping files: there is no unmap, so if you do it a lot you'll run out of address space on 32-bit systems, or run into other problems (NIO does attempt to trigger GC, but it's a bit of a hack).
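A small sketch of that chunked approach with RandomAccessFile (the chunk size and method name are arbitrary):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class ChunkReader {

    /** Reads up to 'size' bytes starting at 'offset'; offsets are longs, so files > 2 GB are fine. */
    static byte[] readChunk(String path, long offset, int size) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);
            byte[] chunk = new byte[size];
            int read = file.read(chunk, 0, size); // may return fewer bytes, or -1 at end of file
            if (read < size) {
                return Arrays.copyOf(chunk, Math.max(read, 0));
            }
            return chunk;
        }
    }
}
```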
If all you are doing is reading the entire file a chunk at a time, with no special processing, then nio and java.io.RandomAccessFile are probably overkill. Just read and process the content of the file a block at a time. Ensure that you use a BufferedInputStream or BufferedReader.
If you have to read the entire file to do what you are doing, and you read only one file at a time, then you will gain little benefit from nio.
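For the grep part in particular, a plain BufferedReader is often enough; a minimal sketch (the method and its parameters are made up), keeping in mind that the list of matches itself can grow if many lines match:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SimpleGrep {

    /** Streams the file line by line, keeping only matching lines, so memory use stays flat. */
    static List<String> grep(String path, String needle) throws IOException {
        List<String> matches = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }
}
```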
Maybe a little bit off topic: have a look at VFS by Apache. It was originally meant to be a library for hiding the ftp/http/file/whatever system behind a filesystem facade from your application's point of view. I mention it here because I have had positive experience with this library for accessing large files (via FTP) for searching, reading, copying, etc. (large in this context means > 15 MB).