I am trying to implement a minimal FTP server in Java. On this server, I want all files to exist in memory only; nothing should be written to disk.
Having said this, I have to create a virtual file system, comprised of a root directory, some sub-directories and files. A few of those will initially be loaded from the hard disk and then will only be handled in memory.
My question is: is there an efficient way to implement this in Java? Is there something that is already implemented, a class I should use? (I don't have access to all libraries; only java.lang and java.io.)
Assuming there is not, I have created my own simple FileSystem, Directory and File classes. However, I have no idea how I should store the actual data in memory. Knowing that a file can be an image, a text file or anything else that could plausibly be exchanged with an FTP server, how should I store it? Also, there are two transfer modes I should support: binary and ASCII. So whatever format I store the data in, I should be able to convert it to some kind of binary or ASCII representation.
I know the question is a bit abstract, any sort of hints as to where I should look will be appreciated.
The data will stay in memory unless you explicitly write it to disk.
Assuming the relevant data is stored in some variables, simply never call any file-writing API; the data will remain on the JVM's heap without ever being stored on the filesystem.
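Building on that, one hedged sketch of an in-memory file node (the class name `MemoryFile` and its methods are hypothetical, not from any library): keep the raw bytes as the canonical representation, so binary mode serves them verbatim and ASCII mode is just a charset decode/encode at the transfer boundary.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory file node: the byte array is the canonical
// representation. Binary transfers use the bytes directly; ASCII
// transfers decode/encode at the boundary.
public class MemoryFile {
    private final String name;
    private byte[] data = new byte[0];

    public MemoryFile(String name) { this.name = name; }

    public String getName() { return name; }

    // Binary mode: serve/store the bytes verbatim.
    public byte[] getBytes() { return data.clone(); }

    public void setBytes(byte[] bytes) { this.data = bytes.clone(); }

    // ASCII mode: decode the stored bytes as text, e.g. so line
    // endings can be translated for the wire.
    public String getText() { return new String(data, StandardCharsets.US_ASCII); }

    public void setText(String text) { this.data = text.getBytes(StandardCharsets.US_ASCII); }
}
```

A directory would then be a map from names to `MemoryFile` or child-directory nodes; nothing here ever touches disk.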
My use case is as follows: I want to move a file from one S3 location to another, using the Java S3 SDK.
For instance, if the file is in bucket/current, I want to move it to bucket/old.
I currently download the file as an S3Object, turn that object into a File (java.io, since the S3 Java client, for reasons I don't understand, does not let you upload an S3Object, the very same object you download!) and upload that file. I'm curious if there is a better approach.
Thanks!
There is no direct implementation of a rename or move operation in S3. Instead, the typical solution is to copy the object to the new location and then delete the original. You can accomplish this with the AmazonS3#copyObject and AmazonS3#deleteObject methods of the AWS SDK for Java.
This is more efficient than the technique you described in your question of downloading the file locally and then re-uploading it under the new key. copyObject internally makes use of S3 server-side copy provided in the S3 REST API PUT Object - Copy operation. The copy is performed on the S3 server side, so you won't have to pay the I/O costs (and real money costs if transiting out of AWS servers) compared to a local file download/upload.
Please be aware that this is much different from the rename operation as provided in a typical local file system, for multiple reasons:
It is not atomic. Most local file systems provide an atomic rename operation, which is useful for building safe "commit" or "checkpoint" constructs to publish the fact that a file is done being written and ready for consumption by some other process.
It is not as fast as a local file system rename. For typical local file systems, rename is a metadata operation that involves manipulating a small amount of information in inodes. With the copy/delete technique I described, all of the data must be copied, even if that copy is performed on the server side by S3.
Your application may be subject to unique edge cases caused by the Amazon S3 Data Consistency Model.
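The copy-then-delete "move" might look like the sketch below, using the AWS SDK for Java v1 (bucket and key names are placeholders):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Sketch of a "move" as copy-then-delete with the AWS SDK for Java v1.
public class S3Move {
    public static void move(AmazonS3 s3, String bucket,
                            String sourceKey, String destKey) {
        // Server-side copy: the object's bytes never leave S3.
        s3.copyObject(bucket, sourceKey, bucket, destKey);
        // Delete the original only after the copy has succeeded.
        s3.deleteObject(bucket, sourceKey);
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        move(s3, "bucket", "current/report.pdf", "old/report.pdf");
    }
}
```

Because the two calls are not atomic, a crash between them leaves both copies in place; callers should be prepared to tolerate or clean up that state.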
You can use the moveObject method of the StorageService class in the JetS3t library.
The title states the issue. And before you get me wrong: I do NOT want to know how this can be done, but how I can prevent it.
I want to write a file uploader (in Java with JPA and MySQL database). Since I'm not yet 100% sure about the internal management, there is the possibility that at some point the file could be executed/opened internally.
Therefore, I'd be glad to know what an attacker can do to harm, infect or manipulate my system by uploading any type of file, be it a media file, a binary or anything else.
For instance:
What about special characters in the file name?
What about manipulating meta data like EXIF?
What about "embedded viruses" like in an MP3 file?
I hope this is not too vague and I'd be glad to read your tips and hints.
Best regards,
Stacky
It's really very application specific. If you're using a particular web app like phpBB, there are completely different security needs than if you're running a news group. If you want tailored security recommendations, you'll need to search for them based on the context of what you're doing. It could range from sanitizing input to limiting upload size and format.
For example, an MP3 file virus probably only works on a few specific MP3 players. Not on all of them.
At any rate, if you want broad coverage from viruses, then scan the files with a virus scanner, but that probably won't protect you from things like script injection.
If your server doesn't do something inherently stupid, there should be no problem. But...
Since I'm not yet 100% sure about the internal management, there is the possibility that at some point the file could be executed/opened internally.
... this qualifies as inherently stupid. You have to make sure you don't accidentally execute uploaded files (permissions on the upload directory are a starting point; limit uploads to specific directories, etc.).
Aside from executing, if the server attempts any file type specific processing (e.g. make thumbnails of images) there is always the possibility that the processing can be attacked through buffer overflow exploits (these are specific for each type of software/library though).
A pure file server (e.g. FTP) that just stores and serves files is safe (as long as there are no other holes).
Does anyone know of any open-source Java libraries that provide features for handling a large number of files (writing/reading them to and from disk)? I am talking about 2-4 million files (most of them PDF and MS Office documents). It is not a good idea to store all files in a single directory. Instead of re-inventing the wheel, I am hoping this has been done by many people already.
Features I am looking for
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide versioning/auditing (optional)
I was looking at JCR API and it looks promising but it starts with a workspace and not sure what will be the performance when there are many nodes.
Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use case.
If you're running your system on Windows and noticed a horrible n^2 performance hit at some point, you're probably running up against the performance hit incurred by automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
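The checksum-as-filename variant of that strategy could be sketched like this (the class `ShardedPath` and the two-level, two-character sharding depth are illustrative choices, not a standard):

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the sharding scheme described above: hash the file's
// content and use the leading hex characters as nested directory
// names, so files spread uniformly across directories and identical
// content automatically maps to the same path.
public class ShardedPath {
    public static String pathFor(InputStream content)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = content.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        String h = hex.toString();
        // Two shard levels of two hex chars each, e.g. "2c/f2/2cf24d...".
        return h.substring(0, 2) + "/" + h.substring(2, 4) + "/" + h;
    }
}
```

Two hex characters per level gives 256 sub-directories at each level, which keeps directory sizes manageable even at millions of files.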
Depending on the sizes of the files, you might also consider storing the files themselves in a database--if you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames because you could reference them using an auto-generated primary key.
Combine the functionality in the java.io package with your own custom solution.
The java.io package can write and read files from disk and create arbitrary directories or sub-directories for new files. There is no external API required.
The versioning or auditing would have to be provided with your own custom solution. There are many ways to handle this, and you probably have a specific need that needs to be filled. Especially if you're concerned about the performance of an open-source API, it's likely that you will get the best result by simply coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and form an index of everything that's available. Based on the method used for sharing and indexing these files, it can rescan the files every so often or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new file, your module will know exactly how it is organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.
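For points 1) and 2) of the question, a minimal java.io sketch might look like this (the `FileStore` class and its layout are hypothetical, just to show the mechanics):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Minimal sketch using only java.io: create nested sub-directories on
// demand, write a file into them, and read it back.
public class FileStore {
    public static void write(File root, String relativePath, byte[] data)
            throws IOException {
        File target = new File(root, relativePath);
        target.getParentFile().mkdirs();   // create sub-directories as needed
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(data);
        }
    }

    public static byte[] read(File root, String relativePath)
            throws IOException {
        File source = new File(root, relativePath);
        byte[] data = new byte[(int) source.length()];
        try (FileInputStream in = new FileInputStream(source)) {
            int off = 0;
            while (off < data.length) {
                int n = in.read(data, off, data.length - off);
                if (n == -1) break;
                off += n;
            }
        }
        return data;
    }
}
```

Versioning and auditing would then be layered on top, for example by appending a version suffix to `relativePath` and logging each write.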
I apologize if this is a really beginner question, but I have not worked with Java in several years.
In my application, I need to keep up with a list of files (most, if not all, are txt files). I need to be able to add to this list, remove file paths from the list, and eventually read the contents of the files (though not when the files are initially added to the list).
What is the best data structure to use to store this list of files? Is it standard to just save the path to the file as a String, or is there a better way?
Thanks very much.
Yes, paths are usually stored as String or File instances. The list can be stored as an ArrayList instance.
It really depends on your requirements
you can store filenames/paths using anything that implements Collection if you have a small number of files and/or a flat directory structure
if looking up files is performance critical you should use a data structure that gives you fast search, like a HashSet
if memory space is an issue (e.g. on mobile devices) and your number of files is high and/or your directory structure deep you should use a data structure that allows for compact storage, like a trie
If the data structure allows, I would store Files rather than Strings however because there is no additional overhead and File obviously offers convenient file handling methods.
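Combining the last two points, a small sketch of a HashSet-backed tracker (the `FileTracker` class is a hypothetical name for illustration):

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Sketch of the lookup-oriented option above: a HashSet of File
// objects gives constant-time add, remove, and contains, and File's
// equals() compares paths rather than raw String identity.
public class FileTracker {
    private final Set<File> files = new HashSet<>();

    public void add(String path) { files.add(new File(path)); }

    public void remove(String path) { files.remove(new File(path)); }

    public boolean contains(String path) { return files.contains(new File(path)); }

    public int size() { return files.size(); }
}
```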
One way is to use the Properties class. It has load and store methods for reading and writing to a file, but it may not match what you are doing.
I'm not sure if I understood your question completely, but I like to store files as File objects in Java. If you apply the same operation to each file, then you can store them in a List. But maybe you should clarify your question a little.
I would recommend storing a set of file objects using the Collection interface of your choice. The reason to do this is that the File Object creates a canonical reference to the file, which is device independent.
I don't think that the handle is open when you do this, but I am open to correction.
http://java.sun.com/javase/6/docs/api/java/io/File.html
Is there some library for using some sort of cursor over a file? I have to read big files, but can't afford to read them all at once into memory. I'm aware of java.nio, but I want to use a higher level API.
A little background: I have a tool written in GWT that analyzes submitted XML documents and then pretty-prints the XML, among other things. Currently I'm writing the pretty-printed XML to a temp file (my lib would throw an OOMException if I used plain Strings), but the temp file's size is approaching 18 MB, and I can't afford to respond to a GWT RPC with 18 MB :)
So I can have a widget to show only a portion of the xml (check this example), but I need to read the corresponding portion of the file.
Have you taken a look at using FileChannel with memory-mapped files? Memory-mapped files allow you to manipulate large files without bringing the entire file into memory.
Here's a link to a good introduction:
http://www.developer.com/java/other/article.php/1548681
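For the "show only a portion of the file" use case above, a hedged sketch of mapping just one window of the file (the class `FileWindow` and the UTF-8 assumption are mine, not from the question):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Sketch: map only the requested window of a large file into memory
// and decode it, instead of reading the whole file.
public class FileWindow {
    public static String slice(String path, long offset, int length)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            long end = Math.min(offset + length, channel.size());
            MappedByteBuffer buffer = channel.map(
                    FileChannel.MapMode.READ_ONLY, offset, end - offset);
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

Note that if the window boundary can fall inside a multi-byte UTF-8 character, the decode at the edges would need extra care.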
Maybe java.io.RandomAccessFile can be of use to you.
I don't understand when you ask for a "higher level API" when positioning the file pointer. It is the higher levels that may need to control the "cursor". If you want control, go lower, not higher.
I am certain that the lower-level Java I/O classes allow you to position yourself anywhere within a file of any size without reading anything into memory until you want to. I know I have done it before. Try RandomAccessFile as one example.
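A minimal sketch of that RandomAccessFile approach (the helper `RandomRead` is a made-up name for illustration):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the RandomAccessFile approach: seek() moves the file
// pointer without reading anything, then only the requested bytes
// are pulled into memory.
public class RandomRead {
    public static byte[] readAt(String path, long offset, int length)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);        // jump straight to the offset
            byte[] chunk = new byte[length];
            raf.readFully(chunk);    // read exactly 'length' bytes
            return chunk;
        }
    }
}
```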