For the sake of brevity, consider a Facebook-style image-serving app: users can upload content as well as access content shared by other people. I am looking for the best ways of handling this kind of file-serving application through Java servlets. There is surprisingly little information available on the topic. I'd appreciate it if someone could share their personal experience with a small setup (a few hundred users).
So far I am tempted to use the database as a file system (using MongoDB), but that approach seems cumbersome and tedious and would mean replicating part of the functionality already provided by native OS filesystems. I don't want to use commercial software, and I don't have the bandwidth to write my own like Facebook did. All I want is to be able to do this through free software on a small server with a RAID or something similar. A solution that scales well to multiple servers would be a plus. The important thing is to serve the content through Java servlets (I am willing to look into alternatives, but they have to be usable from Java).
I'd appreciate any help. Any references to first hand experiences would be helpful as well. Thanks.
Guru -
I set up something exactly like this for members of my extended family to share photos. It is a slightly complicated process that includes the following:
1) Sign up for Amazon Web Services, notably their S3 (Simple Storage Service). There is a free storage tier that should cover the number of users you described.
2) Set up a web application that accepts uploads. I use Uploadify in combination with jQuery and Ajax to upload to a servlet that accepts, scans, logs, and does whatever else I want with the file(s). On the servlet side, I use ESAPI's upload validation mechanism (part of its validation engine), which is just built on top of Commons FileUpload, which I have also used by itself.
3) After processing the file(s) appropriately, I use JetS3t as my Java-to-Amazon-S3 API and upload the file to Amazon S3. At that point, users can download or view photos depending on their level of access. The easiest way I have found to do this is to use JetS3t in combination with the web application's authentication to create temporary URLs, which give the user access to the file for a specific amount of time, after which the URL becomes unusable. A rough sketch of this flow is shown below.
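Here is a minimal sketch of steps 2 and 3, assuming Commons FileUpload on the servlet side and JetS3t for the S3 calls. The bucket name, credentials and 15-minute expiry are placeholders, and the JetS3t method names are from memory of its 0.8.x API, so double-check them against the version you use.

```java
// Sketch: parse the multipart request with Commons FileUpload, push the file to S3
// with JetS3t, and return a time-limited signed URL. BUCKET and the credentials
// are placeholders.
import java.io.File;
import java.io.IOException;
import java.util.Date;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Object;
import org.jets3t.service.security.AWSCredentials;

public class PhotoUploadServlet extends HttpServlet {

    private static final String BUCKET = "my-photo-bucket"; // placeholder

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
            List<FileItem> items = upload.parseRequest(req);

            RestS3Service s3 = new RestS3Service(
                    new AWSCredentials("ACCESS_KEY", "SECRET_KEY")); // placeholders

            for (FileItem item : items) {
                if (item.isFormField()) {
                    continue; // this sketch only cares about the actual file parts
                }
                // Spool to a temp file so it can be scanned/logged before going to S3
                File temp = File.createTempFile("upload-", ".tmp");
                item.write(temp);

                S3Object object = new S3Object(temp);
                object.setKey(item.getName());
                s3.putObject(BUCKET, object);

                // Temporary URL that stops working after 15 minutes
                Date expiry = new Date(System.currentTimeMillis() + 15 * 60 * 1000);
                resp.getWriter().println(
                        s3.createSignedGetUrl(BUCKET, object.getKey(), expiry));

                temp.delete();
            }
        } catch (Exception e) {
            throw new ServletException("Upload failed", e);
        }
    }
}
```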
One more note: if you are not concerned with file processing and completely trust the people uploading files, you can upload directly to Amazon S3. However, I find it much easier to upload to my own server and do all of my processing, checking, and logging before taking the final step of putting the file on Amazon S3.
If you have any questions on the specifics of any of this, just let me know.
While Owen's suggestion is an excellent one, there is another option you can consider: what you are describing is a content repository.
Since you have sufficient control of the server to be able to install a (non-commercial) piece of software, you may be interested in the Apache Jackrabbit content repository. It includes a Java API, so you should be able to control the software (at least as far as adding and extracting content goes) from your servlets.
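For a flavor of what that Java API looks like, here is a minimal sketch that stores an image in an embedded Jackrabbit repository through the JCR 2.0 API. The credentials, node names and file name are placeholders, and node-type details can vary a little between Jackrabbit versions.

```java
// Sketch: store an uploaded image in an embedded Jackrabbit repository via JCR 2.0.
// Repository location, credentials and node names are assumptions for illustration.
import java.io.FileInputStream;
import java.io.InputStream;

import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.core.TransientRepository;

public class JackrabbitStoreExample {

    public static void main(String[] args) throws Exception {
        Repository repository = new TransientRepository();
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try (InputStream in = new FileInputStream("photo.jpg")) {
            Node root = session.getRootNode();
            Node folder = root.hasNode("photos")
                    ? root.getNode("photos")
                    : root.addNode("photos", "nt:folder");

            // nt:file nodes carry their bytes in a jcr:content child of type nt:resource
            Node file = folder.addNode("photo.jpg", "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            Binary binary = session.getValueFactory().createBinary(in);
            content.setProperty("jcr:data", binary);
            content.setProperty("jcr:mimeType", "image/jpeg");

            session.save();
        } finally {
            session.logout();
        }
    }
}
```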
Actually, if you combine this idea with Owen's and expand on it, you could host the repository on the Amazon S3 space and use the free-tier Amazon EC2 instance to host the software itself. (Although, as I understand it, the free-tier EC2 instance is only free for the first year.)
HTH
NB. I'm sure other content repositories exist, but JackRabbit is the only one I've played with (albeit briefly).
Storing files seems like a basic thing, but I stumbled upon a problem. Here is what I understand about storing files on a server while programming in Java.
You shouldn't store files in MySQL, because storing heavy files in the database is not recommended. So I decided not to go this route.
It is also not recommended to store files inside the container, such as Tomcat or WildFly, maybe because the application needs to be redeployed, or something along those lines?
You can definitely store files on something like an Apache file server? I am confused about this. Can we store files there and reference them from the database? Is this similar to the way websites store their images and files?
I also came across NoSQL databases, but I didn't go into much depth, thinking it might turn out to be the wrong choice and I'd rather invest my time in other things.
That said, what is a good way to store a file on the server, reference it from a Java web application, and record it in the database?
We don't usually put our files in the database (NoSQL or RDBMS) because they are not file systems. If someone uploads a file, you store it in a file system and probably record the name and other metadata in the db for future use. You can technically put the contents of a file in a database and it has its own merits and drawbacks - https://softwareengineering.stackexchange.com/a/150787/156860
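As a rough illustration of that split, here is a sketch that writes the uploaded bytes to a directory outside the web application and records only the metadata in a relational table. The `files` table, its columns and the storage directory are assumptions for illustration.

```java
// Sketch: write the uploaded bytes to a directory outside the web/app server and
// record only the metadata in the database. Table name, columns, the storage
// directory and the DataSource are assumptions for illustration.
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

import javax.sql.DataSource;

public class FileStore {

    private static final Path STORAGE_DIR = Paths.get("/var/data/uploads"); // placeholder

    private final DataSource dataSource;

    public FileStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Stores the stream on disk and records its metadata; returns the generated id. */
    public String save(String originalName, String mimeType, InputStream in) throws Exception {
        String id = UUID.randomUUID().toString();
        Path target = STORAGE_DIR.resolve(id);
        long size = Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);

        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO files (id, original_name, mime_type, size_bytes) VALUES (?,?,?,?)")) {
            ps.setString(1, id);
            ps.setString(2, originalName);
            ps.setString(3, mimeType);
            ps.setLong(4, size);
            ps.executeUpdate();
        }
        return id;
    }
}
```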
Uploaded files are not recommended to reside in web/app servers because you might have more than one server to handle the load and if you put your file in just one server, the other server might face problems in accessing the content. And if one server goes down, the files in that server are not available for other servers to use. You'd probably need a shared drive/disk which all the servers can connect to and read the file.
Java EE handles this by providing a resource adapter, which abstracts how the application interacts with 'resources' in general (a file or some other kind of resource). But without more detail on what kind of files they are, how big they are, and how they are being used, it will be hard to get to 'the' solution for the question.
It really depends on what your use-case is. If you give more insight into what you're trying to solve for you should get better responses.
I think generally speaking that cloud solutions like Amazon's S3 are very common, good, and easy to work with these days. You also get the benefit of your servers not necessarily having to serve the files themselves which means more bandwidth for non-file based requests.
You shouldn't store files in MySQL, because storing heavy files in the database is not recommended. So I decided not to go this route.
It really depends on what you're doing, if you have high traffic and large files then this isn't going to scale well. If you have low traffic and small files, then there's no reason this isn't a viable solution.
It is also not recommended to store files inside the container, such as Tomcat or WildFly, maybe because the application needs to be redeployed, or something along those lines?
Not quite sure what you mean by "in" containers: do you mean in your WAR file, or on the local file system? It's generally not recommended, but again, it really depends on traffic, file sizes, etc. The local file system can work, but you should use something like NFS so that if the server goes down the files don't go with it.
I've seen people use the local file system and git to store text files that are saved through the server. This gave them redundancy and versions.
You can definitely store files on something like an Apache file server? I am confused about this. Can we store files there and reference them from the database? Is this similar to the way websites store their images and files?
I've never personally used an Apache file server, but what's confusing you about it? After a quick Google search it looks like it's basically just an Apache server, which means you make HTTP requests to store and retrieve files. This would be similar to how S3 works, so I'd say it's probably a good general solution.
I also came across NoSQL databases, but I didn't go into much depth, thinking it might turn out to be the wrong choice and I'd rather invest my time in other things.
The actual NoSQL solution you choose will have its own requirements, so I can't speak to how good it will be, but again, it really depends on your traffic and file sizes. The NoSQL solutions I've used all have HTTP interfaces, so working with them won't be all that different from Apache or S3. The biggest difference is that with a NoSQL solution you have to query for the files like you would from a database, but those queries are usually sent over HTTP, so it's not really that different.
I have a document management system that stores documents in a database. I'm looking for a simple way (not too large or complicated a protocol to implement) to expose the database as a drive in Windows, so it can be browsed and manipulated using any Windows program, like Explorer or Office.
What I have in mind is providing some kind of network share that can be mounted as a drive in Windows. Unfortunately, all candidate network protocols for file sharing seem to require substantial effort to implement.
I first considered CIFS, but after reading up on it I quickly decided that it's BY FAR too complicated for me to implement. My next thought was NFS, but it's not supported natively by Windows (XP) and also seems quite complicated to implement.
FTP might be an option, but implementing an FTP server is again much more complicated than I naively expected.
There might be a simpler protocol to use I haven't thought of.
Is there anything I can (ab)use easily for this purpose?
Ideally I want some kind of (pure Java) premade server where I could easily strip out the part that accesses the local file system and replace it with my own code accessing the database, OR a protocol simple enough that I can implement it myself reasonably quickly and, more importantly, in a way that is compatible and reliable.
First, you need to make the correct bindings between your DMS and the database, and define/write an API in your program that describes easy access to the needed resources. Writing an API will let you maximize the interoperability of your solution with other services or plugins you might want to add later.
After that, you should serve these services to clients through a permissive and robust protocol such as WebDAV, which stands for Web-based Distributed Authoring and Versioning. It is natively supported by Windows, so you can interact this way with any service implementing WebDAV (Windows, most web browsers, ...), and you can of course also mount this kind of service as a virtual drive. Plus, it is also supported on Linux and Mac OS X, natively I believe, but I'm not sure. In fact, WebDAV is an extension to HTTP and is described in RFC 4918.
Basically, every HTTP library for Java that handles server-side response management could be used to implement WebDAV, if you have some time and want to do it yourself.
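Just to illustrate that WebDAV really is "HTTP plus extra methods and headers", here is a sketch of the OPTIONS handshake a WebDAV client (including Windows) performs before mounting a share, answered from a plain servlet. A usable share additionally needs PROPFIND, LOCK, MKCOL and friends, which is exactly what the libraries listed below implement for you.

```java
// Sketch: answer the WebDAV OPTIONS discovery request from a plain servlet.
// This alone is not a working share; it only shows the HTTP-extension nature of WebDAV.
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class WebDavOptionsServlet extends HttpServlet {

    @Override
    protected void doOptions(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Advertise WebDAV compliance classes and the methods this endpoint accepts
        resp.setHeader("DAV", "1,2");
        resp.setHeader("Allow",
                "OPTIONS, GET, HEAD, PUT, DELETE, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK");
        resp.setHeader("MS-Author-Via", "DAV"); // hint used by the Windows WebDAV client
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}
```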
To implement WebDAV with an acceptable amount of effort, I searched for some Java libraries on the web and found these; it's now up to you to decide which one really fits your needs:
Milton http://milton.ettrema.com/index.html
A list of some existing implementations of the WebDAV protocol in open source projects on WebDAV.org (you can find some pretty impressive projects there, such as the Jakarta Slide project, although I think it is no longer maintained, and other projects/libraries that show the importance of WebDAV today). http://www.webdav.org/projects/
I hope many people already know about the Dropbox cloud service for storing and syncing files across various clients. I am a little hesitant about using a third-party service to store my personal files.
I am trying to build personal cloud storage using my 2TB hard drive. I guess I am looking for pointers on where to start, which APIs to use [preferably Java and Java EE], security, and risks.
First, I would highly recommend getting over your paranoia; chances are extremely slim that Dropbox employees are going to spend their lunch hour looking at your photos or whatever. Literally millions of people store their stuff on Dropbox and nobody's had their privacy violated.
That said, Dropbox is based on Amazon S3, which (since I assume you trust Amazon just as much as you trust Dropbox) has an open source clone that you can run yourself. Take a look at Eucalyptus. Since you specifically brought up Java, I'll point out that the excellent AWS SDK for Java works just as well on Eucalyptus' S3 stores as on Amazon's.
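As an example of that combination, here is a minimal sketch using the AWS SDK for Java (1.x) against an S3-compatible endpoint such as Eucalyptus' Walrus; the endpoint URL, credentials and bucket name are placeholders for your own installation.

```java
// Sketch: point the AWS SDK for Java (v1.x) at a private S3-compatible endpoint
// instead of AWS. Endpoint, credentials and bucket are placeholders.
import java.io.File;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;

public class PrivateCloudUpload {

    public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")); // placeholders
        // Use your own Walrus/S3-compatible service rather than amazonaws.com
        s3.setEndpoint("http://my-eucalyptus-host:8773/services/Walrus"); // placeholder

        s3.putObject("my-bucket", "photos/holiday.jpg", new File("holiday.jpg"));
    }
}
```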
How is it cloud storage if it's on your 2TB hard drive? Why do you think your own service will be more secure when accessible through the internet than a service run by people who specialize in doing exactly that? I'd guess Amazon has more experience than anybody else in HTTP-accessible file storage (S3).
If you want to make sure nobody can look at your stuff, I suggest you look into encrypting it before saving it to S3. That makes it harder to access on the client side, because you always need the right tools to do the encryption, but that is the price you'd have to pay.
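A minimal sketch of that idea, using nothing but the JDK's own crypto classes to encrypt a file with AES before it is uploaded. Key management is deliberately left out (the key is generated and then discarded), which is exactly the "right tools on the client side" problem mentioned above.

```java
// Sketch: encrypt a file locally with AES before uploading, so the copy stored in
// S3 is opaque to the provider. The IV is written at the start of the output file
// so it can be decrypted later. Key storage/management is omitted on purpose.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EncryptBeforeUpload {

    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey(); // in real use, load this from a key store

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);

        FileOutputStream fileOut = new FileOutputStream("photo.jpg.enc");
        fileOut.write(cipher.getIV()); // prepend the IV; it is not secret

        // photo.jpg.enc is what you would then upload to S3
        try (InputStream in = new FileInputStream("photo.jpg");
             OutputStream out = new CipherOutputStream(fileOut, cipher)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```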
I'm looking for a document repository that supports physical data separation. I was looking into Open Text's LiveLink, but that price range is out of my league. I just started looking into DSpace! It's free and open source and works with PostgreSQL.
I'm really trying to figure out whether DSpace supports physically separating the data. I guess I would create multiple DSpace Communities, and I want the documents for each Community stored on different mounts. Is this possible? Or is there a better way?
Also, I won't be using DSpace's front end. The user will not even know DSpace is storing the docs. I will build my own front end and use Java to talk to DSpace. Does anyone know of good Java API wrappers for DSpace?
This answer is a pretty long time after you asked the question, so not sure if it's of any use to you.
DSpace currently does not support storing different parts of the data on different partitions, although your asset store can be located on its own partition, separate from the database and the application. You may get what you need from the new DuraSpace application, which deals with synchronising your data to the cloud.
In terms of Java APIs, DSpace supports SWORD 1.3 which is a deposit-only protocol, and a lightweight implementation of WebDAV with SOAP. Other than that, it's somewhat lacking. If you are looking for a real back-end repository with good web services you might look at Fedora which is a pure back end, with no native UI, and probably more suited to your needs. DSpace and Fedora are both part of the DuraSpace organisation so you can probably benefit from that also. I'm not knowledgeable enough about the way that Fedora stores data to say whether you can physically separate the storage, though.
Hope that helps.
What is the best way to implement a big-file (1GB or more) uploader website in PHP or Java? Using the default way of uploading in PHP or Java results in running out of RAM and slows the website down dramatically.
It would be unwise to open the file on the client side, read its whole content to memory, close it and then start sending the contents, precisely because the contents can exceed the available memory.
One alternative is to open the file, read a chunk of it (remembering where the last chunk ended of course), close the file, upload to the server, and reassemble the file on the server side by appending to previous chunks. This is not a trivial procedure, and should take into consideration things like resource management, IO faults and synchronization, especially when working in parallel with multiple threads.
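A minimal sketch of the server side of that chunked approach, assuming the client sends each chunk as the raw request body along with `fileId` and `offset` parameters (both invented names for illustration). A real implementation would also need to validate the id, lock the file, track which chunks have arrived, and clean up abandoned uploads.

```java
// Sketch: reassemble chunked uploads by writing each chunk at its offset.
// Parameter names and the storage directory are assumptions for illustration.
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ChunkUploadServlet extends HttpServlet {

    private static final String UPLOAD_DIR = "/var/data/chunks"; // placeholder

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // In real code, validate fileId against known uploads to avoid path tricks
        String fileId = req.getParameter("fileId");
        long offset = Long.parseLong(req.getParameter("offset")); // where this chunk starts

        // Write the raw request body at the chunk's offset in the target file
        try (InputStream in = req.getInputStream();
             RandomAccessFile out = new RandomAccessFile(UPLOAD_DIR + "/" + fileId, "rw")) {
            out.seek(offset);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}
```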
We've been using http://www.javaatwork.com/ftp-java-upload-applet/details.html for uploading very big files to dedicated hosting. It works a treat even with lots of RAW (photo) files.
The only drawback is that it's not multi-threaded and locks your browser until everything is uploaded.
Still to find another Java uploader as good looking as this (important to us), but there are a few multi-threaded ones out there that look pretty bad :-)
I would recommend JumpLoader [google it], as it offers a lot of useful features. I have integrated it into my open-source CMS project and it works just fine (of course, a few tweaks here and there are needed). It has a JavaScript interface which you can access with raw JavaScript or jQuery [I used the latter and coded a little plugin for it]. The only drawback would be the JumpLoader branding on the applet's forehead :P, which you can have removed for 100 bucks.
Overall, features like multiple uploads, pre-upload image and document editing, partitioned uploads, transmission integrity checks via MD5 fingerprinting, blah blah blah, are very attractive.
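As an illustration of that MD5 integrity check (not JumpLoader's actual code), the server can recompute the digest of the bytes it received and compare it with the fingerprint the client sent:

```java
// Sketch: verify a client-supplied MD5 fingerprint against the bytes that arrived.
// The method and parameter names are illustrative, not any uploader's real API.
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class Md5Check {

    /** Returns true if the file's MD5 matches the fingerprint sent by the client. */
    public static boolean matches(Path file, String expectedHex) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md5)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // reading through the DigestInputStream updates the digest
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString().equalsIgnoreCase(expectedHex);
    }
}
```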