We have two ways of keeping files: in the database or on the filesystem.
I have an EJB that can be called both from a servlet and from a standalone Java application.
The problem is working with binary files. I found here that EJBs are restricted (unlike other Java EE components) from writing files to or reading files from the filesystem. They say to keep them in the database.
OK, I said. But here and here everybody says that's a very bad idea, and I totally agree.
Another solution is JCR, in its Apache Jackrabbit implementation. But I see it as a mix of both approaches: here it says Jackrabbit also keeps files on the filesystem.
So what is the best practice? Why can't we write files to the filesystem without JCR?
You could use a JCA connector to access the filesystem from your EJBs; the filesystem then becomes a managed resource. See http://code.google.com/p/txfs/ for basic filesystem access, or XADisk for more complete, transactional access.
Nuxeo doesn't use JCR anymore, and there is also ModeShape (the JBoss/Red Hat implementation) on top of Infinispan as a data grid.
Keeping the binary files on the file system has the least overall performance impact, and it is the most appropriate way to go (applications managing tons of binary content, like Alfresco, use the file system, and that works fine for them). If you need the files to be accessed from multiple servers, you can store them on a single server and share the directory over the network, or just use a NAS. Working with the file system is not transactional, however; if that is critical, you can use tools like XADisk that support transactions at the file-system level.
JCR is also a good way to go, but it is not meant to store only binary files; it is meant to store "content" (binary data plus metadata describing it) in a hierarchical way. Using JCR adds another level of abstraction, though, and that may have a performance impact. As the most popular JCR implementations, like Apache Jackrabbit, Alfresco and Nuxeo, use the file system behind the scenes, you should consider whether you really need the additional features JCR provides (metadata, versioning, thumbnails, document preview, etc.) at the cost of a performance penalty and integration effort, or whether the plain file system satisfies your needs.
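For illustration, storing a binary through the standard JCR 2.0 API looks roughly like the sketch below; obtaining the Repository, the credentials, and the file path are all assumptions here:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

public class JcrUpload {
    // minimal JCR 2.0 sketch; how you obtain the Repository (e.g. Jackrabbit) is up to you
    public static void upload(Repository repository) throws Exception {
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray())); // hypothetical credentials
        try (InputStream in = new FileInputStream("/tmp/report.pdf")) { // hypothetical file
            // an nt:file node with an nt:resource child is the standard JCR file layout
            Node file = session.getRootNode().addNode("report.pdf", "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            content.setProperty("jcr:mimeType", "application/pdf");
            Binary binary = session.getValueFactory().createBinary(in);
            content.setProperty("jcr:data", binary);
            session.save();
        } finally {
            session.logout();
        }
    }
}
```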
Our team works with a well-known OSGi-based COTS product that runs as a standalone service (it does not interact with multiple instances of itself). The product contains an API which allows developers to build additional functionality into it. This product stores what can be large jars (1-5 MB) in ZooKeeper along with other configuration data. The COTS product also bundles a lot of open-source software (Tomcat, ZooKeeper, many other Apache projects, etc.). Since the product is written in Java, I have a good understanding of its design and source code.
Our instance of the product has at times been having issues starting up correctly, and according to the vendor the product is failing to correctly write to or read from ZooKeeper, either while stopping or while starting (the vendor does not yet know for sure which). This problem only appeared once we started to add these large jars to the product's ./deploy folder.
I do not believe that the node or path cache use cases apply to this product: https://github.com/Netflix/curator/wiki/Recipes
Full disclosure: I currently have only a shallow understanding of ZooKeeper and have been trying without success to find a recipe/use case where one would use ZooKeeper to store large binary jars. I also recognize that I may be asking the wrong question to this audience.
Is the above scenario a common use case for ZooKeeper?
ZooKeeper is a consensus store that allows multiple processes to share a common view of a shared resource; it is not a blob store and should not be used as one.
Firstly, ZooKeeper is a poor choice for storing data in a standalone instance. If you have no need for distributed consensus between multiple readers/writers, then ZooKeeper is complete overkill.
Secondly, ZooKeeper znodes are designed to hold small pieces of data that change frequently, potentially with many readers watching for changes. The jar files you are adding do not fit this pattern: there aren't many readers (the product is a standalone instance) and the jar files are large.
The default ZooKeeper configuration puts a hard limit of 1 MB of storage per znode, and ideally you store far less than that. The limit can be increased, but doing so is not advised. I would strongly recommend that you look into using a proper file store (or even just the file system, as your node is standalone) to store these jar files.
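If the jars do have to be coordinated through ZooKeeper at all, one common pattern is to keep the bytes on disk and publish only a small pointer plus checksum in a znode. A hedged sketch using the (now Apache) Curator client that the question links to; the znode path and jar location are made up:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class JarPointerExample {
    public static void main(String[] args) throws Exception {
        Path jar = Paths.get("/opt/product/deploy/plugin.jar"); // hypothetical jar location
        byte[] content = Files.readAllBytes(jar);
        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(content);

        // the pointer record stays far below the 1 MB znode limit
        String pointer = jar + "#" + toHex(sha1);

        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        client.create().creatingParentsIfNeeded()
              .forPath("/product/jars/plugin", pointer.getBytes(StandardCharsets.UTF_8));
        client.close();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```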
In my task I need to save a file on disk and update information about it in a database.
An exception can happen either while saving the file or while updating the database.
Are there ready open-source solutions for this, or does it need to be written from scratch?
Thanks.
There's XADisk which provides transactional access to file systems. From their web site:
XADisk (pronounced 'x-a-disk') enables transactional access to existing file systems by providing APIs to perform file/directory operations. With simple steps, it can be deployed over any JVM and can then start serving all kinds of Java/Java EE applications running anywhere.
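A minimal standalone sketch of that boot-and-session flow, assuming XADisk's published API (the work directory, instance id, target path, and timeout are all made up):

```java
import java.io.File;

import org.xadisk.bridge.proxies.interfaces.Session;
import org.xadisk.bridge.proxies.interfaces.XAFileSystem;
import org.xadisk.bridge.proxies.interfaces.XAFileSystemProxy;
import org.xadisk.filesystem.standalone.StandaloneFileSystemConfiguration;

public class XADiskSketch {
    public static void main(String[] args) throws Exception {
        // boot an XADisk instance; the first argument is its private work directory
        StandaloneFileSystemConfiguration config =
                new StandaloneFileSystemConfiguration("/tmp/xadisk-system", "instance-1");
        XAFileSystem xafs = XAFileSystemProxy.bootNativeXAFileSystem(config);
        xafs.waitForBootup(10000); // wait up to 10 s for crash recovery to finish

        // all file operations inside a session are atomic: commit or roll back together
        Session session = xafs.createSessionForLocalTransaction();
        try {
            session.createFile(new File("/tmp/data/report.bin"), false); // false = a file, not a directory
            session.commit();
        } catch (Exception e) {
            session.rollback();
            throw e;
        } finally {
            xafs.shutdown();
        }
    }
}
```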
In Java, enterprise transaction management is governed by the JTA spec, which is used in Java EE.
JTA allows a transaction manager to coordinate several transactional resources with different implementations (one for the database, one for the file system) and make them work together in a single cross-resource transaction.
I think this could be a way for you to do what you want.
Outside of a container there are also ways to integrate JTA; have a look at the Spring or JBoss implementations.
Look at this blog post for more information about Spring and transaction usage.
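A hedged sketch of what this looks like with the standard javax.transaction.UserTransaction API; the two helper methods are hypothetical placeholders for XA-aware resource access (e.g. an XA-capable DataSource and XADisk's XAResource):

```java
import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

public class CrossResourceTx {
    public void saveFileAndRecord(byte[] content, String name) throws Exception {
        // inside a container, UserTransaction is available from JNDI
        UserTransaction utx = (UserTransaction)
                new InitialContext().lookup("java:comp/UserTransaction");
        utx.begin();
        try {
            writeFileTransactionally(content, name); // hypothetical: enlists a file XAResource (e.g. XADisk)
            insertFileRecord(name);                  // hypothetical: writes via an XA datasource
            utx.commit();                            // both operations succeed or both roll back
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
    }

    private void writeFileTransactionally(byte[] content, String name) { /* ... */ }
    private void insertFileRecord(String name) { /* ... */ }
}
```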
The file system isn't directly supported by Java EE, but you could implement, or search for, a resource adapter for the file system:
http://docs.oracle.com/javaee/6/tutorial/doc/gipgl.html
http://docs.oracle.com/javaee/6/tutorial/doc/giqjk.html
In our project we use Jackrabbit with Spring and Tomcat to manage PDF files.
Currently a MySQL database is used to store the files as blobs (in Jackrabbit terms, via BundleDbPersistenceManager).
As the number of generated files grows, we are thinking of using the file system instead of the database to boost performance and eliminate replication overhead.
In the documentation the Jackrabbit team recommends using BundleFsPersistenceManager instead, but with comments like this:
Not meant to be used in production environments (except for read-only uses)
Does anyone have experience using BundleFsPersistenceManager, and can you point to any resources on a painless migration from blobs in a MySQL database to files in the filesystem?
Thank you very much in advance.
Persistence in Jackrabbit is a bit complicated; it makes sense to read the configuration overview documentation first.
In Jackrabbit, binaries are by default stored in the data store, not in the persistence manager. Even if you use the BundleDbPersistenceManager, large binary files go to the data store. You can combine the (default) FileDataStore with the BundleDbPersistenceManager.
I would recommend not using the BundleFsPersistenceManager, because data can get corrupted quite easily if the process is killed while writing.
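For example, a repository.xml fragment along these lines keeps node bundles in the database while binaries go to the file-system data store; the path and threshold below are illustrative, not prescriptive:

```xml
<!-- sketch of the relevant repository.xml fragment; adjust path and threshold to your setup -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <param name="path" value="${rep.home}/repository/datastore"/>
  <!-- binaries smaller than this many bytes stay inline in the persistence manager -->
  <param name="minRecordLength" value="1024"/>
</DataStore>
```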
We have a utility Spring MVC application that doesn't use a database; it is just a SOAP/REST wrapper. We would like to store an arbitrary message for display to users that persists between deployments. The application must be able to both read and write this data. Are there any best practices for this?
Multiple options:
Write something to the file system - great for persistence, a little slow. The primary drawback is that it would probably have to be a shared file system, since clustering wouldn't deal well with a local one, and then you get into file-locking issues. Very easy implementation.
Embedded DB - similar benefits and pitfalls as writing to the file system, but probably deals better with locking/transactional issues. Somewhat more difficult implementation.
Distributed cache (like Memcached) - a bit faster than a file, though not much, and it deals with the clustering and locking issues. However, it's not persistent: fairly reliable across a short webapp restart, but definitely not 100%. More difficult implementation, plus you need another server.
Why not use an embedded database? Options are:
H2
HSQL
Derby
Just include the jar file in the webapp's classpath and configure the JDBC URL as normal.
Perfect for demos, and easy to substitute when you want to switch to a bigger database server.
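A minimal sketch with embedded H2, assuming only the H2 jar on the classpath; the file-based URL, table, and message text are made up:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MessageStore {
    // "./data/appdb" creates the database files relative to the working directory
    private static final String URL = "jdbc:h2:./data/appdb";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL, "sa", "")) {
            conn.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS app_message(id INT PRIMARY KEY, body VARCHAR(4000))");
            // H2's MERGE gives insert-or-update semantics for the single message row
            try (PreparedStatement ps = conn.prepareStatement(
                    "MERGE INTO app_message KEY(id) VALUES (1, ?)")) {
                ps.setString(1, "Maintenance window tonight at 22:00");
                ps.executeUpdate();
            }
            try (ResultSet rs = conn.createStatement()
                    .executeQuery("SELECT body FROM app_message WHERE id = 1")) {
                if (rs.next()) System.out.println(rs.getString(1));
            }
        }
    }
}
```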
I would simply store that in a file on the filesystem. It's possible to use an embedded database or something like that, but for one message, a file will be fine.
I'd recommend you store the file outside of the application directory.
It might be alongside (next to) it, but don't store it inside your "webapps/" directory or anything like that.
You'll probably also need to manage concurrency. A global (static) read/write lock should do fine.
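A minimal sketch of that locking approach; the file location is made up:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class MessageFile {
    // stored outside the webapp directory so redeployments don't wipe it
    private static final Path FILE = Paths.get("/var/app-data/message.txt");
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();

    public static String read() throws IOException {
        LOCK.readLock().lock(); // many concurrent readers are fine
        try {
            return Files.exists(FILE)
                    ? new String(Files.readAllBytes(FILE), StandardCharsets.UTF_8)
                    : "";
        } finally {
            LOCK.readLock().unlock();
        }
    }

    public static void write(String message) throws IOException {
        LOCK.writeLock().lock(); // writers get exclusive access
        try {
            Files.write(FILE, message.getBytes(StandardCharsets.UTF_8));
        } finally {
            LOCK.writeLock().unlock();
        }
    }
}
```

Note this only guards against concurrent access within one JVM; if the app is ever clustered, you are back to shared-file-system locking issues.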
I would use JNDI. Why over-complicate?
I am developing a J2EE application that manages hundreds of jars (saved and loaded on the fly).
To manage them I have two options:
create a directory on the server that contains all the jars
save each jar as a LOB in an Oracle 10g database
Could you help me choose the best solution? What are the benefits of each option?
Thanks for your help.
If there are many jars, you could use a Maven-like file organization, divided into folders. It is generally better to store the files on a file system, which is optimized for large-file I/O, and keep their paths/URLs in your database of choice.
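A small sketch of one such layout: fan the jars out into subdirectories derived from a content hash, and record the resulting path in the database. The storage root and naming scheme here are made up:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class JarLayout {
    private static final Path ROOT = Paths.get("/srv/jarstore"); // hypothetical storage root

    /** Writes the jar under ROOT/ab/cd/<sha1>.jar and returns the path to record in the DB. */
    public static Path store(byte[] jarBytes) throws Exception {
        String sha1 = toHex(MessageDigest.getInstance("SHA-1").digest(jarBytes));
        // two levels of fan-out keep any single directory small
        Path target = ROOT.resolve(sha1.substring(0, 2))
                          .resolve(sha1.substring(2, 4))
                          .resolve(sha1 + ".jar");
        Files.createDirectories(target.getParent());
        Files.write(target, jarBytes);
        return target;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```

Hashing the content also gives you deduplication for free: identical jars land on the same path.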
What is the security surrounding these files? Objects in a database may be easier to secure, and to audit access to.
Also, have a look at SecureFiles / DBFS and deduplication.
Basically, if you store a lot of copies of the same file, the database will realize that and keep only one physical copy - the equivalent of symbolic links.
Write performance for SecureFiles is better than a regular filesystem.
With DBFS you can also mount the database as a filesystem. That can make it very easy to switch between a 'regular' filesystem storage and database storage.
Involving a database LOB is going to be a lot slower under almost all circumstances, even if the DB is clustered. Developing a directory and file-naming convention isn't going to be that difficult. If you want, you can use the DB to help keep track of the files so you don't have to search the filesystem itself.