What's the best way to implement a simple document management system? - java

I am planning to build a simple document management system, preferably built around the Java platform. Are there any best practices around this? The requirements are:
Ability to upload documents
Ability to Tag documents
Version the documents
Comment on documents
There are a couple of options that I am currently considering. The first option would be a simple API on top of SVN or CVS, with a DB backend to track tags, uploader, comments, etc.
Another option is to use the filesystem: version the documents as copies in a versions folder and work with filenames.
Or, if there is an open, non-GPL'ed document management system, we could customize it to our needs and package it in our application. Does anybody have any experience building something like this?
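The filesystem option from the question could be sketched roughly as follows. This is only a minimal illustration using the JDK: the class name FileVersionStore, the "versions" folder and the ".vN" filename suffix are assumptions made for the example, and tags, comments and uploader would still have to be tracked elsewhere (for example in a database).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: keeps every saved revision as a numbered copy in a "versions" folder. */
class FileVersionStore {
    private final Path versionsDir;

    FileVersionStore(Path root) throws IOException {
        // Assumed layout: <root>/versions/<fileName>.v<N>
        this.versionsDir = Files.createDirectories(root.resolve("versions"));
    }

    /** Copies the document into the versions folder and returns the new version number. */
    int save(Path document) throws IOException {
        String name = document.getFileName().toString();
        int next = latestVersion(name) + 1;
        Files.copy(document, versionsDir.resolve(name + ".v" + next));
        return next;
    }

    /** Returns the highest stored version number for the file name, or 0 if none exists. */
    int latestVersion(String name) throws IOException {
        String prefix = name + ".v";
        try (Stream<Path> files = Files.list(versionsDir)) {
            return files.map(p -> p.getFileName().toString())
                        .filter(n -> n.startsWith(prefix))
                        .map(n -> n.substring(prefix.length()))
                        .filter(s -> s.matches("\\d+"))
                        .mapToInt(Integer::parseInt)
                        .max()
                        .orElse(0);
        }
    }

    /** Reads back the content of a specific stored version. */
    String read(String name, int version) throws IOException {
        return Files.readString(versionsDir.resolve(name + ".v" + version));
    }
}
```

Note that this sketch is not safe for concurrent saves of the same file and stores full copies rather than deltas, which is part of why the answers suggest reusing an existing repository.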

You may want to take a look at the Content Repository API for Java (JCR) and its several implementations (some of them free).

Take a look at the many document-oriented database systems out there. I can't speak about MongoDB or any of the others, but my experience with CouchDB has been fantastic.
http://couchdb.apache.org/
The best part of it is that you communicate with it via a REST API over HTTP.
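To illustrate the REST point, here is a minimal sketch of building the request that stores one document, assuming Java 11+ and CouchDB's PUT /{db}/{docid} convention. The class name CouchDbRequests, the database name "documents" and the document id used below are made up for the example.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that only builds requests; sending them is left to the caller. */
class CouchDbRequests {
    /** Builds the HTTP request that creates or updates one document: PUT /{db}/{docId}. */
    static HttpRequest putDocument(String baseUrl, String db, String docId, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + db + "/" + docId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), for example against a local server at http://localhost:5984 (an assumption about your setup).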

The best way is to reuse the efforts of others. This particular wheel has been invented quite a few times.
Who will use this and for what purpose?


Using Java Spring with Headless/Decoupled CMS

Currently I am involved in different projects using mainly Java Spring and PHP Laravel. At my workplace (ASP.NET-based technologies) it is common to retrieve content from a CMS this way, and I could see all the benefits it provides, so I would like to build something similar in Java Spring (if it also works for Laravel, that would be awesome, but that is optional).
What I desire:
A CMS based webpage just to upload the content for different projects in a well organised way.
The content is going to be mostly "Strings", HTML and pictures.
From different projects, I am able to connect to the CMS and retrieve this content in the views.
If it is possible, open source or free solution.
I have already searched for different options, but I was not able to find a good solution or a tutorial showing how to actually do it, not just the ideas...
Thank you so much in advance.
Best regards.
Jose Lara.
Take a look at Spring Content. It was designed to be paired with Spring Data to allow you to build a bespoke headless CMS very quickly and easily. Moreover, unlike most other CMSs on the market, you can choose which components to build it with, so you can choose newer cloud-native databases and storage.

Java version control

I am using docx4j to load, manipulate and save Word files. Everything works perfectly, but there is one thing I don't know how to implement.
What I want is something like version control: if you save a document, it should be possible to recover an earlier version of it (e.g. by saving only the delta). You could describe it as something like SVN or Git, where you can go back to an earlier version of your files.
The problem is that I do not know of any way to realize that, so I hope that one of you can help me. It would be fine if anyone at least knows a package or something else that can do this with files in general and not specifically with docx files.
Thanks for your help!
Edit: I am sorry that my question was imprecise. This was my first post here; in future I will improve ;)
JGit is a Java implementation of Git that will work with few dependencies. Similar libraries exist for SVN and CVS. Home-brewing a version control system is almost certainly a terrible idea, given the existence of good-quality solutions!
If you'd like a pure Java implementation for document versioning, maybe you could go for Jackrabbit.
Similar questions have been asked before. The first answer (marked as the correct one) on this question goes for Jackrabbit as well: Using a version control system as a data backend
I think you should use Git for this. I found a Java API called JavaGit, so you can have easy access to the repository.
With Git you can have a local repository where you can commit files and switch versions. If you need it you could also push and pull the data to a remote location.
Better use JGit like Gian said!
The simplest possible way would be to use the tools diff and patch. They were used at the core of CVS. I assume that you would like to run your application on Windows, where they are not preinstalled. I don't know whether it would be easy or comfortable to use the Windows versions of these tools, but you can always try to write similar functionality on your own. Here you can find a very good tutorial about finding differences between files and patching them: http://tuts.pinehead.tv/2012/09/18/introduction-using-diff-and-patch/ Once you know the functionality, it's quite easy to write something similar on your own.
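As a rough idea of what "writing something similar on your own" involves, here is a minimal line-based diff built on a longest-common-subsequence table. It is a sketch only: the class name LineDiff and the "  " / "- " / "+ " script format are assumptions for the example, the script keeps unchanged lines too (so it is not a compact delta), and it is not the algorithm the real diff tool uses.

```java
import java.util.ArrayList;
import java.util.List;

class LineDiff {
    /** Returns an edit script: "  x" for kept lines, "- x" for removed lines, "+ x" for added lines. */
    static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size(), m = newLines.size();
        // lcs[i][j] = length of the longest common subsequence of oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = oldLines.get(i).equals(newLines.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk both lists from the start, preferring the move that keeps the LCS longest.
        List<String> script = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                script.add("  " + oldLines.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + oldLines.get(i++));
            } else {
                script.add("+ " + newLines.get(j++));
            }
        }
        while (i < n) script.add("- " + oldLines.get(i++));
        while (j < m) script.add("+ " + newLines.get(j++));
        return script;
    }

    /** Rebuilds the new version from an edit script by dropping the removed lines. */
    static List<String> patch(List<String> script) {
        List<String> result = new ArrayList<>();
        for (String line : script) {
            if (!line.startsWith("- ")) result.add(line.substring(2));
        }
        return result;
    }
}
```

The table needs memory proportional to the product of both line counts, and a .docx file is a zipped package rather than plain text, so a line diff like this would only be meaningful on the extracted XML parts.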
This can be a tricky thing to implement yourself, so I wouldn't recommend it.
I don't know much about your environment, but if you are able to use an off-the-shelf versioning repository, you'll save yourself much grief. You can try to use Git or SVN directly, which may be the simplest solution for your use case.
Since you're talking about MS Office files, however, you may be implementing some form of enterprise document management tool. In this case, the JCR specification is designed to provide access to files in a repository, with versioning and other metadata features. Here's the specification.
The Apache Jackrabbit project provides an open source implementation of this spec, as does the developers version of Alfresco.
Picking the right solution will really depend on what your users are trying to do with these files, what your deployment environment looks like (don't try to host Git on Windows, k?), and how custom your current codebase is (standard Servlet container? Java EE? home-rolled?).
Good luck!

Process XML data with Java

I have software written in Java which reads an external XML file (let's call it "datasource.xml").
This file contains different kinds of information, which is extracted using XPath queries.
The fact is that, depending on what kind of information is extracted from that file (datasource.xml), a different workflow is needed. At the moment the workflows are hard-coded in my Java classes, but I want to make my software independent so that it can work with any datasource.xml, regardless of its structure. But of course I have to specify somewhere how to deal with the extracted data. I was thinking of using (again) JAXB and specifying inside the XML file (from whose XSD I will create the JAXB classes) the kind of workflow that is needed.
Could it be a good solution??
Thanks
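The idea in the question, letting the XML itself name the workflow instead of hard-coding it, might look roughly like this with plain JDK XPath. The class WorkflowDispatcher, the workflow attribute and the element names used below are hypothetical; this is a sketch of the dispatch idea, not a workflow engine.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class WorkflowDispatcher {
    // Hypothetical registry: workflow name (as written in the XML) -> handler for the extracted data
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    void register(String workflow, Function<String, String> handler) {
        handlers.put(workflow, handler);
    }

    /** Evaluates both XPath queries against the XML and runs the handler the document names. */
    String process(String xml, String workflowQuery, String dataQuery)
            throws XPathExpressionException {
        // A fresh InputSource is needed per evaluation because the reader is consumed.
        String workflow = xpath.evaluate(workflowQuery, new InputSource(new StringReader(xml)));
        String data = xpath.evaluate(dataQuery, new InputSource(new StringReader(xml)));
        Function<String, String> handler = handlers.get(workflow);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for workflow: " + workflow);
        }
        return handler.apply(data);
    }
}
```

For example, after register("uppercase", s -> s.toUpperCase()), a document like <datasource workflow="uppercase"><value>hello</value></datasource> processed with the queries /datasource/@workflow and /datasource/value would be routed to that handler.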
Have you checked out Drools (a project from JBoss)? It is very easy to learn and is an excellent workflow tool.
Building your own workflow engine is quite complex, and there are a lot of considerations to be taken into account.
You can also think about using Activiti, another workflow solution. It has APIs available and can be used as a workflow service layer in your application.
Like others, I think you will be better off using a higher-level tool for this rather than hand-coding the logic in Java. Take a look at XProc (for example the Calabash implementation), or Orbeon, or Cocoon. They all have a learning curve associated with them, but once mastered, you will have a much more flexible architecture than with hard-coded Java logic.

implementing simple Document management

My question is: how would you go about implementing a simple DMS (document management system) based on the following requirements?
The DMS should be a distributed web application.
Support for document versioning.
Support for document locking.
Document search.
I'm already clear on what technologies I want to use. I will use Spring MVC, Hibernate and a relational database (most likely MySQL).
One thing I'm not very clear on is whether I need to use WebDAV, since I could just upload or download documents. I think I have to, because I need to accomplish point 2 and especially point 3 somehow. Is this the right way to go?
Any examples or experience with this would come in very handy :). Maybe Milton is not the best library to pick for WebDAV?
@Eduard, regarding dependencies on 3rd parties: are you doing this as a college/university exercise, or is it something that will affect real users in a production environment?
At the risk of sounding very pretentious: don't reinvent the wheel! I'd definitely second the call to use JCR; this way you are depending on a standard and not on a 3rd party implementation.
JCR is a well-defined standard (that means a lot of people invested commercial effort, i.e. cash and expertise in huge amounts, into it). I would seriously reconsider looking into JCR. Think of it as an API where 3rd parties provide the implementation (no vendor lock-in).
Have a look at the features you'll get out of the box; I believe 99 - 110% of the functionality you require is available through a JCR implementation. Plus you'll benefit from the fact that the code you'll be using has been tested by hundreds of people in real-world situations.
Where I'd differ from bmscomp is in suggesting JackRabbit http://jackrabbit.apache.org/
Option 1:
I am not sure about WebDAV, as I have no real experience with it. But I would highly recommend using a document database like MongoDB.
With mongodb, you can:
1. Handle document versions
2. MongoDB has atomic operations, you can add your logic of document locking.
This will also give you the added benefit of being able to search your document store.
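The locking logic mentioned in point 2 boils down to an atomic "set the lock holder only if nobody holds it" operation. The sketch below shows that idea in memory with ConcurrentHashMap instead of MongoDB, so it is an analogy rather than database code; the class name DocumentLocks and the use of user ids as lock owners are assumptions for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class DocumentLocks {
    // documentId -> userId of the current lock holder
    private final ConcurrentMap<String, String> holders = new ConcurrentHashMap<>();

    /** Atomically takes the lock; succeeds if it is free or already held by the same user. */
    boolean tryLock(String documentId, String userId) {
        String current = holders.putIfAbsent(documentId, userId);
        return current == null || current.equals(userId);
    }

    /** Releases the lock only if the given user holds it. */
    boolean unlock(String documentId, String userId) {
        return holders.remove(documentId, userId);
    }
}
```

In a distributed web application the same check-and-set would have to happen in the shared store (a conditional update in the database), since an in-memory map is only visible to one JVM.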
Option 2:
Apache Jackrabbit: A Content repository
A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more.
Think about using JCR (Java Content Repository)
http://en.wikipedia.org/wiki/Content_repository_API_for_Java or have a look at the work done on Alfresco and the eXo framework; they did a good job.
You can use these open source projects to meet your requirements:
http://sourceforge.net/projects/logicaldoc/ -
LogicalDOC is a modern document management system with a nice interface, easy to use and very fast. It uses open source Java technologies such as GWT, Spring, Lucene in order to provide a flexible and scalable DMS platform. http://www.logicaldoc.com
http://sourceforge.net/projects/openkm/ -
OpenKM Document Management - DMS Updated 2011-05-25
OpenKM is a powerful, scalable Document Management System (DMS). OpenKM uses JBoss + J2EE + Ajax web (GWT) + Jackrabbit (Lucene) open source technologies. http://www.openkm.com/
Spring MVC is a good choice. If you want to use a relational database, you can also check out DataNucleus. At least the JDO layer (plus maybe the JPA layer) provides versioning support. For search I recommend Apache Solr, based on Lucene, which has excellent and powerful full-text search capabilities.
Although WebDAV seems like the natural choice as a simple and cross-platform file transfer protocol, I never had good experiences with it. Either the client or the server didn't work well (Konqueror, Internet Explorer, Zope 2, ...). So abstract away from the protocol and provide multiple ways to access the files.

Extend JackRabbit or build up from Lucene?

I've been working on a site idea. The general concept is a full-text search of documents that also allows user ratings; based on these ratings I want to boost the item's value in the Lucene index. But I'm trying to find out whether I should extend JackRabbit or just build from the Lucene base. Is there any good way to extend JackRabbit in this way and affect the index, or would it be best to work directly off Lucene?
Either way, I am strongly leaning towards using Groovy on Grails, with either the Searchable plugin or working directly with JackRabbit. Are there any major reasons I should just stick to Java?
Clarification:
I would like to boost an item based on its average user rating. Is JackRabbit open or expandable enough that I can capture user ratings and then have those affect the index within JackRabbit, or is this so far outside the core of JackRabbit that I should just build up from Lucene?
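To make the goal concrete, here is a minimal sketch of rating-based boosting done as a re-ranking step in plain Java, outside of Lucene or JackRabbit. The formula relevance * (1 + averageRating / maxRating) and the names RatingBoost and Hit are assumptions chosen for illustration; they are not how either library computes or stores boosts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RatingBoost {
    /** A search hit with its text relevance score and the ratings users gave the item. */
    static final class Hit {
        final String id;
        final double relevance;
        final List<Integer> ratings;

        Hit(String id, double relevance, List<Integer> ratings) {
            this.id = id;
            this.relevance = relevance;
            this.ratings = ratings;
        }

        /** Average of all ratings, or 0.0 when the item has not been rated yet. */
        double averageRating() {
            return ratings.stream().mapToInt(Integer::intValue).average().orElse(0.0);
        }
    }

    /** Assumed formula: relevance * (1 + average / maxRating); unrated items keep their relevance. */
    static double boostedScore(Hit hit, int maxRating) {
        return hit.relevance * (1.0 + hit.averageRating() / maxRating);
    }

    /** Returns a new list of hits ordered by boosted score, highest first. */
    static List<Hit> rank(List<Hit> hits, int maxRating) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> boostedScore(h, maxRating)).reversed());
        return sorted;
    }
}
```

Re-ranking after the search only reorders the hits that were already returned; boosting inside the index, which is what the question asks about, would also influence which documents make it into the result set.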
I recommend using JCR, with the implementation of Jackrabbit behind it. JCR allows you to separate between what you store and how you store it.
By staying within a JCR framework, you should be able to easily switch among JCR implementations. (There are several, not just Apache's.) Even within Jackrabbit there are many persistence managers to choose from (Lucene is used for the search index, not for persistence). This flexibility is useful when you want to trade off between storage space and performance.
JCR already includes full text searches and the ability to maintain user ratings. It should be a good fit for your project.
is there any major reasons I should just stick to Java?
Not really. As you probably already know, you can use any Java library with Groovy/Grails, so there's nothing you can do in Java that you can't do in Groovy. Although the contrary is also true, in my experience, it takes a lot more (boilerplate) code to get things done in Java.
Although Java is considerably faster than Groovy, this doesn't necessarily mean your app will be faster if written in Java, as the bottleneck is likely to be the database rather than code execution.
As for whether you should use Lucene/Searchable or JackRabbit, it's very difficult to say without knowing more about what you want to achieve. All you've told us so far is that you want to index documents and boost certain items in the index. You can certainly do both of those with Lucene.
I would recommend using JCR/Jackrabbit on top of Lucene for a couple of reasons:
1) Your repository structure could readily support document nodes with child nodes that store all of your meta-data including owner, ratings, flagging, comments, etc.
2) JCR is ideal for document/node based app development, providing a lot of the heavy lifting at the framework level while not getting in your way at the app level.
I would recommend using Apache Sling; it comes with Jackrabbit/Lucene built in.
Most of the committers are also involved with Jackrabbit, so it's designed to work well with it -- even better, it's designed to run on top of it.
One of the nice features of Sling is that it mounts the entire JCR repository in the URL space and exposes it via REST endpoints.
So you can access your documents/metadata very easily by doing a simple HTTP request to it. It also allows you to write your own servlets and expose them as REST endpoints. (This is extremely easy -- no fiddling about with applicationContext.xml files, just 1 annotation)
It also allows you to write jsp, esp, groovy, ...
