Using a version control system as a data backend - java

I'm involved in a project that, among other things, involves storing edits and changes to a large hierarchical document (HTML-formatted text). We want to include versioning of textual changes and of structural changes.
Currently we're maintaining the tree of document sections in a relational database, but as we start working on how to manage versioning of structural changes, it's clear that we're in danger of having to write a lot of the functionality that a version control system provides.
We don't want to reinvent the wheel. Is it possible that we could use an existing version control system as the data store, at least for the document itself? Presumably we could do so by writing out new versions to the filesystem and keeping that directory under version control (programmatically doing commits and so forth), but it would be better if we could interact with the repository directly via code.
The VCS that we are most familiar with is Subversion, but I'm not thrilled with how Subversion represents changes to the directory structure -- it would be nice if we could see that a particular revision included moving a section from Chapter 2 to Chapter 6, rather than just seeing a new version of the tree. This sounds more like the way a system like Mercurial handles changes to the structure.
Any advice? Do VCSs have public APIs? The project is in Java (with Spring), if it matters.

Maybe you could use a JCR (JSR-170) compliant repository like Jackrabbit instead. To me, what you're describing is exactly what JCR is for. Have a look at this article.

You can certainly program SCMs via APIs. Check out SVNKit for Java and Subversion, or JGit for Java and Git. Mercurial doesn't appear to offer such an API.
Whatever you do, wrap up your implementation in a suitable API, so you can swap one SCM for another, or maybe bin the concept of an SCM at some stage in the future. It may well be a pragmatic solution to your problem, however, and worthy of more investigation.
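As a rough illustration of that wrapping advice, here is a minimal sketch of an SCM-neutral API, assuming Java 16+. `DocumentStore` and `InMemoryDocumentStore` are hypothetical names, not part of SVNKit or JGit. The in-memory class only stands in for tests; a production class would implement the same interface by delegating to JGit or SVNKit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical SCM-neutral API: callers never see SVNKit or JGit types. */
interface DocumentStore {
    /** Saves new content for a path and returns an opaque revision id. */
    String commit(String path, String content, String message);

    /** Content of a path as of a revision (its last change at or before it). */
    Optional<String> read(String path, String revision);

    /** Ids of the revisions that changed a path, oldest first. */
    List<String> history(String path);
}

/** In-memory stand-in for tests; a JGit- or SVNKit-backed class would replace it. */
class InMemoryDocumentStore implements DocumentStore {
    private record Revision(String id, String path, String content, String message) {}

    private final List<Revision> revisions = new ArrayList<>();

    @Override
    public String commit(String path, String content, String message) {
        String id = "r" + (revisions.size() + 1);
        revisions.add(new Revision(id, path, content, message));
        return id;
    }

    @Override
    public Optional<String> read(String path, String revision) {
        int upTo = -1;
        for (int i = 0; i < revisions.size(); i++) {
            if (revisions.get(i).id().equals(revision)) {
                upTo = i;
                break;
            }
        }
        // Walk backwards from the requested revision to the path's latest change.
        for (int i = upTo; i >= 0; i--) {
            Revision r = revisions.get(i);
            if (r.path().equals(path)) {
                return Optional.of(r.content());
            }
        }
        return Optional.empty(); // unknown revision, or path not yet created
    }

    @Override
    public List<String> history(String path) {
        return revisions.stream()
                .filter(r -> r.path().equals(path))
                .map(Revision::id)
                .toList();
    }
}
```

Because revision ids are opaque strings, the same interface can carry Subversion revision numbers or Git commit hashes without leaking either model to the rest of the application.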

Try http://svnkit.com/ for Subversion.

Here you have a pure Java SVN library, SVNKit. It can be used by the Eclipse SVN integration, so it should be fairly stable.

Related

How to write custom storage plugin for apache drill

I have my data in a proprietary format that none of the formats supported by Apache Drill covers.
Is there any tutorial on how to write my own storage plugin to handle such data?
This is something that really should be in the docs but currently is not. The interface isn't too complicated, but it can be a bit much to look at one of the existing plugins and understand everything that is going on.
There are two major components to writing a storage plugin: exposing information to the query planner and schema management system, and then actually implementing the translation from the data source's API to Drill's record representation.
The Kudu plugin was added recently and is a reasonable model for a storage system with many of the elements Drill can take advantage of. One thing I would note is that if your storage system is not distributed and you just plan on doing all reads remotely, you don't have to do as much work around affinities, work lists and assignments in the group scan. If I have some time soon, I'll try to write up a doc on the different parts of the interface and maybe a tutorial about one of the existing plugins.
https://github.com/apache/drill/tree/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu

Java version control

I am using docx4j to load, manipulate and save Word files. Everything works perfectly, but there is one thing I don't know how to implement.
What I want is something like version control: if you save a document, it should be possible to recover an earlier version of it (e.g. by saving only the delta). In other words, it should work like SVN or Git, where you can go back to an earlier version of your files.
The problem is that I don't know how to achieve this, so I hope someone here can help. It would also be fine if someone knows of a package that can do this with files in general, not just docx files.
Thanks for your help!
Edit: I'm sorry that my question was imprecise. This was my first post here; I'll do better in the future ;)
JGit is a Java implementation of Git that will work with few dependencies. Similar libraries exist for SVN and CVS. Home-brewing a version control system is almost certainly a terrible idea, given the existence of good-quality solutions!
If you'd like a pure Java implementation for document versioning, you could try Jackrabbit.
Similar questions have already been asked. The first answer (marked as accepted) to this question recommends Jackrabbit as well: Using a version control system as a data backend
I think you should use Git for this. I found a Java API called JavaGit, so you can easily access the repository.
With Git you can have a local repository where you can commit files and switch versions. If you need it you could also push and pull the data to a remote location.
Better use JGit like Gian said!
The simplest possible way would be to use the diff and patch tools. They were used at the core of CVS. I assume you would like to run your application on Windows, where they are not preinstalled. I don't know whether it would be easy or comfortable to use the Windows versions of these tools, but you can always try to write similar functionality on your own. Here you can find a very good tutorial about finding differences between files and patching them: http://tuts.pinehead.tv/2012/09/18/introduction-using-diff-and-patch/ Once you understand how it works, it's quite easy to write something similar on your own.
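Writing that yourself mostly comes down to a line-based diff plus a patch step that replays it. Below is a minimal sketch, assuming Java 16+. It uses a textbook longest-common-subsequence (LCS) diff rather than the faster algorithms real diff tools use, and `LineDiff`, `diff` and `patch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal line-based diff/patch sketch using an LCS table (not a port of GNU diff). */
final class LineDiff {
    /** One edit-script entry: op is ' ' (keep), '-' (delete) or '+' (insert). */
    record Edit(char op, String line) {}

    static List<Edit> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table, emitting keeps for common lines and deletes/inserts otherwise.
        List<Edit> edits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                edits.add(new Edit(' ', a.get(i)));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                edits.add(new Edit('-', a.get(i++)));
            } else {
                edits.add(new Edit('+', b.get(j++)));
            }
        }
        while (i < n) edits.add(new Edit('-', a.get(i++)));
        while (j < m) edits.add(new Edit('+', b.get(j++)));
        return edits;
    }

    /** Rebuilds the new version from the old one plus an edit script, checking that it applies. */
    static List<String> patch(List<String> a, List<Edit> edits) {
        List<String> out = new ArrayList<>();
        int i = 0;
        for (Edit e : edits) {
            switch (e.op()) {
                case ' ' -> { expect(a, i, e.line()); out.add(a.get(i++)); }
                case '-' -> { expect(a, i, e.line()); i++; }
                case '+' -> out.add(e.line());
                default -> throw new IllegalArgumentException("bad op: " + e.op());
            }
        }
        if (i != a.size()) throw new IllegalStateException("patch does not cover the whole input");
        return out;
    }

    private static void expect(List<String> a, int i, String line) {
        if (i >= a.size() || !a.get(i).equals(line)) {
            throw new IllegalStateException("patch does not apply at line " + (i + 1));
        }
    }
}
```

The LCS table costs O(n·m) memory, which is fine for documents but not for huge files. To save only the delta, as the question asks, you would store the '-' and '+' entries and collapse runs of kept lines into counts. Also note that .docx files are ZIP archives, so a line diff only makes sense on extracted text or XML (such as word/document.xml), not on the raw file.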
This can be a tricky thing to implement yourself, so I wouldn't recommend it.
I don't know much about your environment, but if you are able to use an off-the-shelf versioning repository, you'll save yourself much grief. You can try to use Git or SVN directly, which may be the simplest solution for your use case.
Since you're talking about MS Office files, however, you may be implementing some form of enterprise document management tool. In this case, the JCR specification is designed to provide access to files in a repository, with versioning and other metadata features. Here's the specification.
The Apache Jackrabbit project provides an open source implementation of this spec, as does the developer version of Alfresco.
Picking the right solution will really depend on what your users are trying to do with these files, what your deployment environment looks like (don't try to host Git on Windows, k?), and how custom your current codebase is (standard Servlet container? Java EE? home-rolled?).
Good luck!

Whats the best way to implement a simple document management system?

I am planning to build a simple document management system, preferably on the Java platform. Are there any best practices around this? The requirements are:
Ability to upload documents
Ability to Tag documents
Version the documents
Comment on documents
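To make those four requirements concrete, here is a minimal in-memory sketch of the data model, assuming Java 16+. `DocumentRepository`, `Version` and `Comment` are hypothetical names. Whichever backend is chosen (SVN plus a database, the filesystem, or a JCR repository), it would need to persist roughly this information.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model: upload, tag, version and comment on documents. */
final class DocumentRepository {
    record Version(int number, byte[] content, String uploader, Instant at) {}
    record Comment(String author, String text, Instant at) {}

    static final class Document {
        final String name;
        final List<Version> versions = new ArrayList<>(); // oldest first
        final Set<String> tags = new TreeSet<>();
        final List<Comment> comments = new ArrayList<>();

        Document(String name) { this.name = name; }

        Version latest() { return versions.get(versions.size() - 1); }
    }

    private final Map<String, Document> docs = new HashMap<>();

    /** The first upload creates the document; later uploads add numbered versions. */
    Version upload(String name, byte[] content, String uploader) {
        Document d = docs.computeIfAbsent(name, Document::new);
        Version v = new Version(d.versions.size() + 1, content.clone(), uploader, Instant.now());
        d.versions.add(v);
        return v;
    }

    void tag(String name, String tag) { get(name).tags.add(tag); }

    void comment(String name, String author, String text) {
        get(name).comments.add(new Comment(author, text, Instant.now()));
    }

    /** Names of documents carrying a tag, sorted alphabetically. */
    List<String> findByTag(String tag) {
        return docs.values().stream()
                .filter(d -> d.tags.contains(tag))
                .map(d -> d.name)
                .sorted()
                .toList();
    }

    Document get(String name) {
        Document d = docs.get(name);
        if (d == null) throw new NoSuchElementException("no document named " + name);
        return d;
    }
}
```

In the first option below, `versions` would live in SVN or CVS while tags, uploader and comments would live in the database.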
There are a couple of options that I am currently considering. The first option would be a simple API on top of SVN or CVS, with a DB backend to track tags, uploader, comments etc.
Another option is to use the filesystem. Version the documents as copies in a versions folder and work with filenames.
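The filesystem option can be sketched in a few lines of java.nio.file, assuming Java 16+. The layout (the current file in a root folder, numbered copies such as `report.docx.v1` and `report.docx.v2` in a `versions/` subfolder) and the `FileVersioner` class are hypothetical choices for illustration. The code is not safe for concurrent writers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Stream;

/** Keeps root/<name> as the current copy and root/versions/<name>.vN as history. */
final class FileVersioner {
    private final Path root;
    private final Path versionsDir;

    FileVersioner(Path root) throws IOException {
        this.root = root;
        this.versionsDir = Files.createDirectories(root.resolve("versions"));
    }

    /** Archives the content as the next version, updates the current copy, returns N. */
    int save(String fileName, byte[] content) throws IOException {
        int next = versions(fileName).stream().mapToInt(Integer::intValue).max().orElse(0) + 1;
        Path archived = versionsDir.resolve(fileName + ".v" + next);
        Files.write(archived, content, StandardOpenOption.CREATE_NEW); // never overwrite history
        Files.copy(archived, root.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
        return next;
    }

    byte[] read(String fileName, int version) throws IOException {
        return Files.readAllBytes(versionsDir.resolve(fileName + ".v" + version));
    }

    /** Version numbers of a file, ascending, parsed from the ".vN" filename suffix. */
    List<Integer> versions(String fileName) throws IOException {
        String prefix = fileName + ".v";
        try (Stream<Path> files = Files.list(versionsDir)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(n -> n.startsWith(prefix))
                    .map(n -> n.substring(prefix.length()))
                    .filter(s -> !s.isEmpty() && s.chars().allMatch(Character::isDigit))
                    .map(Integer::parseInt)
                    .sorted()
                    .toList();
        }
    }
}
```

Full copies are simple and robust but use more disk space than deltas. Tags and comments would still need somewhere to live, such as a database or a sidecar file.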
Or, if there is an open, non-GPL'd document management system, we could customize it to our needs and package it in our application. Does anybody have any experience building something like this?
You may want to take a look at Content repository API for Java and the several implementations (some of them free).
Take a look at the many Document Oriented Database systems out there. I can't speak about MongoDB or any of the others, but my experience with Couchdb has been fantastic.
http://couchdb.apache.org/
The best part of it is that you communicate with it via a REST protocol.
The best way is to reuse the efforts of others. This particular wheel has been invented quite a bit of times.
Who will use this and for what purpose?

Git or Mercurial usage in Java projects

Just wondering if any of you are using Git or Mercurial for your Java projects, or is Subversion still the most popular choice? I've been looking at github.com and bitbucket.org lately, but because the repositories might be private, I can't get a good indication of actual usage.
Be careful.
Do you remember how you felt about CVS after you used subversion?
You'll feel exactly the same way about subversion if you use git/mercurial.
Yeah, sure, you drop in for Christmas and the odd weekend,
but you can never go home again.
I don't think language should come into the equation. Both Mercurial and Git are functionally similar and conceptually very different than Subversion. It's more important that you select the right flavour of version control for the way you want to work.
As it happens, I use Mercurial with Java. I use Netbeans as my IDE which has built in support for both Mercurial and Subversion - both work well. I can highly recommend Bitbucket too.
After being fed up by the useless Subversion ignore filters my company moved one project to Mercurial. A couple of months later we'd moved all our code (mostly Java) over to Mercurial repos. Every second weekend we're now dancing naked around burning .svn folders chanting and screaming. Branching and merging is so much smoother than with Subversion. And it's really nice to work against a lightning fast local repo.
I suspect that there are more projects out there using Subversion than Git and Mercurial, but the trend seems to be going towards distributed version control systems.
GitHub also sells private hosting, and it works really well. To view the available packages, go to your account page and click on Change Plan on the top right (in the Your Plan box). GitHub is often the reason people try out and eventually stick with Git.
If you would like to see how Git compares to other version control systems, there is no better suited site than this one: Why Git is Better than X (Coincidentally, that site's source is on GitHub; there's a link to it at the bottom of the page).
I don't think that the language you are using for your project matters much. I've recently switched to Git and I'm still pretty new to it, but it really seems to make a lot of sense. I really like the idea that every repository is a full clone, instead of having one central repository that doesn't let you work offline (as is the case with Subversion). Even though most people nowadays always have internet access, it's also nice to know that there really isn't a single point of failure (unless there is only one repository to begin with, which wouldn't be the case if you hosted on GitHub, for example, or had at least two repositories).
In the end, I don't think it should be about choosing what is most popular (which recently seems to be Git) but about what works for you. I think most developers are beginning to move to a distributed version control system like Git or Mercurial, and it seems like more are joining the Git camp, which will most likely mean you will find more guides, tools, etc. for it as more and more people switch.
The crucial point for us is IDE support, which needs to be rock solid. We are a small shop and don't have the resources to deal with little annoyances (which is also why we are still on CVS: we need IDE support as good as CVS's before switching).
I believe others feel the same. The source repository is so crucial that any migration must be painless AND give benefits.
Personally, I believe the Sun endorsement of Mercurial will benefit it enormously, perhaps making it a new de facto standard.
Subversion is the best solution when it comes to solid IDE integration, as every Java IDE supports it, and Mercurial should be well suited for NetBeans users, as the NetBeans team uses it for its own source control.
Git just doesn't offer any advantage over Mercurial, only lack of Windows support and less tool support in general.

Providing a common interface to SVN and CVS

SVN and CVS are two very different systems that aim to address the same basic issue - source control.
They both work in distinct ways, so it's probably difficult to deal with them exactly the same.
What I'm wondering is, would it be conceivable to develop a programming library or API that exposes the same interface, but under the hood, can be set up to work with either an SVN or CVS repository?
I'm aiming to develop such a library in either .NET or Java (most likely .NET), but I wanted to get some thoughts on how feasible this would be, and if there's a better way to approach the problem.
The context of this is: I'm building a web-based service that will involve source control hosting, and I want to support both SVN and CVS so as to serve the largest amount of developers possible.
Personally, I would ignore CVS for a new product. My feeling is that the enormous extra effort to coerce it into looking like SVN would be better spent on other things. I don't know your market, so I might be wrong, but it's worth thinking about.
The MSSCCI API does something very similar:
http://alinconstantin.homeip.net/webdocs/scc/msscci.htm
The MSSCCI tries to make all source controls look the same from the perspective of the IDE.
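In the same spirit, a common library could expose one interface plus an adapter per backend, and let callers query for features that only some backends have. Below is a minimal sketch, assuming Java 16+. `SourceControl`, `Capability`, the adapters and the factory are hypothetical names, and the `commit` bodies are deliberately unimplemented placeholders where a real client library would be called. The capability split reflects well-known differences: Subversion commits are atomic, revision numbers apply to the whole repository, and copies keep history, while CVS commits and numbers revisions file by file.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Behaviours that differ between backends; callers check these instead of assuming SVN semantics. */
enum Capability { ATOMIC_COMMIT, REPOSITORY_WIDE_REVISIONS, COPY_HISTORY }

/** Hypothetical common interface: one API, one adapter per backend. */
interface SourceControl {
    String backendName();

    Set<Capability> capabilities();

    /** Commits the given working-copy paths and returns a backend-specific revision label. */
    String commit(List<String> paths, String message);

    default boolean supports(Capability c) {
        return capabilities().contains(c);
    }
}

final class SvnAdapter implements SourceControl {
    public String backendName() { return "svn"; }

    public Set<Capability> capabilities() {
        return EnumSet.of(Capability.ATOMIC_COMMIT, Capability.REPOSITORY_WIDE_REVISIONS, Capability.COPY_HISTORY);
    }

    public String commit(List<String> paths, String message) {
        // Placeholder: a real adapter would call a Subversion client library here.
        throw new UnsupportedOperationException("not wired to a Subversion client");
    }
}

final class CvsAdapter implements SourceControl {
    public String backendName() { return "cvs"; }

    public Set<Capability> capabilities() {
        // CVS commits file by file, each file has its own revision number, renames lose history.
        return EnumSet.noneOf(Capability.class);
    }

    public String commit(List<String> paths, String message) {
        // Placeholder: a real adapter would drive a CVS client here.
        throw new UnsupportedOperationException("not wired to a CVS client");
    }
}

final class SourceControlFactory {
    static SourceControl forBackend(String name) {
        return switch (name) {
            case "svn" -> new SvnAdapter();
            case "cvs" -> new CvsAdapter();
            default -> throw new IllegalArgumentException("unknown backend: " + name);
        };
    }
}
```

A hosting service built on this could, for example, show changesets when `REPOSITORY_WIDE_REVISIONS` is supported and fall back to per-file history otherwise.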
ViewVC lets you browse SVN and CVS repositories. Maybe there is an existing product that already does what you want?
