Process XML data with Java

I have software written in Java which reads an external XML file (let's call it "datasource.xml").
This file contains different kinds of information, which are extracted using XPath queries.
Depending on what kind of information is extracted from datasource.xml, a different workflow is needed. At the moment the workflows are hard-coded in my Java classes, but I want to make my software independent so that it can work with any datasource.xml, regardless of its structure. Of course, I still have to specify somewhere how to deal with the extracted data. I was thinking of using JAXB (again) and specifying inside the XML file the kind of workflow that is needed (I would generate the JAXB classes from its XSD).
Would that be a good solution?
Thanks

Have you checked out Drools (a project from JBoss)? It is very easy to learn and is an excellent workflow tool.
Building your own workflow engine is quite complex, and there are a lot of considerations to take into account.

You could also consider Activiti, another workflow solution. It has APIs available and can be used as a workflow service layer in your application.

Like others, I think you will be better off using a higher-level tool for this rather than hand-coding the logic in Java. Take a look at XProc (for example the Calabash implementation), or Orbeon, or Cocoon. They all have a learning curve associated with them, but once mastered, you will have a much more flexible architecture than with hard-coded Java logic.
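
For comparison with the tools suggested above, the data-driven dispatch described in the question can be sketched in plain Java. This is only a minimal illustration, not a workflow engine: the class name, the workflow names ("uppercase", "length") and the sample XML are all hypothetical, and a real setup would load the XPath-to-workflow mapping from configuration rather than pass it in by hand.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: workflow names, handlers and sample data are illustrative only.
class WorkflowDispatchSketch {

    // Maps a workflow name to the code that handles the extracted value.
    static final Map<String, Function<String, String>> HANDLERS = new LinkedHashMap<>();
    static {
        HANDLERS.put("uppercase", value -> value.toUpperCase(Locale.ROOT));
        HANDLERS.put("length", value -> String.valueOf(value.length()));
    }

    // Evaluates one XPath query against the XML text and hands the result to the named workflow.
    static String process(String xml, String xpathQuery, String workflowName) {
        Function<String, String> handler = HANDLERS.get(workflowName);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown workflow: " + workflowName);
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            String extracted = xpath.evaluate(xpathQuery, new InputSource(new StringReader(xml)));
            return handler.apply(extracted);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath query or XML: " + xpathQuery, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<datasource><customer><name>Alice</name></customer></datasource>";
        System.out.println(process(xml, "/datasource/customer/name", "uppercase"));
        System.out.println(process(xml, "/datasource/customer/name", "length"));
    }
}
```

The point of the sketch is that the Java code only knows workflow names. Which query feeds which workflow is data, so it could live in the XML (or a rules file) as the question proposes.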

Related

i18n in build process (or compiling one template HTML to i18n HTMLs)

I'm working on a project which needs to support internationalization.
The solution we thought of is:
Create HTML templates with placeholders for language-specific text (e.g. home.html).
Create an i18n directory with files such as "language_en_GB.json".
During the build process, merge them to create an output HTML file. The output file will sit in a language-based directory (such as "views/en_GB/home.html" or "views/fr_CA/home.html").
So basically this:
<h1>{{i18n_welcome}}</h1>
<h2>{{userName}}</h2>
merged with this:
{
welcome:"Welcome!"
}
will become this during the build process:
<h1>Welcome!</h1>
<h2>{{userName}}</h2>
I have a few questions and would appreciate your input.
Is this a good approach for i18n?
Do you know of a templating engine that handles this i18n process well?
Is there a solution for client-side "baking"? I would like a UI developer to be able to bake locally as well.

There are several frameworks that support i18n out of the box, depending on your needs and what you are currently using in your code. As a pure templating engine, you can take a look at Velocity or FreeMarker. For a more complete framework, you can look at Spring or Struts 2.
There are, of course, numerous other options as well. I'm just listing four of the most popular that I've seen people use.
Basically, for any of the frameworks, you create a resource bundle for each language (named after the language of the specific bundle, e.g. language_en_GB.properties). So your thought process is pretty much in line. You start with your HTML file and include your placeholders. In the resource bundle for each language, you specify what each string is supposed to be. After that, the framework does the merging on the fly for you, using the appropriate resource bundle for the language in question.
So you're pretty much on the right track - it all becomes a question of integrating properly with your framework and leveraging it to do the merging instead of doing it during your build pipeline.
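
To make the merging step concrete, here is a minimal sketch using only the JDK. It is not how any of the frameworks above implement it: the class name, the "{{i18n_key}}" placeholder convention (borrowed from the question) and the inline properties text are assumptions for illustration. A real application would load language_en_GB.properties from the classpath with ResourceBundle.getBundle instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of merging a template with a per-language resource bundle.
class I18nMergeSketch {

    // Matches placeholders such as {{i18n_welcome}} and captures the key "welcome".
    private static final Pattern I18N_PLACEHOLDER = Pattern.compile("\\{\\{i18n_(\\w+)\\}\\}");

    // Builds a bundle from properties-format text (stands in for a .properties file).
    static ResourceBundle bundleFrom(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replaces every i18n placeholder that has a translation; everything else is left untouched.
    static String merge(String template, ResourceBundle bundle) {
        Matcher matcher = I18N_PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String text = bundle.containsKey(key) ? bundle.getString(key) : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<h1>{{i18n_welcome}}</h1>\n<h2>{{userName}}</h2>";
        System.out.println(merge(template, bundleFrom("welcome=Welcome!")));
        System.out.println(merge(template, bundleFrom("welcome=Bienvenue !")));
    }
}
```

Non-i18n placeholders such as {{userName}} pass through unchanged, so the same merge could run either at build time or per request.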

You haven't provided the necessary details, so I can't really answer your question. I can only say that what you plan seems to be another reinvention of the wheel (but not as round as the original one).
There are certain i18n best practices. In the Java world this usually means using resource bundles (in the form of properties files) and some mechanism like JSTL to translate them when the page is being rendered. This is the best approach, as you won't need to recompile anything to introduce support for another language.
If you care about providing support for client-side scripts, it is usually done by writing out some array from the web page and accessing it on the client side. I think this is the most common solution. Another would be having some web service to provide you with translations and read it via XHR (i.e. AJAX), but that may be problematic. Anyway, you need to push the translations from the server side to the client side somehow.
And of course you need to read them from resource bundles.
From what you wrote, it seems that you want to build some kind of static web page, backed by the application server (hence the compilation of static web pages). If I guessed correctly, then honestly, using Java for it is a bit of an overkill. You'd be better off with some CMS software like Joomla, Drupal or jEase.

Java Messenger : save message archives on the computer

I am building a Java messenger for people to chat, and I am looking for a way to record the message archives on the user's computer.
I have two possibilities in mind:
Save the conversations in XML files that I store in my documents folder.
Use SQLite, but the problem is that I don't know how to integrate it into my setup package, and I don't know whether it would be very useful.
Which would be the best solution in your opinion?
Thank you

Another option is using Java DB (Apache Derby), which shipped with the JDK from Java 6 through Java 8.
Before you make a choice, you should think about questions such as:
Presumably you want this to be transparent to the user (i.e. no admin involved)?
Is performance an issue?
What happens if the storage schema needs migration?
Do you need transactionality (unlikely, I suspect)?
etc. It's quite possible that even a simple text file would suffice. Perhaps your best bet is to choose a simple solution (e.g. a text file), implement that, and see how far it takes you. However, provide a suitable persistence-level abstraction so that you can slot in a different solution in the future with minimal disruption.
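
A minimal sketch of such an abstraction, assuming a one-line-per-message text format. The interface and class names are hypothetical, and the "sender: text" line format is just an illustration. The rest of the messenger would depend only on the interface, so an XML- or database-backed implementation could replace the text file later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Hypothetical persistence abstraction: callers never see how messages are stored.
interface MessageArchive {
    void append(String sender, String text);
    List<String> readAll();
}

// Simplest possible implementation: one "sender: text" line per message in a UTF-8 text file.
class TextFileMessageArchive implements MessageArchive {
    private final Path file;

    TextFileMessageArchive(Path file) {
        this.file = file;
    }

    @Override
    public void append(String sender, String text) {
        // Line breaks inside a message are flattened so that each message stays on one line.
        String line = sender + ": " + text.replace("\n", " ") + System.lineSeparator();
        try {
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public List<String> readAll() {
        if (!Files.exists(file)) {
            return Collections.emptyList();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        MessageArchive archive = new TextFileMessageArchive(Files.createTempFile("chat", ".log"));
        archive.append("alice", "hello");
        archive.append("bob", "hi there");
        archive.readAll().forEach(System.out::println);
    }
}
```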

I would go for the XML files, as they are more generic and can be opened outside your messenger in a more or less human-readable format. I use Pidgin for instant messaging and it saves chat history in XML. Also, to read the history from your application, you can easily transform the files into HTML to display them nicely.

If you use JAXB, converting Java objects to/from XML is very easy. You just put a few annotations on your classes, and run them through a JAXB marshaller/unmarshaller. See http://docs.oracle.com/javaee/5/tutorial/doc/bnbay.html

Use Google's Protocol Buffers or 10gen's BSON. They are much smaller and faster.
http://code.google.com/apis/protocolbuffers/docs/javatutorial.html
http://bsonspec.org/
One issue is that these are binary representations, and you might want to make the archive transparent/readable to users.

What's the best way to implement a simple document management system?

I am planning to build a simple document management system, preferably built around the Java platform. Are there any best practices around this? The requirements are:
Ability to upload documents
Ability to Tag documents
Version the documents
Comment on documents
There are a couple of options that I am currently considering. The first option would be a simple API on top of SVN or CVS, with a DB backend to track tags, uploader, comments, etc.
Another option is to use the filesystem: version the documents as copies in a versions folder and work with filenames.
Or, if there is an open-source, non-GPL document management system, we could customize it to our needs and package it in our application. Does anybody have any experience building something like this?
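
To illustrate the filesystem option, here is a rough sketch of versioning by copying. The "versions/<file name>/v<N>" layout and all names are assumptions made for this example, not something the question specifies. It also ignores concurrent uploads, which a real system would have to handle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch: the "versions/<file name>/v<N>" layout is an assumption for illustration.
class FileVersionStoreSketch {

    // Copies the document to <versionsRoot>/<file name>/v<N>, where N is one more
    // than the number of copies already stored for that file name.
    static Path saveVersion(Path versionsRoot, Path document) {
        try {
            Path dir = versionsRoot.resolve(document.getFileName().toString());
            Files.createDirectories(dir);
            long existing;
            try (Stream<Path> entries = Files.list(dir)) {
                existing = entries.count();
            }
            return Files.copy(document, dir.resolve("v" + (existing + 1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String read(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Saves two revisions of a temporary document and reports what each stored version contains.
    static String demo() {
        try {
            Path work = Files.createTempDirectory("docstore-sketch");
            Path versions = work.resolve("versions");
            Path doc = work.resolve("report.txt");
            Files.write(doc, "first draft".getBytes(StandardCharsets.UTF_8));
            Path v1 = saveVersion(versions, doc);
            Files.write(doc, "second draft".getBytes(StandardCharsets.UTF_8));
            Path v2 = saveVersion(versions, doc);
            return v1.getFileName() + "=" + read(v1) + ", " + v2.getFileName() + "=" + read(v2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Tags, comments and uploader information do not fit naturally into filenames, so this approach would still need a database (or sidecar files) next to the versions folder.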

You may want to take a look at the Content Repository API for Java (JCR) and its several implementations (some of them free).

Take a look at the many document-oriented database systems out there. I can't speak about MongoDB or any of the others, but my experience with CouchDB has been fantastic.
http://couchdb.apache.org/
The best part of it is that you communicate with it over HTTP via a REST API.

The best way is to reuse the efforts of others. This particular wheel has been invented quite a few times.
Who will use this and for what purpose?

How to understand Open Source projects/libraries?

There are a few open source projects/APIs/libraries that we use in our project (Spring, Struts, iBatis, etc.) and I want to understand their design and how they work internally.
What is the best way to understand these projects? Note that I am already using these libraries in my project, and I know the input-output interaction/configuration for these libraries. What I don't understand is how these APIs/libraries work internally.
The problems I face are:
Finding the entry class of the library. Is there any way I can find the entry class for the library, i.e. the one which kicks off the whole API?
Tools/plugins to use in Eclipse to get an overview of the design of the library. Going through each and every class of the library can be a very daunting task. Is there any tool you would recommend which can generate class diagrams of the API in Eclipse?
Thanks in advance!
UPDATE: I need some input on Eclipse plugins which can help me get an overview/class diagram of the library.

I always use the same strategy for this: I never try to "understand" the code base as a whole, and I usually try to follow the request flow. I read enough of the documentation to determine what is necessary to use the application, and I read that code (Keep all source code loaded in your IDE).
For example, in Struts you'll be installing a servlet filter in web.xml. Start reading the filter and follow the path a single request takes through your stack.
Likewise for Spring, there are two main entry points, the filter and "getBean", both of which are mentioned very early in the documentation. Read those two.
In both of these cases you'll quickly find one or two classes that represent the "core" of the framework. Read those really well and let actual use cases and needs drive your further exploration.
Approaching "understanding" of an open source library (or any other code base, for that matter) by trying to find all the pieces is usually not a very good approach. It will usually lead nowhere, because a lot of these projects contain too much code. When following the request flow, I find that making diagrams can also be quite distracting: it tends to draw focus away from understanding (and since my understanding increases rapidly, most of them are out of date even before they reach the printer).

Nice question! What I've done, especially in the case of Spring, apart from consulting the documentation and the API docs, is to attach the sources of the project to my project in Eclipse. That way I'm able to navigate through the source code, not just the API. It's been quite helpful, especially in the case of the Spring Security project: there were some concepts that I just couldn't understand until I inspected the source code.
That's one of the advantages of using Open Source libraries.
Regards.

Tools like Structure101 (http://www.headwaysoftware.com/products/structure101/index.php), and Lattix (http://www.lattix.com/) let you analyze code and produce architecture diagrams / dependency matrices.
This is not exactly a class diagram; the main focus is on layering. So the entry point is usually in the topmost layer.
But then again, as I specified above, you will notice that some libs are just a mess, and these tools will not be helpful enough.
See the S101 online demo: http://www.structure101.com/java/
This for example is the Sonar project architecture: http://www.structure101.com/java/tracker/sonar/1.11.1/arch.html

Your best bet for those three would be to consult the official documentation (make sure you are looking at the version you are using) or to get a book on the technology.

Most APIs don't have a class with a main method; they're running in the webserver called by the server itself. Unless they're running as their own server, they won't have a main method.

Extend JackRabbit or build up from Lucene?

I've been working on a site idea. The general concept is a full-text search of documents that also allows user ratings; based on these ratings I want to boost the item's value in the Lucene index. But I'm trying to work out whether I should extend Jackrabbit or just build from the Lucene base. Is there any good way to extend Jackrabbit in this way and affect the index, or would it be best to work directly with Lucene?
Either way, I am strongly leaning towards using Groovy on Grails, with either the Searchable plugin or working directly with Jackrabbit. Are there any major reasons I should just stick to Java?
Clarification:
I would like to boost an item based on its average user rating. Is Jackrabbit open or extensible enough that I can capture user ratings and have them affect the index within Jackrabbit, or is this so far outside the core of Jackrabbit that I should just build up from Lucene?

I recommend using JCR, with the implementation of Jackrabbit behind it. JCR allows you to separate between what you store and how you store it.
By staying within a JCR framework, you should be able to easily switch among JCR implementations. (There are several, not just Apache's.) Even within Jackrabbit there are many persistence managers, not just Lucene. This flexibility is useful when you want to trade off between storage space and performance.
JCR already includes full text searches and the ability to maintain user ratings. It should be a good fit for your project.

is there any major reasons I should just stick to Java?
Not really. As you probably already know, you can use any Java library with Groovy/Grails, so there's nothing you can do in Java that you can't do in Groovy. Although the contrary is also true, in my experience, it takes a lot more (boilerplate) code to get things done in Java.
Although Java is considerably faster than Groovy, this doesn't necessarily mean your app will be faster if written in Java, as the bottleneck is likely to be the database rather than code execution.
As for whether you should use Lucene/Searchable or Jackrabbit, it's very difficult to say without knowing more about what you want to achieve. All you've told us so far is that you want to index documents and boost certain items in the index. You can certainly do both of those with Lucene.

I would recommend using JCR/Jackrabbit on top of Lucene for a couple of reasons:
1) Your repository structure could readily support document nodes with child nodes that store all of your meta-data including owner, ratings, flagging, comments, etc.
2) JCR is ideal for document/node based app development, providing a lot of the heavy lifting at the framework level while not getting in your way at the app level.

I would recommend using Apache Sling; it comes with Jackrabbit/Lucene built in.
Most of the committers are also involved with Jackrabbit, so it's designed to work well with it -- even better, it's designed to run on top of it.
One of the nice features of Sling is that it mounts the entire JCR repository in the URL space and exposes it via REST endpoints.
So you can access your documents/metadata very easily by doing a simple HTTP request to it. It also allows you to write your own servlets and expose them as REST endpoints. (This is extremely easy -- no fiddling about with applicationContext.xml files, just 1 annotation)
It also allows you to write scripts in JSP, ESP, Groovy, ...
