Java OO design for handling large XML - java

We are designing a system for processing XML messages.
The processing Java class needs to split out various attributes and values from a largish XML and pass these as parameters to individual handler classes for varied operations.
We have thought of the following options:
A)
Pass the entire XML to each handler and let it extract the relevant bits - but feel this might be inefficient to pass the XML around each time
B)
Convert the XML into a DTO or set of smaller DTOs and pass each DTO to relevant handler
C)
Cut the XML into snippets and pass these to each handler method
We're not happy with any of these. Any suggestions on which way to go?
Example XML
<IdAction>supplied</IdAction>
<RegId>true</RegId>
<DeRegId>false</DeRegId>
<SaveMessage>false</SaveMessage>
<ServiceName>abcRequest</ServiceName>
<timeToPerform>3600</timeToPerform>
<timeToReceipt/>
<SendToBES>true</SendToBES>
<BESQueueName>com.abc.gateway.JMSQueue.forAddRequest</BESQueueName>
<BESTransform/>
<BESJMSProperties>
<property>
<propName>stateCode</propName>
<propValue>OK</propValue>
</property>
<property>
<propName>stateResponse</propName>
<propValue>OK</propValue>
</property>
</BESJMSProperties>
This contains 4 blocks processed by 4 handlers. One does
<IdAction>supplied</IdAction>
<RegId>true</RegId>
<DeRegId>false</DeRegId>
another does
<timeToPerform>3600</timeToPerform>
<timeToReceipt/>
next does
<SendToBES>true</SendToBES>
<BESQueueName>com.abc.gateway.JMSQueue.forAddRequest</BESQueueName>
<BESTransform/>
<BESJMSProperties>
<property>
<propName>stateCode</propName>
<propValue>OK</propValue>
</property>
<property>
<propName>stateResponse</propName>
<propValue>OK</propValue>
</property>
</BESJMSProperties>
and so on

B sounds like the best option to me. A is most inefficient, and C would presumably need one pass to parse it and pick out the fragments, then a 2nd pass to properly handle them?
Use SAX to parse out minimal DTO sets for transmission to dedicated handler classes.
Good question, btw. Good to think about these things in advance, and get 2nd, 3rd, 4th opinions :-)
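A minimal sketch of that idea, assuming the fragment from the question is wrapped in a single root element (a hypothetical <Message> here). The names IdBlock, TimingBlock and MessageDtoHandler are illustrative and not from the question; only two of the four blocks are shown.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical DTOs: one small value object per handler.
class IdBlock {
    String idAction;
    boolean regId;
    boolean deRegId;
}

class TimingBlock {
    String timeToPerform;
    String timeToReceipt;
}

// Fills the DTOs in a single SAX pass over the message.
class MessageDtoHandler extends DefaultHandler {
    final IdBlock idBlock = new IdBlock();
    final TimingBlock timingBlock = new TimingBlock();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "IdAction": idBlock.idAction = value; break;
            case "RegId": idBlock.regId = Boolean.parseBoolean(value); break;
            case "DeRegId": idBlock.deRegId = Boolean.parseBoolean(value); break;
            case "timeToPerform": timingBlock.timeToPerform = value; break;
            case "timeToReceipt": timingBlock.timeToReceipt = value; break;
            default: break; // elements owned by other handlers are ignored here
        }
        text.setLength(0);
    }

    // Parses the whole message once and returns the handler holding the DTOs.
    static MessageDtoHandler parse(String xml) throws Exception {
        MessageDtoHandler handler = new MessageDtoHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Each handler class would then receive only its own DTO, for example `idHandler.handle(parsed.idBlock)`, where `idHandler` is whatever handler type the system already has.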

I don't think you need any special design considerations in terms of memory usage or performance, so I would go with the solution that involves the least coding: use a JAXB unmarshaller to parse your XML into DTOs and then go with your plan B. It may be harder to set up than StAX, but it saves you from writing any XML parsing code.
http://jaxb.java.net/
If you are using Spring, it is very easy to set up a bean for org.springframework.oxm.jaxb.Jaxb2Marshaller:
http://static.springsource.org/spring-ws/site/reference/html/oxm.html (8.5.2)

Tried?
http://simple.sourceforge.net/
Personally I would create a data model for the XML and pass the data model around. Take a look at the tutorials. With a custom data model you can map only the data you want into the model, and you can pass child nodes or a subset of the XML data model to the handler classes instead of the entire thing.
If you have an xml with the following structure
<book>
<title>XML</title>
<author>
<firstname>John</firstname>
<lastname>Doe</lastname>
</author>
<isbn>123541356eas</isbn>
</book>
Then you would have a datamodel something like this:
[ Book ]                   [ Author ]
---------------            ------------------
|String title |            |String firstname|
|String isbn  |            |String lastname |
|Author author| ---------> ------------------
---------------
Where Book has a reference to Author.
And then you could pass along the Author object to your handler method.
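As a rough sketch, the model above could look like this in plain Java. The answer points at the Simple framework for the mapping; the `fromXml` method here is a hypothetical stand-in that uses the JDK's DOM parser instead, so the example has no external dependencies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class Author {
    final String firstname;
    final String lastname;

    Author(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }
}

class Book {
    final String title;
    final String isbn;
    final Author author; // Book holds a reference to Author

    Book(String title, String isbn, Author author) {
        this.title = title;
        this.isbn = isbn;
        this.author = author;
    }

    // Hypothetical mapping step: builds the model from the <book> document.
    static Book fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element book = doc.getDocumentElement();
        Element author = (Element) book.getElementsByTagName("author").item(0);
        return new Book(text(book, "title"), text(book, "isbn"),
                new Author(text(author, "firstname"), text(author, "lastname")));
    }

    // Returns the trimmed text of the first descendant element with the given tag.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

A handler that only cares about the author can then take `book.author` as its argument and never see the rest of the document.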

You could use StAX for this use case. Each processBlock operation will act on the XMLStreamReader advancing its state, then subsequent processBlock operations can do their bit:
package forum7011558;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
public class Demo {
public static void main(String[] args) throws Exception {
Demo demo = new Demo();
FileReader xml = new FileReader("src/forum7011558/input.xml");
XMLInputFactory xif = XMLInputFactory.newFactory();
XMLStreamReader xsr = xif.createXMLStreamReader(xml);
demo.processBlock1(xsr);
demo.processBlock2(xsr);
demo.processBlock3(xsr);
demo.processBlock4(xsr);
}
private void processBlock1(XMLStreamReader xsr) {
// PROCESS BLOCK 1
}
private void processBlock2(XMLStreamReader xsr) {
// PROCESS BLOCK 2
}
private void processBlock3(XMLStreamReader xsr) {
// PROCESS BLOCK 3
}
private void processBlock4(XMLStreamReader xsr) {
// PROCESS BLOCK 4
}
}
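The processBlock bodies are left open above. One possible shape for them, as a sketch only: a hypothetical helper `BlockReader.readBlock` that consumes the consecutive elements belonging to one block and leaves the reader positioned on the first element of the next block. It assumes each block consists of simple text-only elements, which is true for the first two blocks in the question but not for BESJMSProperties.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class BlockReader {
    // Reads text-only elements whose names are in 'names' and stops at the
    // first start element that belongs to another block.
    static Map<String, String> readBlock(XMLStreamReader xsr, Set<String> names)
            throws XMLStreamException {
        Map<String, String> values = new LinkedHashMap<>();
        while (xsr.hasNext()) {
            if (xsr.isStartElement()) {
                String name = xsr.getLocalName();
                if (names.contains(name)) {
                    // getElementText() leaves the reader on the matching end tag
                    values.put(name, xsr.getElementText());
                } else if (!values.isEmpty()) {
                    break; // next block starts here; leave it for the next caller
                }
            }
            xsr.next();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Message><IdAction>supplied</IdAction><RegId>true</RegId>"
                + "<DeRegId>false</DeRegId><timeToPerform>3600</timeToPerform>"
                + "<timeToReceipt/></Message>";
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        System.out.println(readBlock(xsr, Set.of("IdAction", "RegId", "DeRegId")));
        System.out.println(readBlock(xsr, Set.of("timeToPerform", "timeToReceipt")));
    }
}
```

Because every call advances the same XMLStreamReader, the blocks have to be processed in document order.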

Related

XML base data validation rule in java

Hi, I am completely new to XML in Java. In my recent project I need to create validation rules in XML, but the problem is that different user groups may have different rules.
For example
<root>
<user-group type="sale">
<parameter name="loginName">
<max-length>10</max-length>
<min-length>4</min-length>
</parameter>
<parameter name="password">
<max-length>10</max-length>
<min-length>4</min-length>
</parameter>
</user-group>
<user-group type="clerk">
<parameter name="loginName">
<max-length>16</max-length>
<min-length>4</min-length>
</parameter>
<parameter name="password">
<max-length>12</max-length>
<min-length>8</min-length>
</parameter>
</user-group>
</root>
So how do I write Java code to implement the above rules?
Thanks in advance.
Read the XML using one of the well-known XML parsers; see
XML Parsing for Java
As you read through the XML, you can build a data structure to store the rules, as explained below.
Loop through each of the "user-group" XML nodes in your Java program and store them in a map, for example a HashMap, whose key is the group type (such as "clerk") and whose value is a POJO bean defining a "rule".
For example here is your "Rules" class -
public class Rules {
private String ruleName;
private int maxLength;
private int minLength;
public String getRuleName() {
return ruleName;
}
public void setRuleName(String ruleName) {
this.ruleName = ruleName;
}
public int getMinLength() {
return minLength;
}
public void setMinLength(int minLength) {
this.minLength = minLength;
}
public void setMaxLength(int maxLength) {
this.maxLength = maxLength;
}
public int getMaxLength() {
return maxLength;
}
}
Now you can use this HashMap anywhere in your program, to implement the rules. Seems like you would need to implement rules on the UI. In that case, I would recommend using established frameworks like Struts, Spring or an equivalent framework.
Hope this gives you a headstart ;)
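A rough sketch of the loading step described above, using the JDK's DOM parser. It assumes a well-formed variant of the rules file in which each parameter is written as <parameter name="loginName">; the class names LengthRule and RuleLoader are made up for this example, and LengthRule is a compact stand-in for the Rules bean.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// One min/max length rule for a single parameter.
class LengthRule {
    final int minLength;
    final int maxLength;

    LengthRule(int minLength, int maxLength) {
        this.minLength = minLength;
        this.maxLength = maxLength;
    }

    boolean accepts(String value) {
        return value.length() >= minLength && value.length() <= maxLength;
    }
}

class RuleLoader {
    // Returns: user-group type -> parameter name -> rule.
    static Map<String, Map<String, LengthRule>> load(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        Map<String, Map<String, LengthRule>> rules = new HashMap<>();
        NodeList groups = root.getElementsByTagName("user-group");
        for (int i = 0; i < groups.getLength(); i++) {
            Element group = (Element) groups.item(i);
            Map<String, LengthRule> byParameter = new HashMap<>();
            NodeList params = group.getElementsByTagName("parameter");
            for (int j = 0; j < params.getLength(); j++) {
                Element param = (Element) params.item(j);
                byParameter.put(param.getAttribute("name"),
                        new LengthRule(intOf(param, "min-length"), intOf(param, "max-length")));
            }
            rules.put(group.getAttribute("type"), byParameter);
        }
        return rules;
    }

    // Parses the integer text of the first descendant element with the given tag.
    private static int intOf(Element parent, String tag) {
        return Integer.parseInt(parent.getElementsByTagName(tag).item(0).getTextContent().trim());
    }
}
```

Validation then becomes a lookup such as `rules.get("clerk").get("password").accepts(input)`.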
The simple answer: use XML schemas with defined namespaces. This way each user-group type can define what the structure of that node is. Setting this as an attribute is not really the most effective way to do this. I can elaborate later tonight on how to use XSD with namespaces so that you could create a document with "different" user-group nodes, specified in different namespaces, that each entity could validate and use without any problems. I don't have time to show an example, but I found this: Creating an XML document using namespaces in Java
The most simplistic explanation I can come up with is the definition of "table". For a furniture store, a "table" entity has maybe a round or square surface with most likely 4 legs, etc. But a "table" could mean something completely different for some other group. Using your XML as an example, it would be something like this:
<root>
<sale:user-group xmlns:sale="SOME_URL">
<some structure and rules>
</sale:user-group>
<clerk:user-group xmlns:clerk="SOME_OTHER_URL">
<different structure and rules>
</clerk:user-group>
</root>
The link I provided should answer your question. If not, I will come back tonight and show you a simple XSD that might fit your case.

What's a good design pattern to implement a network protocol (XML)?

I want to implement a network protocol. To obtain a maintainable design I am looking for fitting patterns.
The protocol is based on XML and should be read with java. To simplify the discussion here I assume the example grammar:
<User>
<GroupList>
<Group>group1</Group>
<Group>group2</Group>
</GroupList>
</User>
Short question:
What is a good design pattern to parse such thing?
Long version:
I have found this and this question where different patterns (mostly state pattern) are proposed.
My current (but lacking) solution is the following:
I create for each possible entry in the XML a class to contain the data and a parser. Thus I have User, User.Parser, ... as classes.
Further there is a ParserSelector that has a Map<String,AbstractParser> in which all possible subentries get registered.
For each parser a ParserSelector gets instantiated and set up.
For example the ParserSelector of the GroupList.Parser has one entry: The mapping from the string "Group" to an instance of Group.Parser.
If I did not use the ParserSelector class, I would have to write this block of code into every single parser.
The problem is now how to get the read data to the superobjects.
The Group.Parser would create a Group object with content group1.
This object must now be registered in the GroupList object.
I have read of using Visitor or Observer patterns but do not understand how they might fit here.
I give some pseudo code below to see the problem.
As you can see, I have to check the type via instanceof, because the type information is not available statically.
I thought it should be possible to solve this with polymorphism in Java in a cleaner (more maintainable) way.
But I always run into the problem that Java only does dynamic binding on overriding.
Thus I cannot add a parameter to the XMLParser.parse(...) method to allow "remote updating" as in a visitor/observer-like approach.
Side remark: The real grammar is "deep" that is, it is such that there are quite many XML entries (here only three: User, GroupList and Group) while most of them might contain only very few different subentries (User and GroupList may only contain one subentry here, while Group itself contains only text).
Here are some lines of pseudo Java code to explain the problem:
class User extends AbstractObject {
static class Parser implements XMLParser {
ParserSelector ps = ...; // Initialize with GroupList.Parser
void parse(XMLStreamReader xsr){
XMLParser p = ps.getParser(...); // The corresponding parser.
// We know only that it is XMLParser statically.
p.parse(...);
if(p instanceof GroupList.Parser){
// Set the group list in the User class
}
}
}
}
class GroupList extends AbstractObject{...}
class Group extends AbstractObject{...}
class ParserSelector{
Map<String,XMLParser> map = new HashMap<>();
void registerParser(...){...} // Registers a possible parser for subentries
XMLParser getParser(String elementName){
return map.get(elementName); // Returns the parser registered with the given name
}
}
interface XMLParser {
void parse(XMLStreamReader xsr);
}
abstract class AbstractObject{}
To finish this question:
I ended up with JAXB. In fact, I was not aware that it makes it easy to create an XML Schema from Java source code (using annotations).
Thus I just have to write the code with plain Java objects which are used for transfer. The API then handles the conversion to and from XML quite well.

Java - Using Ant to automatically generate boilerplate code

Intro:
I'm asking this before I try, fail and get frustrated as I have 0 experience with Apache Ant. A simple 'yes this will work' may suffice, or if it won't please tell me what will.
Situation:
I'm working on a project that uses JavaFX to create a GUI. JavaFX relies on Java Bean-like objects that require a lot of boilerplate code for its properties. For example, all the functionality I want is a String called name with default value "Unnamed", or in minimal Java syntax:
String name = "Unnamed";
In JavaFX the minimum amount of code increases a lot to give the same functionality (where functionality in this case means to me that I can set and get a certain variable to use in my program):
private StringProperty name = new SimpleStringProperty("Unnamed");
public final String getName() { return name.get(); }
public final void setName(String value) { name.set(value); }
Question: Can I use Ant to generate this boilerplate code?
It seems possible to make Ant scripts that function as (Java) preprocessors. For instance by using the regex replace (https://ant.apache.org/manual/Tasks/replaceregexp.html) functions. I'm thinking of lines of code similar to this in my code, which then will be auto-replaced:
<TagToSignifyReplaceableLine> StringProperty person "Unnamed"
Final remark: As I've said before I have never used Ant before, so I want to check with you if 1) this can be done and 2) if this is a good way to do it or if there are better ways.
Thanks!
Yes, possible. You can even implement your own Ant task, that does this job very easily.
Something like so in ant:
<taskdef name="codegen" classpath="bin/" classname="com.example.CodeGen" />
and then
<codegen className="Test.java">
<Property name="StringProperty.name" value="Unnamed"/>
</codegen>
The CodeGen.java then like so:
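import java.util.ArrayList;
import java.util.List;
import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;
// Property is assumed to be a bean exposing the name and value attributes of the nested element.
public class CodeGen extends Task {
    private String className = null;
    private final List<Property> properties = new ArrayList<>();
    public void setClassName(String className) {
        this.className = className;
    }
    /**
     * Called by Ant for every <property> tag of the task.
     *
     * @param property The property.
     */
    public void addConfiguredProperty(Property property) {
        properties.add(property);
    }
    @Override
    public void execute() throws BuildException {
        // here we go!
    }
}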
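What execute() would actually emit is left open above. As a sketch of the generation step only, here is a hypothetical helper (PropertyBoilerplate is not part of Ant or JavaFX) that builds the three lines from the question for one String property. It writes SimpleStringProperty as the concrete class, on the assumption that the target is a JavaFX version where StringProperty itself is abstract.

```java
class PropertyBoilerplate {
    // Builds the field, getter and setter source for one String property.
    static String generate(String name, String defaultValue) {
        String capitalized = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        return String.join("\n",
                "private StringProperty " + name
                        + " = new SimpleStringProperty(\"" + defaultValue + "\");",
                "public final String get" + capitalized + "() { return " + name + ".get(); }",
                "public final void set" + capitalized + "(String value) { " + name + ".set(value); }");
    }

    public static void main(String[] args) {
        System.out.println(generate("name", "Unnamed"));
    }
}
```

An Ant task could call this once per nested property element and write the result into the target source file; how the output is merged into an existing class is a separate design decision that this sketch does not cover.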
I know it can be done because my previous firm used ant to generate model objects in java.
The approach they used was to define model objects in an XML file and run an ant task to generate the pojo and dto.
I quickly googled and saw that there are tools that allow you to generate Java from XML. You could probably give your schema, default values etc. in XML and have an Ant task run the tool.
I would look at JSR-269 specifically: genftw which makes JSR-269 easier...
And yes, it will work with Ant without even having to write a plugin, and it will work better than a brittle regex.
The other option, if you're really adventurous, is to check out Xtext for code generation, but it is rather complicated.
Yes, it can be done :-)
I once wrote a web services adapter that used a WSDL document (an XML file describing a SOAP-based web service) to generate the POJO Java class that implemented the functional interface to my product. What led me to do this was the mindlessly repetitive Java code which was necessary to talk to our proprietary system.
The technical solution used an XSLT stylesheet to transform the input XML document into an output Java text file which was subsequently compiled by ANT.
<!-- Generate the implementation classes -->
<xslt force="true" style="${resources.dir}/javaServiceStub.xsl" in="${src.dir}/DemoService.wsdl" out="${build.dir}/DemoService/src/com/myspotontheweb/DemoServiceSkeleton.java" classpathref="project.path">
<param name="package" expression="com.myspotontheweb"/>
..
..
</xslt>
Unfortunately XSLT is the closest thing to a templating engine supported by native ANT.
Best of luck!

Best design pattern to implement upload feature

I am working on a web application which is based on spring MVC. We have various screens for adding different domain components(eg. Account details, Employee details etc). I need to implement an upload feature for each of these domain components i.e. to upload Account, upload employee details etc which will be provided in a csv file (open the file, parse its contents, validate and then persist).
My question is: which design pattern should I consider for such a requirement, so that the upload feature (open the file, parse its contents, validate and then persist) becomes generic? I was thinking about using the Template Method design pattern.
Any suggestions, pointers or links would be highly appreciated.
I am not going to answer your question. That said, let me answer your question! ;-)
I think that design patterns should not be a concern in this stage of development. In spite of their greatness (and I use them all the time), they should not be your primary concern.
My suggestion is to implement the first upload feature, then the second, and then look at what they have in common and extract it into a "mother" class. When you come to a third class, repeat the process of generalization. The generic class will emerge naturally from this process.
Sometimes, I believe that people tend to over-engineer and over-plan. I am in good company: http://www.joelonsoftware.com/items/2009/09/23.html. Obviously, I am not advocating designing nothing up front; that never works well. Nevertheless, looking for similarities after some stuff has been implemented and refactoring them may achieve better results (have you already read http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672/ref=sr_1_1?ie=UTF8&qid=1337348138&sr=8-1? It is old but still great!).
A strategy pattern may be useful here for the uploader. The Uploader class would be a sort of container/manager class that simply holds a parser attribute and a persistence attribute. Both attributes would be declared as abstract base classes with multiple implementations. Even though you say it will always be CSV and Oracle, this approach would be future-proof and would also separate the parsing/validation code from the persistence code.
Here's an example:
// Strategy pattern: the Uploader delegates to pluggable Parser and Persistence strategies.
class Uploader {
    private Parser parser;
    private Persistence persistence;
    public void setParser(Parser parser) { this.parser = parser; }
    public void setPersister(Persistence persistence) { this.persistence = persistence; }
    // Runs the fixed upload sequence against whichever strategies were set.
    public void upload() {
        parser.read();
        parser.parse();
        parser.validate();
        persistence.persist(parser.getData());
    }
}
abstract class Parser {
    abstract void read();
    abstract void parse();
    abstract void validate();
    abstract String getData();
}
abstract class Persistence {
    abstract void persist(String data);
}
class CsvParser extends Parser {
    // implement everything here
}
// more Parser implementations as needed
class DbPersistence extends Persistence {
    // implement everything here
}
class NwPersistence extends Persistence {
    // implement everything here
}
// more Persistence implementations as needed
You could use an Abstract Factory pattern.
Have an upload interface and then implement it for each of the domain objects and construct it in the factory based on the class passed in.
E.g.
Uploader uploader = UploadFactory.getInstance(Employee.class);
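A minimal sketch of what such a factory could look like. Strictly speaking this is a registry-backed simple factory, not a textbook Abstract Factory. Only UploadFactory.getInstance and Employee come from the line above; the Uploader interface shape, the Account class, and the `targetName` method (a placeholder for the real parse/validate/persist steps) are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Placeholder domain classes.
class Employee {}
class Account {}

interface Uploader {
    String targetName(); // stands in for the real parse/validate/persist logic
}

class EmployeeUploader implements Uploader {
    @Override
    public String targetName() { return "employee"; }
}

class AccountUploader implements Uploader {
    @Override
    public String targetName() { return "account"; }
}

class UploadFactory {
    // Maps each domain class to a supplier of its uploader.
    private static final Map<Class<?>, Supplier<Uploader>> REGISTRY = new ConcurrentHashMap<>();

    static {
        REGISTRY.put(Employee.class, EmployeeUploader::new);
        REGISTRY.put(Account.class, AccountUploader::new);
    }

    static Uploader getInstance(Class<?> domainType) {
        Supplier<Uploader> supplier = REGISTRY.get(domainType);
        if (supplier == null) {
            throw new IllegalArgumentException("No uploader registered for " + domainType.getName());
        }
        return supplier.get();
    }
}
```

Adding a new domain component then means writing one Uploader implementation and registering it; the controller code that calls `UploadFactory.getInstance(...)` stays unchanged.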

How would you use Java to handle various XML documents?

I'm looking for the best method to parse various XML documents using a Java application. I'm currently doing this with SAX and a custom content handler and it works great - zippy and stable.
I've decided to explore the option of having the same program, which currently receives a single XML document format, receive two additional XML document formats, with various XML element changes. I was hoping to just swap out the ContentHandler with an appropriate one based on the first "startElement" in the document... but, uh-duh, the ContentHandler is set and then the document is parsed!
... constructor ...
{
SAXParserFactory spf = SAXParserFactory.newInstance();
try {
SAXParser sp = spf.newSAXParser();
parser = sp.getXMLReader();
parser.setErrorHandler(new MyErrorHandler());
} catch (Exception e) {}
... parse StringBuffer ...
try {
parser.setContentHandler(pP);
parser.parse(new InputSource(new StringReader(xml.toString())));
return true;
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
}
...
So, it doesn't appear that I can do this in the way I initially thought I could.
That being said, am I looking at this entirely wrong? What is the best method to parse multiple, discrete XML documents with the same XML handling code? I tried to ask in a more general post earlier... but I think I was being too vague. For speed and efficiency purposes I never really looked at DOM, because these XML documents are fairly large and the system receives about 1200 every few minutes. It's just a one-way send of information.
To make this question too long and add to my confusion; following is a mockup of some various XML documents that I would like to have a single SAX, StAX, or ?? parser cleanly deal with.
products.xml:
<products>
<product>
<id>1</id>
<name>Foo</name>
</product>
<product>
<id>2</id>
<name>bar</name>
</product>
</products>
stores.xml:
<stores>
<store>
<id>1</id>
<name>S1A</name>
<location>CA</location>
</store>
<store>
<id>2</id>
<name>A1S</name>
<location>NY</location>
</store>
</stores>
managers.xml:
<managers>
<manager>
<id>1</id>
<name>Fen</name>
<store>1</store>
</manager>
<manager>
<id>2</id>
<name>Diz</name>
<store>2</store>
</manager>
</managers>
As I understand it, the problem is that you don't know what format the document is prior to parsing. You could use a delegate pattern. I'm assuming you're not validating against a DTD/XSD/etcetera and that it is OK for the DefaultHandler to have state.
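import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class DelegatingHandler extends DefaultHandler {
    private final Map<String, DefaultHandler> saxHandlers;
    private DefaultHandler delegate = null;
    public DelegatingHandler(Map<String, DefaultHandler> delegates) {
        saxHandlers = delegates;
    }
    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        if (delegate == null) {
            // The first start element is the document root; it selects the delegate.
            delegate = saxHandlers.get(name);
            if (delegate == null) {
                throw new SAXException("No handler registered for <" + name + ">");
            }
        }
        delegate.startElement(uri, localName, name, attributes);
    }
    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        delegate.endElement(uri, localName, name);
    }
    // etcetera: forward characters(), endDocument() and so on in the same way
}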
You've done a good job of explaining what you want to do but not why. There are several XML frameworks that simplify marshalling and unmarshalling Java objects to/from XML.
The simplest is Commons Digester, which I typically use to parse configuration files. But if you want to deal with Java objects then you should look at Castor, JiBX, JAXB, XMLBeans, XStream, or something similar. Castor and JiBX are my two favourites.
I have tried the SAXParser once, but once I found XStream I never went back to it. With XStream you can create Java Objects and convert them to XML. Send them over and use XStream to recreate the object. Very easy to use, fast, and creates clean XML.
Either way, you have to know what data you're going to receive from the XML file. You can send them over in different ways to know which parser to use. Or have a data object that can hold everything, but with only one structure populated (product/store/managers). Maybe something like:
public class DataStructure {
    List<ProductStructure> products;
    List<StoreStructure> stores;
    List<ManagerStructure> managers;
    ...
    public int getProductCount() {
        return products.size();
    }
    ...
}
Then, with XStream, convert it to XML, send it over, and recreate the object on the other side. Then do what you want with it.
See the documentation for XMLReader.setContentHandler(), it says:
Applications may register a new or different handler in the middle of a parse, and the SAX parser must begin using the new handler immediately.
Thus, you should be able to create a SelectorContentHandler that consumes events until the first startElement event, switches the ContentHandler on the XML reader based on that element, and passes the first start element event to the new content handler. You just have to pass the XMLReader to the SelectorContentHandler in the constructor. If you need all the events to be passed to the vocabulary-specific content handler, SelectorContentHandler has to cache the events and then pass them on, but in most cases this is not needed.
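A sketch of such a SelectorContentHandler, under the assumption that the root element name alone identifies the format. The StoreCounter handler and the static `parse` helper are hypothetical additions for illustration; events before the root element (such as startDocument) are not replayed here.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SelectorContentHandler extends DefaultHandler {
    private final XMLReader reader;
    private final Map<String, ContentHandler> handlersByRoot;

    SelectorContentHandler(XMLReader reader, Map<String, ContentHandler> handlersByRoot) {
        this.reader = reader;
        this.handlersByRoot = handlersByRoot;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        ContentHandler target = handlersByRoot.get(qName);
        if (target == null) {
            throw new SAXException("No handler registered for root element <" + qName + ">");
        }
        reader.setContentHandler(target);                 // later events go to the new handler
        target.startElement(uri, localName, qName, atts); // replay the root start event
    }

    // Convenience entry point: installs the selector and parses the document.
    static void parse(String xml, Map<String, ContentHandler> handlersByRoot) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new SelectorContentHandler(reader, handlersByRoot));
        reader.parse(new InputSource(new StringReader(xml)));
    }
}

// Example vocabulary-specific handler: counts <store> elements.
class StoreCounter extends DefaultHandler {
    int stores;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("store".equals(qName)) {
            stores++;
        }
    }
}
```

With one entry per format in the map (products, stores, managers), the existing content handlers can be reused unchanged.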
On a side note, I've lately used XOM in almost all my projects to handle XML, and thus far performance hasn't been an issue.
JAXB. The Java Architecture for XML Binding. Basically you create an xsd defining your XML layout (I believe you could also use a DTD). Then you pass the XSD to the JAXB compiler and the compiler creates Java classes to marshal and unmarshal your XML document into Java objects. It's really simple.
BTW, there are command line options to jaxb to specify the package name you want to place the resulting classes in, etc.
If you want more dynamic handling, the StAX approach would probably work better than SAX.
That's still quite low-level; if you want a simpler approach, XStream and JAXB are my favorites. But they do require quite rigid objects to map to.
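To illustrate the "more dynamic" point: with a pull parser the application decides when to read, so it can look at the root element first and only then choose how to process the rest. A small sketch, with the hypothetical helper name RootSniffer, assuming the documents carry no DOCTYPE declaration:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RootSniffer {
    // Returns the local name of the document's root element.
    static String rootElementName(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            xsr.nextTag(); // skips whitespace, comments and processing instructions
            return xsr.getLocalName();
        } finally {
            xsr.close();
        }
    }
}
```

In a real pipeline you would keep the same reader open after `nextTag()` and hand it to the format-specific code, so the document is still read only once.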
Agree with StaxMan, who interestingly enough wants you to use Stax. It's a pull based parser instead of the push you are currently using. This would require some significant changes to your code though.
:-)
Yes, I have some bias towards Stax. But as I said, oftentimes data binding is more convenient than streaming solution. But if it's streaming you want, and don't need pipelining (of multiple filtering stages), Stax is simpler than SAX.
One more thing: as good as XOM is (wrt alternatives), often Tree Model is not the right thing to use if you are not dealing with "document-centric" xml (~= xhtml pages, docbook, open office docs).
For data interchange, config files etc data binding is more convenient, more efficient, more natural. Just say no to tree models like DOM for these use cases.
So, JAXB, XStream, JibX are good. Or, for more acquired taste, digester, castor, xmlbeans.
VTD-XML is known as a strong option for heavy-duty XML processing. See the reference below:
http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf
