I am new to parsing XML documents.
I'm parsing an XML document and want to build my own JSON from it. To do that, I need to determine whether an XML element has child elements.
I have overridden the startElement() method of ContentHandler by extending the DefaultHandler class.
The code looks like this:
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    if (qName has a child) {
        // perform 1st task
    } else {
        // perform 2nd task
    }
}
Please help me with this, and tell me if I'm going about it the wrong way.
If your document is small, using a DOM parser will simplify your task.
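If you must use a SAX parser, you should also override the endElement() method. Every startElement event that arrives between an element's startElement and its matching endElement belongs to a child (or deeper descendant) of that element. Note that inside startElement itself you cannot yet know whether the element has children; you only find out when the next event arrives.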
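The start/end pairing described above can be sketched with a small stack: each open element gets a flag that is set as soon as a nested startElement arrives, and the flag is read in endElement. This is only a minimal sketch; the class name ChildDetectingHandler, the classify helper, and the ":parent"/":leaf" labels are made up for illustration and are not part of SAX.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records, for every element, whether it had child elements.
class ChildDetectingHandler extends DefaultHandler {
    // One flag per currently open element; set to true once a child element is seen.
    private final Deque<boolean[]> open = new ArrayDeque<>();
    final List<String> report = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (!open.isEmpty()) {
            open.peek()[0] = true; // the enclosing element has at least one child
        }
        open.push(new boolean[] { false });
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is it known for certain whether the element had children.
        boolean hasChild = open.pop()[0];
        report.add(qName + (hasChild ? ":parent" : ":leaf"));
    }

    // Convenience helper (an assumption for this sketch): parse a string and return the report.
    static List<String> classify(String xml) {
        ChildDetectingHandler handler = new ChildDetectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.report;
    }
}
```

Because the decision is only available in endElement, the "1st task / 2nd task" branch from the question would move there, or be deferred until the next event.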
I am writing a class that extends a class that uses Digester to parse an XML response from an API (an example of the existing class is in the code snippet below). After receiving the response, the code creates an object and calls specific methods on it.
Code snippet edited for brevity:
    private Digester createDigester() {
        Digester digester = new Digester();
        digester.addObjectCreate("GeocodeResponse/result", GoogleGeocoderResult.class);
        digester.addObjectCreate("GeocodeResponse/result/address_component", GoogleAddressComponent.class);
        digester.addCallMethod("GeocodeResponse/result/address_component/long_name", "setLongName", 0);
        ...
        digester.addSetNext("GeocodeResponse/result/address_component", "addAddressComponent");
        Class<?>[] dType = {Double.class};
        digester.addCallMethod("GeocodeResponse/result/formatted_address", "setFormattedAddress", 0);
        ...
        digester.addSetNext("GeocodeResponse/result", "add");
        return digester;
    }
}
The API that I will be calling, however, only supports JSON. I have found a probable solution, which involves converting the JSON to XML and then running it through Digester, but that seems incredibly hackish.
public JsonDigester(final String customRootElementName) {
    super(new JsonXMLReader(customRootElementName));
}
Is there a better way to do this?
This class is specifically meant to deal with XML as per the documentation:
Basically, the Digester package lets you configure an XML -> Java
object mapping module, which triggers certain actions called rules
whenever a particular pattern of nested XML elements is recognized. A
rich set of predefined rules is available for your use, or you can
also create your own.
Why would you think it would work with JSON?
I was going through this tutorial and noticed that the startElement method is called twice, but I do not see any call to it in the code. It seems to happen automatically. Can you explain how this method gets called?
The callback method is called by the parser object when it reaches the start of an element.
For example, to parse an XML file using a SAX parser you would have:
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
saxParser.parse(new File(sourceFile), this);
where "this" refers to the current object, whose class extends DefaultHandler (which in turn implements the ContentHandler interface). We override callback methods such as startElement, and the parser calls them as it encounters the corresponding events while reading the document.
Also, please refer to this page about callback functions if you are interested.
startElement is called when a new opening tag is encountered, and endElement is called when that tag is closed. So if you have something like this:
<jobs>
  <job>
    <id>4</id>
    ...
    ...
  </job>
</jobs>
the parser first calls startElement for jobs and then for job. When the elements close, it calls endElement first for job and then for jobs.
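To make that ordering visible, here is a small sketch that logs every start and end callback. The class name EventOrderDemo and the trace helper are invented for this illustration; only the SAX classes themselves come from the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records the order in which the parser invokes the callbacks.
class EventOrderDemo {
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName); // called by the parser, never by our own code
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            // parse() drives the whole document and fires the callbacks along the way.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }
}
```

For `<jobs><job><id>4</id></job></jobs>` the recorded sequence is start:jobs, start:job, start:id, end:id, end:job, end:jobs, which is why startElement appears to run "by itself" more than once.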
I'm trying to write something in Java that receives an XML string and validates it against an XSD schema, and does automatic error handling for some simple common errors, and outputs a fixed XML string.
I've come across the SAX ErrorHandler interface, which can be used with the Validator.validate() function, but it seems to be mostly for reporting exceptions. I can't figure out how to modify the XML from it, other than by using the reported line/column numbers, which would be a very tedious way to fix problems.
I also found the Validator.validate() function which has a source and a result, and returns augmented XML, which to my knowledge just fills in missing attributes that have default values, which is part of what I need to do.
But I also need something along the lines of fixing a missing start or end tag, correcting a tag that has been misspelled by a letter, and things like that. There are so many "Handler" interfaces (ValidatorHandler, ContentHandler, EntityResolver) that I'm not sure which ones to look at in depth, so if someone could point me in the right direction that would be great (I don't need a detailed code example).
Also, I'm not sure how the XMLReader fits into it all.
To deal with errors, you have to either implement the ErrorHandler interface or extend the DefaultHandler helper class and override its error method. That is the method called for validation errors. If you want to be more precise, I think you will have to analyze the error message. I don't think SAX will give you anything that makes errors easy to fix.
BTW, note that for validating against an XSD, you should not use the method setValidating. See the code below.
The Javadoc (1.7) for the setValidating method says:
Note that "the validation" here means a validating parser as defined in the XML recommendation. In other words, it essentially just controls the DTD validation. (except the legacy two properties defined in JAXP 1.2.)
To use modern schema languages such as W3C XML Schema or RELAX NG instead of DTD, you can configure your parser to be a non-validating parser by leaving the setValidating(boolean) method false, then use the setSchema(Schema) method to associate a schema to a parser.
import java.io.File;
import java.io.PrintStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
// ...
    public static void main(String args[]) throws Exception {
        if (args.length == 0 || args.length > 2) {
            System.err.println("Usage: java Validator <doc.xml> [<schema.xsd>]");
            System.exit(1);
        }
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Default schema path, overridden by the optional second argument.
        String xsdpath = "book.xsd";
        if (args.length == 2) {
            xsdpath = args[1];
        }
        Schema s = sf.newSchema(new File(xsdpath));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Leave DTD validation off; the schema set below drives validation.
        factory.setValidating(false);
        factory.setNamespaceAware(true);
        factory.setSchema(s);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setFeature("http://xml.org/sax/features/namespaces", true);
        parser.setFeature("http://xml.org/sax/features/namespace-prefixes", false);
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        // MyHandler is the application's own ContentHandler (not shown here).
        parser.setContentHandler(new MyHandler(out));
        // DefaultHandler.error() does nothing; override it to see validation errors.
        parser.setErrorHandler(new DefaultHandler());
        parser.parse(args[0]);
    }
}
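As a complement to the point about the error method, here is a minimal sketch of an ErrorHandler that collects every reported problem with its line and column instead of ignoring it. The class name CollectingErrorHandler and the validate helper are assumptions made for this example. It uses javax.xml.validation.Validator rather than a SAX parser, and it only reports problems; it does not repair the XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: keeps a list of problems instead of printing or swallowing them.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) { record("warning", e); }

    @Override
    public void error(SAXParseException e) { record("error", e); } // schema violations land here

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e); // not well-formed: parsing cannot continue
        throw e;
    }

    private void record(String level, SAXParseException e) {
        problems.add(level + " at " + e.getLineNumber() + ":" + e.getColumnNumber()
                + " " + e.getMessage());
    }

    // Helper assumed for this sketch: validate an XML string against an XSD string.
    static List<String> validate(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            CollectingErrorHandler handler = new CollectingErrorHandler();
            validator.setErrorHandler(handler);
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
            } catch (SAXException fatal) {
                // Already recorded in fatalError(); nothing more to do here.
            }
            return handler.problems;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("could not set up validation", e);
        }
    }
}
```

A misspelled tag shows up as an "error" entry and a missing end tag as a "fatal" entry. Any automatic fixing would have to be built on top of these messages and positions, as noted above.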
I've used DocumentBuilderFactory with setValidating(true) to generate an instance of an XML validating parser (i.e. DocumentBuilder).
Note that both validating and non-validating XML parsers will verify that the XML is "well formed" (e.g. end-tags, etc.). "Validating" refers to checking that the XML conforms to a DTD or schema.
I'm looking for the best method to parse various XML documents using a Java application. I'm currently doing this with SAX and a custom content handler and it works great - zippy and stable.
I've decided to explore the option of having the same program, which currently receives a single XML document format, receive two additional XML document formats with various XML element changes. I was hoping to just swap out the ContentHandler with an appropriate one based on the first "startElement" in the document... but, uh-duh, the ContentHandler is set and then the document is parsed!
... constructor ...
{
    SAXParserFactory spf = SAXParserFactory.newInstance();
    try {
        SAXParser sp = spf.newSAXParser();
        parser = sp.getXMLReader();
        parser.setErrorHandler(new MyErrorHandler());
    } catch (Exception e) {}
... parse StringBuffer ...
    try {
        parser.setContentHandler(pP);
        parser.parse(new InputSource(new StringReader(xml.toString())));
        return true;
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
...
So, it doesn't appear that I can do this in the way I initially thought I could.
That being said, am I looking at this entirely wrong? What is the best method to parse multiple, discrete XML documents with the same XML handling code? I tried to ask in a more general post earlier... but I think I was being too vague. For speed and efficiency purposes I never really looked at DOM, because these XML documents are fairly large and the system receives about 1200 every few minutes. It's just a one-way send of information.
To make this question even longer (and add to my confusion), here is a mockup of some of the various XML documents that I would like a single SAX, StAX, or ?? parser to deal with cleanly.
products.xml:
<products>
  <product>
    <id>1</id>
    <name>Foo</name>
  </product>
  <product>
    <id>2</id>
    <name>bar</name>
  </product>
</products>
stores.xml:
<stores>
  <store>
    <id>1</id>
    <name>S1A</name>
    <location>CA</location>
  </store>
  <store>
    <id>2</id>
    <name>A1S</name>
    <location>NY</location>
  </store>
</stores>
managers.xml:
<managers>
  <manager>
    <id>1</id>
    <name>Fen</name>
    <store>1</store>
  </manager>
  <manager>
    <id>2</id>
    <name>Diz</name>
    <store>2</store>
  </manager>
</managers>
As I understand it, the problem is that you don't know what format the document is prior to parsing. You could use a delegate pattern. I'm assuming you're not validating against a DTD/XSD/etcetera and that it is OK for the DefaultHandler to have state.
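import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DelegatingHandler extends DefaultHandler {
    // Maps a root element name to the handler for that document format.
    private final Map<String, DefaultHandler> saxHandlers;
    private DefaultHandler delegate = null;

    public DelegatingHandler(Map<String, DefaultHandler> delegates) {
        saxHandlers = delegates;
    }

    @Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        if (delegate == null) {
            // The first element is the root; pick the delegate from its name.
            delegate = saxHandlers.get(name);
            if (delegate == null) {
                throw new SAXException("No handler registered for root element: " + name);
            }
        }
        delegate.startElement(uri, localName, name, attributes);
    }

    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        delegate.endElement(uri, localName, name);
    }

    // etcetera...
}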
You've done a good job of explaining what you want to do but not why. There are several XML frameworks that simplify marshalling and unmarshalling Java objects to/from XML.
The simplest is Commons Digester, which I typically use to parse configuration files. But if you want to deal with Java objects then you should look at Castor, JiBX, JAXB, XMLBeans, XStream, or something similar. Castor and JiBX are my two favourites.
I have tried the SAXParser once, but once I found XStream I never went back to it. With XStream you can create Java Objects and convert them to XML. Send them over and use XStream to recreate the object. Very easy to use, fast, and creates clean XML.
Either way, you have to know what data you're going to receive from the XML file. You can send the documents over in different ways so that you know which parser to use. Or you can have a data object that can hold everything, with only one structure populated (products/stores/managers). Maybe something like:
public class DataStructure {
    List<ProductStructure> products;
    List<StoreStructure> stores;
    List<ManagerStructure> managers;
    ...
    public int getProductCount() {
        return products.size();
    }
    ...
}
Then, with XStream, convert the object to XML, send it over, and recreate the object on the other side. Then do what you want with it.
See the documentation for XMLReader.setContentHandler(). It says:
Applications may register a new or different handler in the middle of a parse, and the SAX parser must begin using the new handler immediately.
Thus, you should be able to create a SelectorContentHandler that consumes events until the first startElement event, changes the ContentHandler on the XML reader based on that element, and passes the first startElement event to the new content handler. You just have to pass the XMLReader to the SelectorContentHandler in its constructor. If you need all the events to be passed to the vocabulary-specific content handler, SelectorContentHandler has to cache the events and then pass them on, but in most cases this is not needed.
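A minimal sketch of that idea follows. SelectorContentHandler is the name suggested above; the map of handlers keyed by root element name, the static parse helper, and the NameCollector stand-in handler are assumptions added for illustration. Events before the root element (such as startDocument) are consumed by the selector and not replayed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Picks the real handler from the root element name, then steps out of the way.
class SelectorContentHandler extends DefaultHandler {
    private final XMLReader reader;
    private final Map<String, ContentHandler> handlersByRoot; // hypothetical lookup table

    SelectorContentHandler(XMLReader reader, Map<String, ContentHandler> handlersByRoot) {
        this.reader = reader;
        this.handlersByRoot = handlersByRoot;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        ContentHandler target = handlersByRoot.get(qName);
        if (target == null) {
            throw new SAXException("No handler registered for root element: " + qName);
        }
        reader.setContentHandler(target);                 // later events go to the target
        target.startElement(uri, localName, qName, atts); // replay the root element itself
    }

    // Helper assumed for this sketch: parse a string using the selector as entry point.
    static void parse(String xml, Map<String, ContentHandler> handlersByRoot) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new SelectorContentHandler(reader, handlersByRoot));
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}

// Stand-in for a vocabulary-specific handler: just remembers the element names it saw.
class NameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }
}
```

Registering one NameCollector under "products" and another under "stores" and then parsing a stores document should leave the products collector empty, since it never becomes the active handler.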
On a side note, I've lately used XOM in almost all my projects to handle XML, and thus far performance hasn't been an issue.
JAXB. The Java Architecture for XML Binding. Basically you create an xsd defining your XML layout (I believe you could also use a DTD). Then you pass the XSD to the JAXB compiler and the compiler creates Java classes to marshal and unmarshal your XML document into Java objects. It's really simple.
BTW, there are command line options to jaxb to specify the package name you want to place the resulting classes in, etc.
If you want more dynamic handling, a StAX approach would probably work better than SAX.
That's still quite low-level, though; if you want a simpler approach, XStream and JAXB are my favorites. But they do require quite rigid objects to map to.
Agree with StaxMan, who interestingly enough wants you to use Stax. It's a pull based parser instead of the push you are currently using. This would require some significant changes to your code though.
:-)
Yes, I have some bias towards StAX. But as I said, data binding is often more convenient than a streaming solution. If it's streaming you want, though, and you don't need pipelining (of multiple filtering stages), StAX is simpler than SAX.
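For comparison, here is a rough sketch of the pull style using the JDK's javax.xml.stream API: the code asks for the next event itself, so it can look at the root element and decide what to do without swapping handlers. The class name StaxRootDispatch and the summarize helper (which returns the root name and the number of top-level records) are invented for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: pull events, note the root element, count its direct children.
class StaxRootDispatch {
    static String summarize(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String root = null;
            int depth = 0;
            int records = 0;
            while (reader.hasNext()) {
                int event = reader.next(); // the application pulls; nothing is pushed at it
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 1) {
                        root = reader.getLocalName(); // a format-specific branch could start here
                    } else if (depth == 2) {
                        records++;                    // e.g. each product, store or manager
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
            return root + ":" + records;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

In a real reader, the branch on the root name would hand the XMLStreamReader to format-specific code instead of just counting.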
One more thing: as good as XOM is (wrt alternatives), often Tree Model is not the right thing to use if you are not dealing with "document-centric" xml (~= xhtml pages, docbook, open office docs).
For data interchange, config files, etc., data binding is more convenient, more efficient, and more natural. Just say no to tree models like DOM for these use cases.
So, JAXB, XStream, JibX are good. Or, for more acquired taste, digester, castor, xmlbeans.
VTD-XML is another option, aimed at heavy-duty XML processing. See the reference below for supporting details:
http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf