Parsing XML for deeply nested data

I have an XML file that is structured something like this:
<element1>
<element2>
<element3>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
</element3>
</element2>
</element1>
As you can see, I am interested in two elements, the first of which is deeply nested within the root element, and the second of which is deeply nested within that first element. There are multiple (sibling) elementIAmInterestedIn and otherElementIAmInterestedIn elements in the document.
I want to parse this XML file with Java and put the data from all the elementIAmInterestedIn and otherElementIAmInterestedIn elements into either a data structure or Java objects - it doesn't matter much to me as long as it is organized and I can access it later.
I'm able to write a recursive DOM parser method that does a depth-first traversal of the XML so that it touches every element. I also wrote a Java class with JAXB annotations that represents elementIAmInterestedIn. Then, in the recursive method, I can check when I get to an elementIAmInterestedIn and unmarshal it into an instance of the JAXB class. This works fine except that such an object should also contain multiple otherElementIAmInterestedIn.
This is where I'm stuck. How can I get the data out of otherElementIAmInterestedIn and assign it to the JAXB object? I've seen the @XmlElementWrapper annotation, but it seems to work for only one layer of nesting. Also, I cannot use @XmlPath.
Maybe I should scratch that idea and use a whole new approach. I'm really just getting started with XML parsing so perhaps I'm overlooking a more obvious solution. How would you parse an XML document structured like this and store the data in an organized way?

Maybe you should use a SAX parser instead of DOM. With DOM, the whole document is loaded into memory, and in your case you only want to read two kinds of element. This is quite inefficient.
A SAX parser streams through the document and lets you act only on the nodes you are interested in. Here is pseudocode for your task using the SAX parsing model:
1) Keep reading nodes until you get an <elementIAmInterestedIn> node
2) Store its data in your class
3) Keep on reading until you get an <otherElementIAmInterestedIn> node
4) Store that data too and save the object.
Loop over steps 1 to 4 until the parser reaches the end of the document.
If you try this approach, I suggest first reading this document to understand how a SAX parser works, since it is very different from the DOM approach: How to Use SAX
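The four steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class names Interesting, InterestingHandler and NestedSaxDemo are made up for the example, and it assumes the children of otherElementIAmInterestedIn are simple text elements like data1, data2 and data3 in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical holder for one elementIAmInterestedIn and its nested data. */
class Interesting {
    String attribute;
    // One map (child element name -> text) per otherElementIAmInterestedIn.
    final List<Map<String, String>> others = new ArrayList<>();
}

class InterestingHandler extends DefaultHandler {
    final List<Interesting> results = new ArrayList<>();
    private Interesting current;         // non-null while inside elementIAmInterestedIn
    private Map<String, String> other;   // non-null while inside otherElementIAmInterestedIn
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0);
        if (qName.equals("elementIAmInterestedIn")) {
            // Steps 1 and 2: found the outer element, keep its attribute.
            current = new Interesting();
            current.attribute = attrs.getValue("attribute");
        } else if (qName.equals("otherElementIAmInterestedIn") && current != null) {
            // Step 3: found the nested element.
            other = new LinkedHashMap<>();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so accumulate it.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("otherElementIAmInterestedIn")) {
            if (current != null && other != null) {
                current.others.add(other);
            }
            other = null;
        } else if (qName.equals("elementIAmInterestedIn")) {
            // Step 4: the object is complete, save it.
            results.add(current);
            current = null;
        } else if (other != null) {
            // A data1/data2/data3 style child of the nested element.
            other.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }
}

class NestedSaxDemo {
    static List<Interesting> parse(String xml) {
        try {
            InterestingHandler handler = new InterestingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The handler never needs to know how deep the elements are nested. It only tracks whether it is currently inside one of the two elements of interest, which is why the intermediate element4, element5 and element6 layers need no code at all.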

Related

How to read XML attribute values using JXPathContext if reference is missing

Given following XML, we are using JXPathContext to create Java object out of it.
<fb1:Activity fb2:metadata="Activity1">
</fb1:Activity>
<fb21:ActivityMetadata fb2:id="Activity1">
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>
Reading the value:
String responseCode = context.getValue("metadata[1]/Response/value");
This is working as expected. Now let's say, for instance, that the reference from Activity to ActivityMetadata is missing. What can we do to read the response value in that case? It is guaranteed that there can be at most one ActivityMetadata element in the XML.
Incomplete XML that we need to parse:
<fb1:Activity fb2:metadata="">
</fb1:Activity>
<fb21:ActivityMetadata>
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>

The path you're giving us doesn't match the document you're showing us.
Ignoring that for a moment:
XML doesn't constrain the tree at all; that's done by the XML Schema (if there is one) and/or the applications which process that kind of document. Only the folks who defined this particular kind of document, or the schema, or the code can tell you whether there are any guarantees about only one ActivityMetadata being present or what it means if there's more than one.
XML is pure syntax. Meaning is someone else's problem.
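Since the parser itself will not enforce an "at most one ActivityMetadata" rule, the application has to check it. As a rough sketch, here is one way to do that with the standard JAXP DOM API (deliberately not JXPath, whose object binding is not shown in the question). The class name ActivityMetadataLookup is made up, and the example assumes the fragment sits inside a root element that declares the fb1 and fb2 prefixes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ActivityMetadataLookup {
    /**
     * Returns the text of the Response element inside the single
     * ActivityMetadata element, or null if either is absent.
     * Throws if the "at most one" assumption does not hold.
     */
    static String readResponse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // "*" matches any namespace, so the lookup ignores which prefix is bound.
        NodeList metadata = doc.getElementsByTagNameNS("*", "ActivityMetadata");
        if (metadata.getLength() == 0) {
            return null;
        }
        if (metadata.getLength() > 1) {
            // The guarantee comes from the application, so the application checks it.
            throw new IllegalStateException(
                    "Expected at most one ActivityMetadata, found " + metadata.getLength());
        }
        NodeList responses =
                ((Element) metadata.item(0)).getElementsByTagNameNS("*", "Response");
        return responses.getLength() == 0 ? null : responses.item(0).getTextContent();
    }
}
```

The lookup goes straight to the ActivityMetadata element instead of following the metadata reference from Activity, so it works whether or not that reference is filled in.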

Java Sax tree (duplicate attributes)

I'm trying to write a program that builds a tree from an XML document using a SAX parser.
The tag names are built successfully, but for the attributes I see only the attributes of the last tag.
What is wrong with the code?
The tree is printed in tag.toString().

Try to change line 48 into:
Tag t = new Tag(eName, new org.xml.sax.helpers.AttributesImpl(attrs));

I guess the problem is that you store the Attributes instance in each Tag, and that the parser reuses the same Attributes instance for every call to startElement(). Every Tag therefore sees the same instance with the same content, namely the attributes of the last element the parser processed. You will have to create a copy (or a Map or something else) of the actual attributes for each Tag.
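Since the asker's code isn't shown here, the following is only a sketch of the idea. Tag is a hypothetical stand-in for the asker's class and TagCollector is made up for the example. The one line that matters is the AttributesImpl copy in startElement().

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the asker's Tag class. */
class Tag {
    final String name;
    final Attributes attrs;

    Tag(String name, Attributes attrs) {
        this.name = name;
        this.attrs = attrs;
    }
}

class TagCollector extends DefaultHandler {
    final List<Tag> tags = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The parser is allowed to reuse `attrs` for the next element,
        // so store a snapshot rather than the instance that was passed in.
        tags.add(new Tag(qName, new AttributesImpl(attrs)));
    }

    static List<Tag> collect(String xml) {
        try {
            TagCollector handler = new TagCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.tags;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

With the copy in place, each Tag keeps the attributes that were current when its element started, regardless of what the parser does with its own instance afterwards.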

Parsing complex string of JSON in java

I want to parse a JSON string that is quite complex. It has roughly the following format:
{ A:{ list of around 20 objects},B:1}
The objects inside A in turn contain other objects or the data types supported by JSON. I have checked a couple of examples and some documentation.
I found this example to be helpful
Converting JSON to Java
but it looks like I need to know each and every element of the contained objects. I could write similar code, but before spending that much effort I wanted to check whether there are other libraries that can do the parsing automatically, so that I can get the contents just by giving the key.

Jackson should be able to handle the structure with no problem. You do not need to know the exact structure of the JSON. You can just iterate over all of the objects and the objects nested inside them.

Apache Digester How do I get some xml nested within a tag as a literal string?

I am parsing an XML document with Digester. Part of it contains content formatted in cryptic pseudo-HTML XML elements, which I need to transform into a PDF. That will be done via Apache FOP. Hence I need to access the XML element that contains the content elements directly and pipe it to FOP. To do so, the Digester FAQ states that one should either
Wrap the nested xml in CDATA
or
If this can't be done then you need to use a NodeCreateRule to create a DOM node representing the body tag and its children, then serialise that DOM node back to text
Since it is third-party XML, the CDATA approach could only be done via (another) XSLT, which I hesitate to do.
It looks like this issue should be solvable via NodeCreateRule, but I cannot figure out how to get it done.
The documentation states that NodeCreateRule will push a Node onto the stack; however, I can only get it to pass null.
I tried
digester.addRule(docPath + "/contents", new NodeCreateRule());
digester.addCallMethod(docPath + "/contents", "setContentsXML");
setContentsXML expects an Element parameter.
I also tried this and this without any luck.
I am using the latest stable Digester. Would be thankful for any advice.
Update:
I found the bug. The result on my system is null, too. I am using JDK 6u24.

The problem in my case, as in the linked bug, lies in the proper serialisation of an Element. In my case the null value was not returned by Digester but by Element#toString(). I assume something has changed since JDK 1.4.
Going by the bug example:
result contains another (text) node with the actual content. toString(), however, simply takes the content of the Element instance it is called upon.
The Element tree has to be serialized explicitly, for example with the serialization method in this usage example of NodeCreateRule.
In case someone else tries to use that with Digester 3: you have to change the method signature SetSerializedNodeRule#end() to SetSerializedNodeRule#end(String, String).
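For reference, explicit serialization of a DOM node can be done with the identity Transformer that ships with the JDK. This is a minimal sketch and not the linked NodeCreateRule example itself. NodeSerializer and its parse() helper are made-up names, and parse() exists only so the example can build an Element without Digester.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeSerializer {
    /** Serializes the node and its whole subtree back to XML text. */
    static String toXml(Node node) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    /** Helper for the example only: builds a DOM Element from a string. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

In the Digester case, a setContentsXML(Element) method could pass its argument to toXml() and hand the resulting string to FOP, instead of relying on toString().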

XML and Java... Confused about Values versus Index?

I am trying to understand how to read XML files using Java. I would like one XML tag, let's call it enable, to pass true to a method, and another XML tag to provide a number to another method. I would like to pass the true by having the line in my XML file and pass the number as valueofnumber. I am reading the XML file using a series of if statements that test for certain strings:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
{
    if (localName.equals("enabled")) {
        currentConfig.setenabled(true);
    }
    else if (localName.equals("number")) {
        currentConfig.setnumber(Double.parseDouble(attributes.getValue("number")));
    }
}
I am getting confused as to how to extract the value of number from the XML file. Currently I just get an error that nothing is present, and the same happens when I try getIndex().
Thanks very much in advance

The getValue() method you're calling takes a qualified name, meaning namespace prefix + local name in the format prefix:localName. Your XML document probably uses a namespace, which you'd have to supply. If there's no namespace, you might need to use the other getValue() method and pass an empty string for the namespace URI. It all depends a lot on what parser you're using and how it's configured. You'd be better advised to move to a higher-level parsing library that takes care of these nuances for you:
StAX isn't much higher level than SAX, but it still has a friendlier and generally more intuitive interface.
JDOM, being a DOM parser, will be slightly less efficient, but it makes parsing XML incredibly easy.
Commons Digester is kind of a rules-based XML parsing engine. You establish rules for what you want to happen when this or that element or attribute is encountered, and then run the digester. Method calls are one of the rules you can set, as is creation and population of a POJO.
JAXB or XStream will completely remove the guesswork and bind the XML straight to POJOs with minimal configuration. Then you don't even have to deal with XML and can work with normal objects instead.
Edit: (Based on the XML sample) Your "number" isn't an attribute. It's a nested element. That's why you're having trouble getting it from the Attributes object. My other advice on other libraries still stands.
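If you do stay with SAX, the text of a nested element arrives through characters(), not through the Attributes object. Here is a rough sketch of that. The XML sample isn't reproduced here, so the shape <config><enabled/><number>42.5</number></config> is an assumption, and Config, ConfigHandler and ConfigReader are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the asker's currentConfig object. */
class Config {
    boolean enabled;
    double number;
}

class ConfigHandler extends DefaultHandler {
    final Config config = new Config();
    private final StringBuilder text = new StringBuilder();

    // With a parser that is not namespace-aware, localName can be empty,
    // so fall back to qName.
    private static String nameOf(String localName, String qName) {
        return localName == null || localName.isEmpty() ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0);
        if (nameOf(localName, qName).equals("enabled")) {
            // The mere presence of the element means "true".
            config.enabled = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Element text can be delivered in several chunks.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (nameOf(localName, qName).equals("number")) {
            // The complete text is only known once the element ends.
            config.number = Double.parseDouble(text.toString().trim());
        }
    }
}

class ConfigReader {
    static Config read(String xml) {
        try {
            ConfigHandler handler = new ConfigHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.config;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

The value is parsed in endElement() rather than startElement() because the parser has not yet seen the element's text when the element starts.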
