How to read XML attribute values using JXPathContext if reference is missing - java

Given the following XML, we are using JXPathContext to create a Java object from it.
<fb1:Activity fb2:metadata="Activity1">
</fb1:Activity>
<fb21:ActivityMetadata fb2:id="Activity1">
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>
Reading the value:
String responseCode = context.getValue("metadata[1]/Response/value");
This works as expected. Now let's say the reference from Activity to ActivityMetadata is missing. How can we read the response value in that case? It is guaranteed that the XML contains at most one ActivityMetadata element.
Incomplete XML that we need to parse:
<fb1:Activity fb2:metadata="">
</fb1:Activity>
<fb21:ActivityMetadata>
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>

The path you're giving us doesn't match the document you're showing us.
Ignoring that for a moment:
XML doesn't constrain the tree at all; that's done by the XML Schema (if there is one) and/or the applications which process that kind of document. Only the folks who defined this particular kind of document, or the schema, or the code can tell you whether there are any guarantees about only one ActivityMetadata being present or what it means if there's more than one.
XML is pure syntax. Meaning is someone else's problem.
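If the application really does guarantee at most one ActivityMetadata element, one option is to skip the reference and select the element by name. The sketch below uses the JDK's own javax.xml.xpath API rather than JXPathContext, so it does not show the asker's setup. The wrapper element `root`, the namespace URIs, and the names `MetadataLookupSketch` and `readResponse` are assumptions made only so the sample is well-formed and runnable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class MetadataLookupSketch {
    // Hypothetical wrapper element and namespace URIs; the question shows neither.
    static final String SAMPLE =
        "<root xmlns:fb1='urn:example:fb1' xmlns:fb2='urn:example:fb2'>"
      + "<fb1:Activity fb2:metadata=''/>"
      + "<fb1:ActivityMetadata><fb1:Response>XXXX</fb1:Response></fb1:ActivityMetadata>"
      + "</root>";

    /** Returns the text of the first Response under any ActivityMetadata element. */
    static String readResponse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so local-name() sees names without prefixes
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Match on local names so no prefix-to-URI mapping has to be registered.
            return XPathFactory.newInstance().newXPath().evaluate(
                    "//*[local-name()='ActivityMetadata']/*[local-name()='Response']", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `MetadataLookupSketch.readResponse(MetadataLookupSketch.SAMPLE)` returns the text of the Response element. If the document could ever contain more than one ActivityMetadata, this silently picks the first, which is exactly the kind of guarantee the answer says only the document's owners can give.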

Related

Parsing XML for deeply nested data

I have an XML file that is structured something like this:
<element1>
<element2>
<element3>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
<elementIAmInterestedIn attribute="data">
<element4>
<element5>
<element6>
<otherElementIAmInterestedIn>
<data1>text1</data1>
<data2>text2</data2>
<data3>text3</data3>
</otherElementIAmInterestedIn>
</element6>
</element5>
</element4>
</elementIAmInterestedIn>
</element3>
</element2>
</element1>
As you can see, I am interested in two elements, the first of which is deeply nested within the root element, and the second of which is deeply nested within that first element. There are multiple (sibling) elementIAmInterestedIn and otherElementIAmInterestedIn elements in the document.
I want to parse this XML file with Java and put the data from all the elementIAmInterestedIn and otherElementIAmInterestedIn elements into either a data structure or Java objects - it doesn't matter much to me as long as it is organized and I can access it later.
I'm able to write a recursive DOM parser method that does a depth-first traversal of the XML so that it touches every element. I also wrote a Java class with JAXB annotations that represents elementIAmInterestedIn. Then, in the recursive method, I can check when I get to an elementIAmInterestedIn and unmarshal it into an instance of the JAXB class. This works fine except that such an object should also contain multiple otherElementIAmInterestedIn.
This is where I'm stuck. How can I get the data out of otherElementIAmInterestedIn and assign it to the JAXB object? I've seen the @XmlWrapper annotation, but this seems to only work for one layer of nesting. Also, I cannot use @XmlPath.
Maybe I should scratch that idea and use a whole new approach. I'm really just getting started with XML parsing so perhaps I'm overlooking a more obvious solution. How would you parse an XML document structured like this and store the data in an organized way?
Maybe you should use a SAX parser instead of DOM. With DOM you load the whole document into memory, and in your case you only want to read two kinds of element. This is quite inefficient.
With a SAX parser you can pick out only the nodes you are interested in. Here is pseudocode for your task using the SAX parsing model:
1) Keep reading nodes until you get an <elementIAmInterestedIn> node
2) Store that element's data in your class
3) Keep on reading until you get an <otherElementIAmInterestedIn> node
4) Store that data too and save the object.
Loop from 1 to 4 until you reach the end of the document.
If you try this approach, I suggest you first read this document to understand how a SAX parser works, since it is very different from the DOM approach: How to Use SAX
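The steps above could be sketched roughly as follows with the JDK's SAX parser. This is an illustration under assumptions, not the answerer's code: the holder class `Interesting`, the map of data1/data2/data3 values, and the shortened `SAMPLE` document are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxExtractSketch {
    /** Hypothetical holder for one elementIAmInterestedIn and its nested data. */
    static final class Interesting {
        String attribute;
        final List<Map<String, String>> others = new ArrayList<>();
    }

    // Shortened stand-in for the document in the question.
    static final String SAMPLE =
        "<element1><element3>"
      + "<elementIAmInterestedIn attribute='data'><element4><otherElementIAmInterestedIn>"
      + "<data1>text1</data1><data2>text2</data2></otherElementIAmInterestedIn></element4>"
      + "</elementIAmInterestedIn>"
      + "<elementIAmInterestedIn attribute='data'><element4><otherElementIAmInterestedIn>"
      + "<data1>text1</data1><data2>text2</data2></otherElementIAmInterestedIn></element4>"
      + "</elementIAmInterestedIn>"
      + "</element3></element1>";

    static List<Interesting> parse(String xml) {
        List<Interesting> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Interesting current;              // step 1/2: element being filled
            private Map<String, String> currentOther; // step 3/4: nested element being filled
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
                if (qName.equals("elementIAmInterestedIn")) {
                    current = new Interesting();
                    current.attribute = atts.getValue("attribute");
                } else if (qName.equals("otherElementIAmInterestedIn") && current != null) {
                    currentOther = new LinkedHashMap<>();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per text node
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("otherElementIAmInterestedIn")) {
                    if (current != null && currentOther != null) current.others.add(currentOther);
                    currentOther = null;
                } else if (qName.equals("elementIAmInterestedIn")) {
                    result.add(current);
                    current = null;
                } else if (currentOther != null) {
                    currentOther.put(qName, text.toString().trim()); // data1, data2, ...
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

The handler keeps only the state it needs (the object being built and a text buffer), so memory use does not grow with the parts of the document it ignores.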

Get object for element that has failed XSD validation

I'm validating an XML document against an XSD, and then want to delete the nodes that cause the document to fail.
I'm running into a problem: SAXParseException doesn't seem to contain any information about the failure that I can use to programmatically remove nodes.
Is there a way to get a reference to the element, that can be used to remove it, from a SaxParseException?
See the answers here: How to get the element of and invalid xml file with failed xsd Validation
Note that what you are proposing to do is unsafe in the general case. For a simple counter-example, take an element X of type integer that must occur at least once in its parent. If you put a string value in it, it will now fail validation. If you remove it, the document will violate the minOccurs constraint.
You could try to remove the element and restart validation from scratch, but you could end up in a very long loop and get no good result.
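The counter-example can be reproduced with the JDK's javax.xml.validation API. In this sketch the schema, the element names `parent` and `X`, and the class `RemovalCounterExample` are made up for illustration. It collects the line, column and message that SAXParseException does expose, and shows that both the original document and the "repaired" one fail validation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

final class RemovalCounterExample {
    // Hypothetical schema: parent must contain exactly one integer-valued X.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='parent'><xs:complexType><xs:sequence>"
      + "<xs:element name='X' type='xs:integer' minOccurs='1'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Validates xml against XSD and returns one entry per reported error. */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) {
                    // Record the error and keep validating instead of aborting.
                    errors.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
                }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: nothing sensible to continue with
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }
}
```

Here `validate("<parent><X>abc</X></parent>")` reports errors because of the non-integer value, and `validate("<parent/>")`, the document with the offending element removed, reports errors as well because X is now missing.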

XPath, Java and serialized xml

Assuming some xml like
<foo>
<bar>test</bar>
</foo>
Evaluating an expression with returnType = String like
/foo/bar
will return "test". However, I'd like to get the serialized xml instead, so something like
<bar>test</bar>
should be returned instead. Since I cannot check the return type in Java's XPath implementation (Xerces), I cannot simply get an object as the result and, if it is indeed a node, convert it to serialized XML.
Note: I don't know whether the expression will actually return a node, a string, a number or whatever so I cannot provide an appropriate return type to the eval function except string which, as my problem states, returns the text content and not the serialized xml.
So I am curious: is there either a Java way or (preferred) an XPath way (a function?) to get the serialized XML of the selected node for return type string, instead of its text children?
thanks!
Alex
Use the XPath return type XPathConstants.NODE; then you can serialize the returned Node yourself.
Now, you are right to observe that it's difficult to discover the return type of the result; this is a real design weakness of JAXP.
If it's a problem to you, consider using Saxon's s9api interface, which returns XdmValue objects whose type you can interrogate; you also get XPath 2.0 access as a bonus.
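A minimal sketch of the XPathConstants.NODE route, using only JDK classes. The class and method names are hypothetical, and serializing through an identity Transformer is one choice among several.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathSerializeSketch {
    /** Evaluates expression as a node and returns its serialized XML, or null if nothing matches. */
    static String serialize(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            // An identity transform copies the node to the output unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the document from the question, `serialize("<foo><bar>test</bar></foo>", "/foo/bar")` yields `<bar>test</bar>`. The limitation described above still applies: an expression that evaluates to a string or number, such as `count(//bar)`, cannot be converted to a node and makes the evaluation throw.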
As Michael Kay answered, this is difficult in JAXP (the native Java interface).
In Mr Kay's Saxon library's s9api API (see Evaluating XPath Expressions using s9api), once you've called XPathSelector.evaluate() or XPathSelector.evaluateSingle() you can get the XML serialisation by calling XdmValue.toString().
However, if the XPath selected an attribute (e.g. //@name) you will still get the XML serialisation, e.g. name="value". You can call XdmItem.getStringValue(), but for elements that method will return the same values you're already seeing - the textual content of the element, not the serialisation. I've posted separately about how to distinguish between attributes and elements returned from Saxon s9api.

Apache Digester How do I get some xml nested within a tag as a literal string?

I am parsing an XML document with Digester. Part of it contains content formatted in cryptic pseudo-HTML XML elements, which I need to transform into a PDF. That will be done via Apache FOP. Hence I need to access the XML element that contains the content elements directly and pipe it to FOP. To do so, the Digester FAQ states that one should either
Wrap the nested xml in CDATA
or
If this can't be done then you need to use a NodeCreateRule to create a DOM node representing the body tag and its children, then serialise that DOM node back to text
Since it is third-party XML, the CDATA approach could only be done via (another) XSLT, which I hesitate to do.
It looks like this issue should be solvable via NodeCreateRule, but I cannot figure out how to get it done.
The documentation states that NodeCreateRule will push a Node onto the stack; however, I can only get it to pass null.
I tried
digester.addRule(docPath + "/contents", new NodeCreateRule());
digester.addCallMethod(docPath + "/contents", "setContentsXML");
setContentsXML expects an Element parameter.
I also tried this and this without any luck.
I am using the latest stable Digester. Would be thankful for any advice.
Update:
I found the bug. The result on my system is null, too. I am using JDK 6u24.
The problem in my case, as in the linked bug, lies in the proper serialisation of an Element. In my case the null value was not returned by Digester but by Element#toString(). I assume something has changed since JDK 1.4.
From the bug example:
result contains another (text) node with the actual content. toString(), however, simply takes the content of the Element instance it is called upon.
The Element tree has to be serialized explicitly, for example with the serialization method in this usage example of NodeCreateRule.
In case someone else tries to use that with Digester 3: you have to change the method signature SetSerializedNodeRule#end() to SetSerializedNodeRule#end(String, String).
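For readers without access to the linked example, explicit serialization of an Element might look like the sketch below. It is not the SetSerializedNodeRule code referred to above and does not involve Digester at all. It uses the DOM Level 3 Load/Save API from the JDK, and it assumes the DOM implementation supports that API (the JDK's built-in parser does). `ElementToStringSketch`, `parseRoot` and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

final class ElementToStringSketch {
    /** Serializes the element and all of its children to a string. */
    static String serialize(Element element) {
        // Assumption: the DOM implementation in use also implements Load/Save.
        DOMImplementationLS ls =
                (DOMImplementationLS) element.getOwnerDocument().getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        // Leave out the <?xml ...?> declaration so only the element markup is returned.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(element);
    }

    /** Helper for trying the sketch: parses xml and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike Element#toString(), whose output depends on the implementation, `serialize(parseRoot("<contents><b>bold</b> text</contents>"))` walks the whole subtree, so the nested markup ends up in the string that is handed on to FOP.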

XML and Java... Confused about Values versus Index?

I am trying to understand how to read XML files using Java. I would like to have one XML tag, let's call it enable, pass a true to a method and another XML tag that provides a number to another method. I would like to pass the true by having the line in my XML file and pass the number as valueofnumber. I am reading the XML file using a series of if statements that test for certain strings:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
{
    if (localName.equals("enabled")) {
        currentConfig.setenabled(true);
    }
    else if (localName.equals("number")) {
        currentConfig.setnumber(Double.parseDouble(attributes.getValue("number")));
    }
}
I am confused about how to extract the value of number from the XML file. Currently I just get an error that nothing is present, and the same happens when I try getIndex().
Thanks very much in advance
The getValue() method you're calling takes a qualified name, meaning namespace prefix plus local name in the format prefix:localName. Your XML document probably uses a namespace, which you'd have to supply. If there's no namespace, you might need to use the other getValue() method, getValue(uri, localName), and pass an empty string for the namespace URI. It all depends a lot on what parser you're using and how it's configured. You'd be better advised to move to a higher-level parsing library that takes care of these nuances for you:
StAX isn't much higher level than SAX, but it still has a friendlier and generally more intuitive interface.
JDOM, being a DOM parser, will be slightly less efficient, but it makes parsing XML incredibly easy.
Commons Digester is kind of a rules-based XML parsing engine. You establish rules for what you want to happen when this or that element or attribute is encountered, and then run the digester. Method calls are one of the rules you can set, as is creation and population of a POJO.
JAXB or XStream will completely remove the guesswork and bind the XML straight to POJOs with minimal configuration. Then you don't even have to deal with XML and can work with normal objects instead.
Edit: (Based on the XML sample) Your "number" isn't an attribute. It's a nested element. That's why you're having trouble getting it from the Attributes object. My other advice on other libraries still stands.
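If you do stay with SAX, the text of a nested element arrives through characters(), not through the Attributes object. The following is a rough sketch under assumptions: the sample document shape (`<config><enabled/><number>42.5</number></config>`) and the `Config` class are guesses, since the asker's XML and `currentConfig` type are not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ConfigSaxSketch {
    /** Hypothetical stand-in for the asker's config object. */
    static final class Config {
        boolean enabled;
        double number;
    }

    static Config parse(String xml) {
        Config config = new Config();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // forget text belonging to earlier elements
                if (localName.equals("enabled")) {
                    config.enabled = true; // presence of the tag is the signal
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // element text is delivered here, possibly in pieces
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (localName.equals("number")) {
                    config.number = Double.parseDouble(text.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Without namespace awareness, localName may be reported as an empty string.
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return config;
    }
}
```

The number is parsed in endElement because only at that point is the element's text known to be complete.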
