Get object for element that has failed XSD validation - java

I'm validating an XML document against an XSD, and then want to delete the nodes that cause the document to fail.
I'm hitting against a problem in that SaxParseException doesn't seem to contain any information about the failure that I can use to programatically remove nodes.
Is there a way to get a reference to the element, that can be used to remove it, from a SaxParseException?

See the answers here: How to get the element of and invalid xml file with failed xsd Validation
Note that what you are proposing to do is unsafe in the general case. For a simple counter-example, take an element X of type integer that must occur at least once in its parent. If you put a string value in it, it will now fail validation. If you remove it, the document will violate the minOccurs constraint.
You could try to remove the element and restart validation from scratch, but you could end up in a very long loop and get no good result.

Related

Jackson Deserialize XML when an element contains HTML tags

I am trying to deserialize an XML response using Jackson and it contains an element with string value as below:
<Abstract>
<AbstractText>My Text <sup>1</sup> this. <sup>2,3</sup> and that.</AbstractText>
<Abstract>
I actually don't even care for this element and it would be great to IGNORE this element "Abstract" or "AbstractText" completely. But I can't find a way to ignore it.
I have two POJOs to handle this XML when it is deserialized by Jackson, and I was hoping I could either trick Jackson somehow or override it manually. But I am getting this error:
Caused by: java.io.IOException: Expected END_ELEMENT, got event of type 1
I've tried to look for the solution but nowhere it is solved or have any workarounds. Can someone please help?

XPath to check the namespace a prefix is bound to

Say I have the following XML file:
<a xmlns:foo="http://foo"></a>
I need to check whether the prefix foo is bound to http://foo or not. Whereby not bound could indicate that the said prefix does not exist at all or is bound to some other namespace URI.
I already have a library that takes a Document object and an XPath expression and returns a (possibly empty) List of Nodes that exist at that XPath.
So what would be an expression that would check for the presence of a prefix foo in the top-most element (document element) bound to the namespace http://foo and that would yield one node for the above XML and zero nodes for the following XMLs:
<a xmlns:fooX="http://foo"></a>
and
< xmlns:foo="http://fooX"></a>
I tried, as a first step, to just get the value of that attribute using:
/*[#*[local-name()='foo']]
... but it seems that prefix-binding attributes are handled differently from "normal" attributes.
If you want to do it with XPath then you have to use the namespace axis: /*[namespace::foo[. = 'http://foo']]. DOM Level 3 might provide different ways treating the namespace declarations as attributes and resolving prefixes, see http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-lookupNamespaceURI.

How to read XML attribute values using JXPathContext if reference is missing

Given following XML, we are using JXPathContext to create Java object out of it.
<fb1:Activity fb2:metadata="Activity1">
</fb1:Activity>
<fb21:ActivityMetadata fb2:id="Activity1">
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>
reading the value -
String responseCode = context.getValue("metadata[1]/Response/value");
This is working as expected. Now lets say for instance, the reference from Activity to ActivityMetadata is missing. What can we do to read the response value in such case? It is guaranteed that there can only be one ActivityMetadata element at max in the XML.
Incomplete XML - need to parse this
<fb1:Activity fb2:metadata="">
</fb1:Activity>
<fb21:ActivityMetadata>
<fb1:Response>XXXX</fb1:Response>
</fb1:ActivityMetadata>
The path you're giving us doesn't match the document you're showing us.
Ignoring that for a moment:
XML doesn't constrain the tree at all; that's done by the XML Schema (if there is one) and/or the applications which process that kind of document. Only the folks who defined this particular kind of document, or the schema, or the code can tell you whether there are any guarantees about only one ActivityMetadata being present or what it means if there's more than one.
XML is pure syntax. Meaning is someone else's problem.

XPath, Java and serialized xml

Assuming some xml like
<foo>
<bar>test</bar>
</foo>
Evaluating an expression with returnType = String like
/foo/bar
will return "test". However, I'd like to get the serialized xml instead, so something like
<bar>test</bar>
should be returned instead. As I can not check for the returnType in java's xpath implementation (xerces), I cannot simply get an object as result and if it indeed is a node, convert it to serialized xml.
Note: I don't know whether the expression will actually return a node, a string, a number or whatever so I cannot provide an appropriate return type to the eval function except string which, as my problem states, returns the text content and not the serialized xml.
So I am curious -> is there either a java- or (preferred) a xpath-way (function?) to get serialized xml for type string instead of the text children of the selected node?
thanks!
Alex
use the xpath return type XPathConstants.NODE and then you can serialize the returned Node yourself.
Now, you are right to observe that it's difficult to discover the return type of the result; this is a real design weakness of JAXP.
If it's a problem to you, consider using Saxon's s9api interface, which returns XdmValue objects whose type you can interrogate; you also get XPath 2.0 access as a bonus.
As Michael Kay answered, this is difficult in JAXP (the native Java interface).
In Mr Kay's Saxon library's s9api API (see Evaluating XPath Expressions using s9api), once you've called XPathSelector.evaluate() or XPathSelector.evaluateSingle() you can get the XML serialisation by calling XdmValue.toString().
However, if the XPath selected an attribute (e.g. //#name) you will still get the XML serialisation, e.g. name="value". You can call XdmItem.getStringValue(), but for elements that method will return the same values you're already seeing - the textual content of the element, not the serialisation. I've posted separately about how to distinguish between attributes and elements returned from Saxon s9api.

Apache Digester How do I get some xml nested within a tag as a literal string?

I am parsing a XML with Digester. A part of it contains content formatted in cryptic pseudo-HTML XML elements which I need to transform into an PDF. That will be done via Apache FOP. Hence I need to access the xml element which contains the content elements directly and pipe it to FOP. To do so the Digester FAQ states that one either
Wrap the nested xml in CDATA
or
If this can't be done then you need to use a NodeCreateRule to create a DOM node representing the body tag and its children, then serialise that DOM node back to text
Since it is a third party XML the CDATA approach could only be done via (another) XSLT which I hestitate to do.
It looks like this issue should be solvable via NodeCreateRule but I can not figure out how to get it done.
The documentation states that NodeCreateRule will push a Node onto the stack however I can only get it to pass null.
I tried
digester.addRule(docPath + "/contents", new NodeCreateRule());
digester.addCallMethod(docPath + "/contents", "setContentsXML");
setContentsXML expects a Element parameter.
I also tried this and this without any luck.
I am using the latest stable Digester. Would be thankful for any advice.
Update:
I found the bug . The result on my system is null, too. I am using JDK 6u24
The problem in my case as well as the linked bug lays in the proper serialisation of an Element. In my case the mentioned null value was not returned by Digester but by Element#toString(). I assume something changed since JDK 1.4.
By the bug example:
result contains another (text-)node with the actual content. toString() however simply takes the content of the Element instance it is called uppon.
The Element tree has to be serialized explicitly. For example with the serialization method in this useage example of NodeCreateRule.
In case someone else tries to use that with Digester 3: you have to change the method signature SetSerializedNodeRule#end() to SetSerializedNodeRule#end(String, String).

Categories

Resources