Groovy: how to parse xml and preserve namespaces and schemaLocations - java

I'm trying to use Groovy to simply add a node to an XML document at a particular location. My source XML looks like this:
<s1:RootNode
xmlns:s1="http://localhost/s1schema"
xmlns:s2="http://localhost/s2schema"
xsi:schemaLocation="http://localhost/s1schema s1schema.xsd
http://localhost/s2schema s2schema.xsd">
<s1:aParentNode>
<s2:targetNode>
<s2:childnode1 />
<s2:childnode2 />
<s2:childnode3 />
<s2:childnode4 />
</s2:targetNode>
</s1:aParentNode>
</s1:RootNode>
I'd like to simply add a new child node inline with the other ones to make the output
<s1:RootNode
xmlns:s1="http://localhost/s1schema"
xmlns:s2="http://localhost/s2schema"
xsi:schemaLocation="http://localhost/s1schema s1schema.xsd
http://localhost/s2schema s2schema.xsd">
<s1:aParentNode>
<s2:targetNode>
<s2:childnode1 />
<s2:childnode2 />
<s2:childnode3 />
<s2:childnode4 />
<s2:childnode5 >value</s2:childnode5>
</s2:targetNode>
</s1:aParentNode>
</s1:RootNode>
To do this I have the following simple Groovy script:
def data = 'value'
// validating = false, namespaceAware = true
def root = new XmlSlurper(false, true).parseText( sourceXML )
root.'aParentNode'.'targetNode'.appendNode {
's2:childnode5' data
}
groovy.xml.XmlUtil.serialize(root)
However, when I do this, the namespace declarations and the schemaLocation on the root node are removed. The namespace (but not the schema location) is instead added to each of the child nodes.
This is causing validation issues downstream.
How do I simply process this XML, perform no validation, leave the XML as is, and add a single node in a namespace I specify?
One note: we process many messages, and I won't know the outermost namespace (s1 in the above example) in advance. Even so, I'm really just looking for a technique that does "dumber" processing of XML.
Thanks!

First, I had to add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" to define your xsi namespace. Without it I would receive a SAXParseException for the unbound xsi prefix.
Additionally, I consulted this question on successfully appending a namespaced xml node to an existing document.
Finally, we had to use StreamingMarkupBuilder to stop the namespaces from being moved. By default, the serializer moves each referenced namespace declaration to the first node that actually uses that namespace. In your case, it was moving your s2 namespace attribute to the targetNode tag. The following code produces the result you want. However, you still have to know the correct namespaces so you can declare them on the StreamingMarkupBuilder.
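import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil

// Parse the source namespace-aware (validating = false, namespaceAware = true)
def root = new XmlSlurper(false, true).parseText( sourceXML )
// Parse the new node as its own fragment so it carries the s2 namespace
def data = '<s2:childnode5 xmlns:s2="http://localhost/s2schema">value</s2:childnode5>'
def xmlFragment = new XmlSlurper(false, true).parseText(data)
root.'aParentNode'.'targetNode'.appendNode(xmlFragment)
// Declare the namespaces up front so they are emitted on the outermost element
def outputBuilder = new StreamingMarkupBuilder()
String result = XmlUtil.serialize(outputBuilder.bind {
mkp.declareNamespace('s1': "http://localhost/s1schema")
mkp.declareNamespace('s2': "http://localhost/s2schema")
mkp.yield root }
)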
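For comparison, here is a rough sketch of the same "leave the declarations where they are" result in plain Java. It uses the JDK's namespace-aware DOM and identity Transformer, which is a swapped-in technique and not the Groovy answer's. It assumes three things:

* the xmlns:xsi declaration has been added as described above;
* only the namespace URI of the new node is known;
* the prefix can be borrowed from the existing targetNode, so the outer s1 namespace never has to be known.

As far as I can tell, the identity transform does not re-declare a prefix that is already in scope. The new child should therefore come out without its own xmlns:s2.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwareAppend {
    // Assumption: we know the URI of the namespace the new node belongs to, not its prefix.
    static final String S2 = "http://localhost/s2schema";

    // The question's document (trimmed), with the missing xmlns:xsi declaration added.
    static final String SOURCE =
        "<s1:RootNode xmlns:s1=\"http://localhost/s1schema\""
        + " xmlns:s2=\"http://localhost/s2schema\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xsi:schemaLocation=\"http://localhost/s1schema s1schema.xsd http://localhost/s2schema s2schema.xsd\">"
        + "<s1:aParentNode><s2:targetNode><s2:childnode1/><s2:childnode4/></s2:targetNode></s1:aParentNode>"
        + "</s1:RootNode>";

    static String appendChild(String xml, String localName, String value) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Find the target by namespace URI + local name, so the prefix used in the message doesn't matter.
            Element target = (Element) doc.getElementsByTagNameNS(S2, "targetNode").item(0);
            // Reuse whatever prefix the message already uses for that namespace.
            String prefix = target.getPrefix();
            String qName = (prefix == null) ? localName : prefix + ":" + localName;
            Element child = doc.createElementNS(S2, qName);
            child.setTextContent(value);
            target.appendChild(child);

            // Identity transform: the root's xmlns:* and xsi:schemaLocation attributes are written back as they were.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendChild(SOURCE, "childnode5", "value"));
    }
}
```

The only namespace input is the URI. Everything else, including the prefix, is read from the message itself.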

XmlSlurper (or XmlParser) does not process namespaces if you set the second constructor parameter of
XmlSlurper(boolean validating, boolean namespaceAware)
to false:
def root = new XmlSlurper(false, false).parseText( sourceXML )
Without setting namespaceAware to false, I also saw strange behavior from the parser. After setting it to false, the parser leaves the XML as is, with no namespace changes.
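The closest JDK counterpart to XmlSlurper(false, false) is probably a DocumentBuilderFactory with namespace awareness switched off. In that mode, element names such as "s2:targetNode" are plain strings, and the xmlns attributes are ordinary attributes that stay where they are. This sketch assumes the literal prefix s2 is known.

One caveat from my experience: the JDK's built-in serializer may refuse to write a prefixed attribute whose prefix is not declared anywhere. For that reason, the sample keeps the xmlns:xsi declaration mentioned in the other answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NonNamespaceAwareAppend {
    // The question's document (trimmed), with xmlns:xsi declared so the serializer accepts xsi:schemaLocation.
    static final String SOURCE =
        "<s1:RootNode xmlns:s1=\"http://localhost/s1schema\""
        + " xmlns:s2=\"http://localhost/s2schema\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xsi:schemaLocation=\"http://localhost/s1schema s1schema.xsd http://localhost/s2schema s2schema.xsd\">"
        + "<s1:aParentNode><s2:targetNode><s2:childnode1/><s2:childnode4/></s2:targetNode></s1:aParentNode>"
        + "</s1:RootNode>";

    static String appendChild(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(false); // the default; stated explicitly for clarity
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Names are plain strings here: "s2:targetNode" is matched literally, prefix included.
            Element target = (Element) doc.getElementsByTagName("s2:targetNode").item(0);
            Element child = doc.createElement("s2:childnode5");
            child.setTextContent("value");
            target.appendChild(child);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendChild(SOURCE));
    }
}
```

The trade-off matches the answer above: nothing is rewritten, but the code has to know the literal prefix instead of the namespace URI.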

Related

XpathException in Java MAVEN example Expression uses unbound namespace prefix

After researching on Google, I have not found a working solution for this.
The 'Maven by Example' ebook uses the Yahoo weather example. Unfortunately, it looks like Yahoo changed their interface. I tried to adapt the Java code accordingly, but I get this annoying exception:
exec-maven-plugin:1.5.0:java
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java
Caused by: org.dom4j.XPathException:
Exception occurred evaluting XPath: /query/results/channel/yweather:location/@city.
Exception: XPath expression uses unbound namespace prefix yweather
The relevant XML is:
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2017-02-13T10:57:34Z" yahoo:lang="en-US">
<results>
<channel>
...
<yweather:location xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0" city="Theale" country="United Kingdom" region=" England"/>
The entire XML can be generated from:
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%3D91731537
My code (as per the 'Maven by Example' ebook, with the XPath and URL modified for the changed Yahoo interface):
public Weather parse(InputStream inputStream) throws Exception {
Weather weather = new Weather();
SAXReader xmlReader = createXmlReader();
Document doc = xmlReader.read( inputStream );
weather.setCity(doc.valueOf ("//yweather:location/@city") );
// and several more, such as setCountry, setTemp
return weather;
}
(I'm not an XPath expert, so I also tried
/query/results/channel/item/yweather:location/@city
just in case, with the same result.)
The code that retrieves the XML:
public InputStream retrieve(String woeid) throws Exception {
String url = "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%3D"+woeid; // eg 91731537
URLConnection conn = new URL(url).openConnection();
return conn.getInputStream();
}
The Weather class is just a set of getters and setters.
When I try this in this XML tester, it works just fine, but that may be due to XPath 2.0 vs. Java's XPath 1.0.
When you evaluate your XPath //yweather:location/@city, the XPath processor has no knowledge of which namespace the yweather prefix is bound to. You'll need to provide that information. Now, you might think "the info is right there in the document!" and you'd be right. But prefixes are just a stand-in (like a variable) for the actual namespace. A namespace can be bound to any prefix you like that follows the prefix naming rules, and it can be bound to multiple prefixes as well. It's like variables in Java: the name of a variable referring to an object is of no importance in itself, and multiple variables can refer to the same object.
For example, if you used the XPath //yw:location/@city with the prefix yw bound to the namespace http://xml.weather.yahoo.com/ns/rss/1.0, it would still work the same.
I suggest you use class org.dom4j.xpath.DefaultXPath instead of calling valueOf. Create an instance of it and initialize the namespace context. There's a method setNamespaceURIs that takes a Map from prefixes to namespaces and lets you make the bindings. Bind the above weather namespace (the actual URI) to some prefix of your choosing (may be yweather, but can be anything else you want to use in your actual XPath expression) and then use the instance to evaluate it over the document.
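The same idea can be sketched with the JDK's own javax.xml.xpath API, swapped in here for dom4j's DefaultXPath because it ships with Java. Two details are assumptions made for illustration:

* the sample string is a hypothetical, trimmed stand-in for the Yahoo response;
* the prefix yw is chosen on purpose, to show that the prefix in the XPath does not have to match the one in the document.

The document has to be parsed namespace-aware, or prefixed steps will never match.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class YahooCity {
    static final String WEATHER_NS = "http://xml.weather.yahoo.com/ns/rss/1.0";

    // Hypothetical, trimmed stand-in for the Yahoo response shown in the question.
    static final String SAMPLE =
        "<query xmlns:yahoo=\"http://www.yahooapis.com/v1/base.rng\" yahoo:count=\"1\">"
        + "<results><channel>"
        + "<yweather:location xmlns:yweather=\"" + WEATHER_NS + "\""
        + " city=\"Theale\" country=\"United Kingdom\" region=\" England\"/>"
        + "</channel></results></query>";

    static String city(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixed XPath steps never match
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind our own prefix "yw" to the weather namespace URI; the document's "yweather" is irrelevant.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "yw".equals(prefix) ? WEATHER_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return WEATHER_NS.equals(uri) ? "yw" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return WEATHER_NS.equals(uri)
                            ? Collections.singletonList("yw").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate("//yw:location/@city", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(city(SAMPLE));
    }
}
```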
Here's an answer I gave to some question that goes more in-depth about what namespaces and their prefixes really are: https://stackoverflow.com/a/8231272/630136
EDIT: the online XPath tester you used probably does some behind-the-scenes magic to extract the namespaces and their prefixes from the given document and bind those in the XPath processor.
If you look at their sample XML and adjust it like this...
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers xmlns:test="http://www.foo.org/">
<test:singer id="4">Tom Waits</test:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
</root>
the XML is semantically equivalent, because the test prefix is bound to the same namespace as foo. The XPath //foo:singer/@id still returns all the right results, so the tool is smart about it. However, it doesn't know what to do with XML...
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<foo:actor id="1">Christian Bale</foo:actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers xmlns:test="http://www.foo.org/" xmlns:foo="http://www.bar.org">
<test:singer id="4">Tom Waits</test:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
</root>
and the XPath //foo:*/@id. The prefix foo is bound to a different namespace in the scope of the singers element, and now the XPath only returns the ids 5 and 6. Contrast this with the following XPath, which doesn't use a prefix but the namespace-uri() function: //*[namespace-uri()='http://www.foo.org/']/@id
That last one returns ids 1 and 4, as expected.
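As a small check, the namespace-uri() form can also be evaluated with the JDK's XPath 1.0 engine. No prefix binding is needed at all. The sketch parses the second sample document above namespace-aware, so the foo rebinding on singers is honoured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceUriQuery {
    // The second sample document from this answer, where foo is rebound inside <foo:singers>.
    static final String SAMPLE =
        "<root xmlns:foo=\"http://www.foo.org/\" xmlns:bar=\"http://www.bar.org\">"
        + "<actors>"
        + "<foo:actor id=\"1\">Christian Bale</foo:actor>"
        + "<actor id=\"2\">Liam Neeson</actor>"
        + "<actor id=\"3\">Michael Caine</actor>"
        + "</actors>"
        + "<foo:singers xmlns:test=\"http://www.foo.org/\" xmlns:foo=\"http://www.bar.org\">"
        + "<test:singer id=\"4\">Tom Waits</test:singer>"
        + "<foo:singer id=\"5\">B.B. King</foo:singer>"
        + "<foo:singer id=\"6\">Ray Charles</foo:singer>"
        + "</foo:singers>"
        + "</root>";

    static List<String> fooIds(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // namespace-uri() is only meaningful on a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Match on the namespace URI itself, so no prefix has to be bound in the XPath engine.
            NodeList ids = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//*[namespace-uri()='http://www.foo.org/']/@id", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                result.add(ids.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fooIds(SAMPLE)); // ids of elements in http://www.foo.org/, in document order
    }
}
```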
I found the error: it was my unfamiliarity with namespaces. The createXmlReader() method used in my example above sets the correct namespace, except that I forgot to change it after Yahoo changed the XML. After carefully re-reading the Maven by Example documentation and the generated error, and comparing them with the detailed answer given here, it suddenly clicked. The updated code (for the benefit of anyone trying the same example):
private SAXReader createXmlReader() {
Map<String,String> uris = new HashMap<String,String>();
uris.put( "yweather", "http://xml.weather.yahoo.com/ns/rss/1.0" );
DocumentFactory factory = new DocumentFactory();
factory.setXPathNamespaceURIs( uris );
SAXReader xmlReader = new SAXReader();
xmlReader.setDocumentFactory( factory );
return xmlReader;
}
The only change is in the uris.put() line.
Originally the prefix was "y"; now it is "yweather".

Adding multiple namespaces to an element in an XML file using dom4j

The XML file I need to generate:
<?xml version="1.0" encoding="UTF-8" ?>
<wrapper:MMSRMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:wrapper="urn:iso:std:iso:20022:tech:xsd:head.003.001.01" xsi:schemaLocation="urn:iso:std:iso:20022:tech:xsd:head.003.001.01 MMSR_head.003.001.01_Wrapper.xsd">
<header:AppHdr xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:header="urn:iso:std:iso:20022:tech:xsd:head.001.001.01">
<header:Fr>
...
</header:Fr>
</header:AppHdr>
</wrapper:MMSRMessage>
Adding two namespaces to the root element "wrapper:MMSRMessage" works without problems.
The following is the Java code for it:
Document document = DocumentHelper.createDocument();
Element wrapper = document.addElement("wrapper:MMSRMessage");
wrapper.addNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")
.addNamespace("wrapper", "urn:iso:std:iso:20022:tech:xsd:head.003.001.01")
.addAttribute("xsi:schemaLocation", "urn:iso:std:iso:20022:tech:xsd:head.003.001.01 MMSR_head.003.001.01_Wrapper.xsd");
However, when I add two namespaces to the element "header:AppHdr", I get this error message:
Exception in thread "main" org.dom4j.IllegalAddException: No such namespace prefix
using this Java code:
Element headerApp = wrapper.addElement("header:AppHdr");
headerApp.addNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")
.addNamespace("header", "urn:iso:std:iso:20022:tech:xsd:head.001.001.01");
I have also tried this:
Element headerApp = wrapper.addElement("header:AppHdr","urn:iso:std:iso:20022:tech:xsd:head.001.001.01")
.addNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
This way the error does not occur, but the namespace declaration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" is not added to the element "header:AppHdr".
This is my first question on Stack Overflow. I hope I can get an answer here :-)
DOM4J generally offers too many ways to skin the cat. In this area, the confusion is increased because Document#addElement(String, String) and Element#addElement(String, String) do very different validations: In the first case you can add an element with a qualified name without having the prefix bound to a namespace and the element ends up having no namespace (this is a bug). In the second case you must have the prefix bound (correct).
All in all, I recommend not using qualified element and attribute names (prefix:local-name) if you can avoid it. Instead, strictly separate the local name of the element or attribute, and use properly declared Namespace and QName constructs. In your case:
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Namespace;
import org.dom4j.QName;

// Declare each namespace once, then build names from local name + Namespace
Document document = DocumentHelper.createDocument();
Namespace xsi = Namespace.get("xsi", "http://www.w3.org/2001/XMLSchema-instance");
Namespace wrapper = Namespace.get("wrapper", "urn:iso:std:iso:20022:tech:xsd:head.003.001.01");
Namespace header = Namespace.get("header", "urn:iso:std:iso:20022:tech:xsd:head.001.001.01");
Element wrapperElement = document
.addElement(new QName("MMSRMessage", wrapper))
.addAttribute(new QName("schemaLocation", xsi), "urn:iso:std:iso:20022:tech:xsd:head.003.001.01 MMSR_head.003.001.01_Wrapper.xsd");
Element headerApp = wrapperElement.addElement(new QName("AppHdr", header));
headerApp.addElement(new QName("Fr", header));
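The same principle carries over to the JDK's built-in DOM, sketched here as a swapped-in alternative to dom4j rather than a dom4j recipe. The idea is to keep the namespace URIs in constants and always pass the URI separately from the qualified name. The explicit xmlns:xsi on AppHdr is added as an ordinary namespace-declaration attribute, as the question wanted. Note that a serializer may legitimately omit it in the output, because the same binding is already in scope on the root.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MmsrDomSketch {
    static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
    static final String WRAPPER = "urn:iso:std:iso:20022:tech:xsd:head.003.001.01";
    static final String HEADER = "urn:iso:std:iso:20022:tech:xsd:head.001.001.01";

    static Document build() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            // Namespace URI and qualified name are always passed separately.
            Element wrapper = doc.createElementNS(WRAPPER, "wrapper:MMSRMessage");
            wrapper.setAttributeNS(XSI, "xsi:schemaLocation", WRAPPER + " MMSR_head.003.001.01_Wrapper.xsd");
            doc.appendChild(wrapper);

            Element appHdr = doc.createElementNS(HEADER, "header:AppHdr");
            // Explicit (redundant) xsi declaration on AppHdr, stored as an attribute in the xmlns namespace.
            appHdr.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xsi", XSI);
            wrapper.appendChild(appHdr);
            appHdr.appendChild(doc.createElementNS(HEADER, "header:Fr"));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(build()), new StreamResult(out));
        System.out.println(out);
    }
}
```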

Retrieve value of attribute using XPath

I am trying to retrieve the value of an attribute from an XML file using XPath, and I am not sure where I am going wrong.
This is the XML file:
<soapenv:Envelope>
<soapenv:Header>
<common:TestInfo testID="PI1" />
</soapenv:Header>
</soapenv:Envelope>
And this is the code I am using to get the value. Both of these return nothing:
XPathBuilder getTestID = new XPathBuilder("local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])");
XPathBuilder getTestID2 = new XPathBuilder("Envelope/Header/TestInfo/@testID");
Object doc2 = getTestID.evaluate(context, sourceXML);
Object doc3 = getTestID2.evaluate(context, sourceXML);
How can I retrieve the value of testID?
However you're iterating in the Java code, your context node is probably not what you think. Remove the "." argument from your local-name(.) calls, like so:
/*[local-name()='Header']/*[local-name()='TestInfo']/@testID worked fine for me with your XML, although, as akaIDIOT says, there isn't an <Envelope> tag to be seen.
The XML file you provided does not contain an <Envelope> element, so an expression that requires it will never match.
Post-edit edit
As can be seen from your XML snippet, the document uses specific namespaces for the elements you're trying to match. An XPath engine is namespace-aware, meaning you have to ask it for exactly what you need. Also keep in mind that a namespace is identified by its URI, not by its prefix. So /namespace:element doesn't do much unless you tell the XPath engine which URI the prefix namespace refers to.
Your first XPath has an extra local-name() wrapped around the whole thing:
local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']
/*[local-name(.)='TestInfo'])
The result of this XPath will either be the string value "TestInfo" if the TestInfo node is found, or a blank string if it is not.
If your XML is structured like you say it is, then this should work:
/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/@testID
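Both expressions can be compared side by side with the JDK's XPath 1.0 engine. The question's XML declares no namespaces, and a namespace-aware parser rejects unbound prefixes, so this sketch adds declarations. The soapenv URI is the usual SOAP 1.1 envelope namespace, and urn:example:common is a made-up placeholder for the real common namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNameXPath {
    // Namespace declarations added for parsing; "urn:example:common" is hypothetical.
    static final String SAMPLE =
        "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
        + " xmlns:common=\"urn:example:common\">"
        + "<soapenv:Header><common:TestInfo testID=\"PI1\"/></soapenv:Header>"
        + "</soapenv:Envelope>";

    static String eval(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            // evaluate(String, Object) returns the string value of the result.
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Selects the attribute itself.
        System.out.println(eval(SAMPLE,
                "/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/@testID"));
        // Wrapping the whole path in local-name() yields the element's name instead.
        System.out.println(eval(SAMPLE,
                "local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])"));
    }
}
```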
But preferably, you should be working with namespaces properly instead of (ab)using local-name(). I have a post here that shows how to do this in Java.
If you don't care about the namespaces and you use an XPath 2.0 compatible engine, use the * wildcard for the prefix:
//*:Header/*:TestInfo/@testID
will return the desired value.
It will probably be more elegant to register the needed namespaces (not covered here; how you do this depends on your XPath engine) and query using them:
//soapenv:Header/common:TestInfo/@testID
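For the JDK's javax.xml.xpath engine, one way to register namespaces is a small NamespaceContext backed by a prefix-to-URI map. The map-backed class is a hypothetical helper, not part of the JDK. As in the previous example, the soapenv URI is the usual SOAP 1.1 one and urn:example:common is a placeholder. The second sample uses different prefixes in the document to show that only the URIs have to match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SoapNamespaces {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String COMMON = "urn:example:common"; // hypothetical URI for the "common" prefix

    static final String SAMPLE = "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_ENV + "\" xmlns:common=\"" + COMMON + "\">"
            + "<soapenv:Header><common:TestInfo testID=\"PI1\"/></soapenv:Header></soapenv:Envelope>";
    // Same document, different prefixes: still matches, because only the URIs count.
    static final String RENAMED = "<e:Envelope xmlns:e=\"" + SOAP_ENV + "\" xmlns:c=\"" + COMMON + "\">"
            + "<e:Header><c:TestInfo testID=\"PI1\"/></e:Header></e:Envelope>";

    /** Hypothetical helper: a NamespaceContext backed by a prefix -> URI map. */
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> uris;
        MapNamespaceContext(Map<String, String> uris) { this.uris = uris; }

        @Override public String getNamespaceURI(String prefix) {
            String uri = uris.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            Iterator<String> it = getPrefixes(uri);
            return it.hasNext() ? it.next() : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            List<String> prefixes = new ArrayList<>();
            for (Map.Entry<String, String> e : uris.entrySet()) {
                if (e.getValue().equals(uri)) prefixes.add(e.getKey());
            }
            return prefixes.iterator();
        }
    }

    static String testId(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // The prefixes used in the XPath, bound to the URIs they stand for.
            Map<String, String> uris = new HashMap<>();
            uris.put("soapenv", SOAP_ENV);
            uris.put("common", COMMON);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(uris));
            return xpath.evaluate("//soapenv:Header/common:TestInfo/@testID", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(testId(SAMPLE));
        System.out.println(testId(RENAMED));
    }
}
```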

Reducing code redundancy while creating XML with XOM

I am using XOM as my XML parsing library, and I am also using it to create XML. Below is the scenario, described with an example.
Scenario:
Code:
Element root = new Element("atom:entry", "http://www.w3c.org/Atom");
Element city = new Element("info:city", "http://www.myinfo.com/Info");
city.appendChild("My City");
root.appendChild(city);
Document d = new Document(root);
System.out.println(d.toXML());
Generated XML:
<?xml version="1.0"?>
<atom:entry xmlns:atom="http://www.w3c.org/Atom">
<info:city xmlns:info="http://www.myinfo.com/Info">
My City
</info:city>
</atom:entry>
Notice that in this XML the info namespace is declared on the node itself. But I need it to be declared on the root element, like below:
<?xml version="1.0"?>
<atom:entry xmlns:atom="http://www.w3c.org/Atom" xmlns:info="http://www.myinfo.com/Info">
<info:city>
My City
</info:city>
</atom:entry>
To do that, I just need the following code:
Element root = new Element("atom:entry", "http://www.w3c.org/Atom");
root.addNamespaceDeclaration("info", "http://www.myinfo.com/Info"); // the added line
Element city = new Element("info:city", "http://www.myinfo.com/Info");
... ... ...
The problem is that I have to write http://www.myinfo.com/Info twice. In my case there are hundreds of namespaces, so there would be far too much redundancy. Is there any way to get rid of it?
No, there is no way to get rid of this redundancy and that's a deliberate decision. In XOM the namespace is a fundamental part of the element itself, not a function of its position in the document.
Of course you could always declare a named constant for the namespace URI.
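XOM is not part of the JDK, so this sketch illustrates the "named constant" idea with the JDK's DOM instead. Each namespace URI is written once, in a single prefix-to-URI table. The table is used both to declare every namespace on the root and to create elements by prefix. The element(...) helper and the table are hypothetical conveniences, not a XOM or JDK API. In XOM, the same table could feed addNamespaceDeclaration and the Element constructor.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AtomNamespaceMap {
    // Each namespace URI is written exactly once, in this prefix -> URI table.
    static final Map<String, String> NS = new LinkedHashMap<>();
    static {
        NS.put("atom", "http://www.w3c.org/Atom");
        NS.put("info", "http://www.myinfo.com/Info");
    }

    // Hypothetical helper: creates "prefix:local" in the namespace the table assigns to that prefix.
    static Element element(Document doc, String qName) {
        String prefix = qName.substring(0, qName.indexOf(':'));
        return doc.createElementNS(NS.get(prefix), qName);
    }

    static String build() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element root = element(doc, "atom:entry");
            // Declare every namespace from the table once, on the root element.
            for (Map.Entry<String, String> e : NS.entrySet()) {
                root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + e.getKey(), e.getValue());
            }
            doc.appendChild(root);

            Element city = element(doc, "info:city");
            city.setTextContent("My City");
            root.appendChild(city);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

With hundreds of namespaces, only the table grows; call sites use just the prefixed names.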

Element XMLFileDefinition is undefined in a Java object of type class coldfusion.xml.XmlNodeList

Here is a simplified example which creates an identical error to the one that I am trying to fix.
<cfscript>
private xml function getBaseRequest() {
// Set up the root xml element
var xmlReturn = XmlNew(true);
xmlReturn.xmlRoot = xmlElemNew(xmlReturn,'testbase');
// Attach a child with generic name
ArrayAppend(xmlReturn['testbase'].XmlChildren,xmlElemNew(xmlReturn,'thisworks'));
// Add a child to that
ArrayAppend(xmlReturn['testbase']['thisworks'].XmlChildren,xmlElemNew(xmlReturn,'attachme'));
// Now attach a child with node name 'XMLFileDefinition'
ArrayAppend(xmlReturn['testbase'].XmlChildren,xmlElemNew(xmlReturn,'XMLFileDefinition'));
// And attempt to add a child to that
// produces error "Element XMLFileDefinition is undefined in a Java object of type class coldfusion.xml.XmlNodeList"
ArrayAppend(xmlReturn['testbase']['XMLFileDefinition'].XmlChildren,xmlElemNew(xmlReturn,'thisbreaks'));
return xmlReturn;
}
</cfscript>
To clarify, here is the XML as dumped just before the line marked as erroring:
<?xml version="1.0" encoding="UTF-8"?>
<testbase>
<thisworks>
<attachme/>
</thisworks>
<XMLFileDefinition/>
</testbase>
Unfortunately this XML is required according to a schema provided by a third party and as such we cannot have the nodes renamed to something that works more nicely with ColdFusion.
UPDATE: This seems to be related to the "XML" prefix on the node name; any node name that starts with "XML" seems to cause this problem. I have added an answer detailing how I got around it. I'm hoping someone can come up with a more elegant or universal solution, though, as mine has some potential pitfalls.
Versions of CF/Java are as follows:
ColdFusion version : 9,0,1,274733
Java version : 14.3-b01
I have also tried updating and this still occurs on the following versions
ColdFusion version : 9,0,1,274733 (Cumulative Hotfix 2)
Java version : 1.7.0_03
After further investigation, I determined that this issue is caused by attempting to access an XML node whose name matches the regex 'xml.*' (i.e. it starts with the letters xml, in any case). I was unable to find a definitive explanation. I would assume it relates to the Xml-prefixed keys used to refer to XmlAttributes, XmlChildren, XmlText, etc.
The method I eventually used to fix this is as follows (truncated example):
var xmlReturn = XmlNew(true);
var nodeBody = xmlElemNew(xmlReturn,'RootElement');
xmlReturn.xmlRoot = nodeBody;
var nodeXMLDefinition = xmlElemNew(xmlReturn,'XMLFileDefinition');
ArrayAppend(nodeXMLDefinition.XmlChildren, xmlElemNew(xmlReturn,'SomeChildElement'));
ArrayAppend(nodeBody.XmlChildren,nodeXMLDefinition);
