I'll point out up front that I'm new to Saxon. I've tried following the docs and the examples in the package, but I'm just not having any luck with this problem.
Basically, I'm trying to do some XML processing in Java using Saxon v8. In order to get something working, I took one of the sample files included in the package and modified it to my needs. It works as long as I'm not using namespaces, and that is my question: how can I get around the namespace problem? I don't really care to use the namespace, but it exists in my XML, so I either have to use it or ignore it. Either solution is fine.
Anyway, here is my starter code. It doesn't do anything but take an XPath query and try to run it against the hard-coded XML doc.
public static void main(String[] args) {
    String query = args[0];
    File XMLStream = null;
    String xmlFileName = "doc.xml";
    OutputStream destStream = System.out;
    XQueryExpression exp = null;
    Configuration C = new Configuration();
    C.setSchemaValidation(false);
    C.setValidation(false);
    StaticQueryContext SQC = new StaticQueryContext(C);
    DynamicQueryContext DQC = new DynamicQueryContext(C);
    QueryProcessor processor = new QueryProcessor(SQC);
    Properties props = new Properties();
    try {
        exp = processor.compileQuery(query);
        XMLStream = new File(xmlFileName);
        InputSource XMLSource = new InputSource(XMLStream.toURI().toString());
        SAXSource SAXs = new SAXSource(XMLSource);
        DocumentInfo DI = SQC.buildDocument(SAXs);
        DQC.setContextNode(DI);
        SequenceIterator iter = exp.iterator(DQC);
        while (true) {
            Item i = iter.next();
            if (i != null) {
                System.out.println(i.getStringValue());
            }
            else break;
        }
    }
    catch (Exception e) {
        System.err.println(e.getMessage());
    }
}
An example XML file is here...
<?xml version="1.0"?>
<ns1:animal xmlns:ns1="http://my.catservice.org/">
    <cat>
        <catId>8889</catId>
        <fedStatus>true</fedStatus>
    </cat>
</ns1:animal>
If I run this with a query including the namespace, I get an error. For example:
/ns1:animal/cat/ gives the error: "Prefix ns1 has not been declared".
If I remove the ns1: from the query, it gives me nothing. If I doctor the xml to remove the "ns1:" prepended to "animal" I can run the query /animal/cat/ with success.
Any help would be greatly appreciated. Thanks.
The error message correctly points out that your XPath expression does not say what the namespace prefix "ns1" means (what it is bound to). Just because the document being processed happens to declare a binding for "ns1" does not mean that binding is the one to use: in XML it is the namespace URI that matters, and prefixes are just convenient shortcuts for the real thing.
So how do you define the binding? There are two generic ways: either provide a context that can resolve the prefix, or embed the actual URI within the XPath expression.
Regarding the first approach, this email from the Saxon author mentions the JAXP method XPath.setNamespaceContext(); similarly, the Jaxen XPath processor FAQ has some sample code that could help.
That's not very convenient, as you have to implement NamespaceContext, but once you have an implementation you'll be set.
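To make the first approach concrete, here is a minimal sketch using only the JDK's javax.xml.xpath API rather than Saxon's own classes. The class name NamespaceContextDemo, the helper contextFor, and the prefix "a" are all made up for illustration; the XML is the sample from the question. Note that the prefix used in the query does not have to match the one in the document, only the URI does.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name and the prefix "a" are illustrative only.
class NamespaceContextDemo {

    // Minimal NamespaceContext backed by a prefix -> URI map.
    static NamespaceContext contextFor(final Map<String, String> bindings) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                String uri = bindings.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                for (Map.Entry<String, String> e : bindings.entrySet()) {
                    if (e.getValue().equals(namespaceURI)) {
                        return e.getKey();
                    }
                }
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        };
    }

    // Evaluates /a:animal/cat/catId against the sample document from the question.
    static String firstCatId() {
        String xml = "<ns1:animal xmlns:ns1=\"http://my.catservice.org/\">"
                + "<cat><catId>8889</catId><fedStatus>true</fedStatus></cat>"
                + "</ns1:animal>";
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixed names never match
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind our own prefix "a" to the URI the document uses for ns1.
            xpath.setNamespaceContext(contextFor(
                    Collections.singletonMap("a", "http://my.catservice.org/")));
            return xpath.evaluate("/a:animal/cat/catId", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstCatId());
    }
}
```

The unprefixed steps cat and catId match because those elements are in no namespace in the sample document; only the root element needs the prefix.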
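As for the notation approach, Top Ten Tips to Using XPath and XPointer shows this example:
to match an element declared with a namespace like:
xmlns:book="http://my.example.org/namespaces/book"
you use an XPath name like:
{http://my.example.org/namespaces/book}section
which hopefully is understood by Saxon (or Jaxen). Be aware that this brace notation is not part of standard XPath 1.0 or 2.0 syntax (XPath 3.0 later added the similar form Q{uri}local), so a given processor may reject it.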
Finally, I would recommend upgrading to Saxon 9 if possible, in case you have any trouble using one of the above solutions.
If you want to have something working out of the box, you can check out embedding-xquery-in-java. There's a GitHub project which uses Saxon to evaluate some sample XQuery expressions.
Regards
After researching on Google I have not found a working solution for this.
The 'MAVEN by Example' ebook uses the Yahoo weather example. Unfortunately it looks like Yahoo changed their interface. I tried to adapt the java code for this, but get this annoying exception:
exec-maven-plugin:1.5.0:java
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java
Caused by: org.dom4j.XPathException:
Exception occurred evaluting XPath: /query/results/channel/yweather:location/@city.
Exception: XPath expression uses unbound namespace prefix yweather
The xml line itself is:
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2017-02-13T10:57:34Z" yahoo:lang="en-US">
<results>
<channel>
...
<yweather:location xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0" city="Theale" country="United Kingdom" region=" England"/>
The entire XML can be generated from :
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%3D91731537
My code (as per the 'MAVEN By Example' ebook, xpath and url modified for the changed Yahoo):
public Weather parse(InputStream inputStream) throws Exception {
    Weather weather = new Weather();
    SAXReader xmlReader = createXmlReader();
    Document doc = xmlReader.read( inputStream );
    weather.setCity(doc.valueOf("//yweather:location/@city"));
    // and several more, such as setCountry, setTemp
    return weather;
}
(I'm not an XPath expert, so I tried
/query/results/channel/item/yweather:location/@city
as well, just in case, with the same result.)
xmlReader:
public InputStream retrieve(String woeid) throws Exception {
    String url = "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%3D" + woeid; // eg 91731537
    URLConnection conn = new URL(url).openConnection();
    return conn.getInputStream();
}
The Weather class is just a set of getters and setters.
When I try this in this XML tester, it works just fine, but that may be the effect of XPath 2.0 vs. Java's XPath 1.0.
When you evaluate your XPath //yweather:location/@city, the XPath processor has no knowledge of which namespace the yweather prefix is bound to. You'll need to provide that information. Now, you might think "the info is right there in the document!" and you'd be right. But prefixes are just a sort of stand-in (like a variable) for the actual namespace. A namespace can be bound to any prefix you like that follows the prefix naming rules, and can be bound to multiple prefixes as well. Just like the variable name in Java referring to an object is of itself of no importance, and multiple variables could refer to the same object.
For example, if you used XPath //yw:location/@city with the prefix yw bound to namespace http://xml.weather.yahoo.com/ns/rss/1.0, it'd still work the same.
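The point that the prefix is arbitrary can be checked with a small sketch. This one uses the JDK's javax.xml.xpath API instead of dom4j so that it runs without extra libraries; the class name PrefixIsArbitraryDemo and the method cityUsingPrefix are made up for illustration, and the XML is a cut-down version of the Yahoo response from the question.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: the same query works under any prefix bound to the right URI.
class PrefixIsArbitraryDemo {
    static final String YWEATHER_NS = "http://xml.weather.yahoo.com/ns/rss/1.0";

    // Binds the given prefix to the weather namespace and reads the city attribute.
    static String cityUsingPrefix(final String prefix) {
        String xml = "<query><results><channel>"
                + "<yweather:location xmlns:yweather=\"" + YWEATHER_NS + "\""
                + " city=\"Theale\" country=\"United Kingdom\"/>"
                + "</channel></results></query>";
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? YWEATHER_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return YWEATHER_NS.equals(uri) ? prefix : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    String p = getPrefix(uri);
                    return p == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singletonList(p).iterator();
                }
            });
            return xpath.evaluate("//" + prefix + ":location/@city", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cityUsingPrefix("yweather"));
        System.out.println(cityUsingPrefix("yw"));
    }
}
```

Both calls read the same attribute, because the processor compares namespace URIs and never the prefix text.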
I suggest you use class org.dom4j.xpath.DefaultXPath instead of calling valueOf. Create an instance of it and initialize the namespace context. There's a method setNamespaceURIs that takes a Map from prefixes to namespaces and lets you make the bindings. Bind the above weather namespace (the actual URI) to some prefix of your choosing (may be yweather, but can be anything else you want to use in your actual XPath expression) and then use the instance to evaluate it over the document.
Here's an answer I gave to some question that goes more in-depth about what namespaces and their prefixes really are: https://stackoverflow.com/a/8231272/630136
EDIT: the online XPath tester you used probably does some behind-the-scenes magic to extract the namespaces and their prefixes from the given document and bind those in the XPath processor.
If you look at their sample XML and adjust it like this...
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
    <actors>
        <actor id="1">Christian Bale</actor>
        <actor id="2">Liam Neeson</actor>
        <actor id="3">Michael Caine</actor>
    </actors>
    <foo:singers xmlns:test="http://www.foo.org/">
        <test:singer id="4">Tom Waits</test:singer>
        <foo:singer id="5">B.B. King</foo:singer>
        <foo:singer id="6">Ray Charles</foo:singer>
    </foo:singers>
</root>
the XML is semantically equivalent, because the test prefix is bound to the same namespace as foo. The XPath //foo:singer/@id still returns all the right results, so the tool is smart about it. However, it doesn't know what to do with XML...
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
    <actors>
        <foo:actor id="1">Christian Bale</foo:actor>
        <actor id="2">Liam Neeson</actor>
        <actor id="3">Michael Caine</actor>
    </actors>
    <foo:singers xmlns:test="http://www.foo.org/" xmlns:foo="http://www.bar.org">
        <test:singer id="4">Tom Waits</test:singer>
        <foo:singer id="5">B.B. King</foo:singer>
        <foo:singer id="6">Ray Charles</foo:singer>
    </foo:singers>
</root>
and XPath //foo:*/@id. The prefix foo is bound to a different namespace in the singers element scope, and now it only returns the ids 5 and 6. Contrast it with this XPath, that doesn't use a prefix but the namespace-uri() function: //*[namespace-uri()='http://www.foo.org/']/@id
That last one returns ids 1 and 4, as expected.
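The namespace-uri() query can be reproduced with the JDK's XPath engine, which needs no prefix bindings for it. This is only a sketch: the class name NamespaceUriDemo and the method idsInFooNamespace are made up, and the document is the second sample above with whitespace removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: selects by namespace URI, ignoring whatever prefixes the document uses.
class NamespaceUriDemo {
    static final String XML =
            "<root xmlns:foo=\"http://www.foo.org/\" xmlns:bar=\"http://www.bar.org\">"
            + "<actors>"
            + "<foo:actor id=\"1\">Christian Bale</foo:actor>"
            + "<actor id=\"2\">Liam Neeson</actor>"
            + "<actor id=\"3\">Michael Caine</actor>"
            + "</actors>"
            + "<foo:singers xmlns:test=\"http://www.foo.org/\" xmlns:foo=\"http://www.bar.org\">"
            + "<test:singer id=\"4\">Tom Waits</test:singer>"
            + "<foo:singer id=\"5\">B.B. King</foo:singer>"
            + "<foo:singer id=\"6\">Ray Charles</foo:singer>"
            + "</foo:singers>"
            + "</root>";

    // Returns the id attributes of all elements in the http://www.foo.org/ namespace.
    static List<String> idsInFooNamespace() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // namespace-uri() is empty for every node otherwise
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attrs = (NodeList) xpath.evaluate(
                    "//*[namespace-uri()='http://www.foo.org/']/@id",
                    doc, XPathConstants.NODESET);

            List<String> ids = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                ids.add(attrs.item(i).getNodeValue());
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(idsInFooNamespace());
    }
}
```

Inside foo:singers the prefix foo is rebound to http://www.bar.org, so ids 5 and 6 are excluded, while test:singer (id 4) is included because test points at http://www.foo.org/.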
I found the error: it was my unfamiliarity with namespaces. The createXmlReader() method used in my example above sets the namespace binding, except that I forgot to change it after Yahoo changed the XML. After carefully re-reading the Maven by Example documentation and the generated error, and comparing them with the detailed answer given here, it suddenly clicked. The updated code (for the benefit of anyone trying the same example):
private SAXReader createXmlReader() {
    Map<String,String> uris = new HashMap<String,String>();
    uris.put( "yweather", "http://xml.weather.yahoo.com/ns/rss/1.0" );
    DocumentFactory factory = new DocumentFactory();
    factory.setXPathNamespaceURIs( uris );
    SAXReader xmlReader = new SAXReader();
    xmlReader.setDocumentFactory( factory );
    return xmlReader;
}
The only change is in the line 'uris.put()'.
Originally the prefix was "y", now it is "yweather".
I should have originally posted my question by stating that our code was using an embedded saxon extension function - saxon:parse($xml) which returned the root element/node of the xml. However, in Saxon-HE that extension is no longer available - so I am trying to write an Integrated extension that parses an xml string into a document and returns the root element.
I am using Saxon-HE 9.5.1.6 and am trying to write an Integrated Extension Function that returns the root node of a document. The function receives an XML string, creates a document, and needs to return the root node to the XSLT so that it can then use XPath to find a specific element. The call() method of the ExtensionFunctionCall class returns a Sequence type. How do I return a NodeSequence or NodeType? How do I construct the NodeSequence from my Document?
I can step through in the debugger and confirm that the function receives the correct XML and parses it into a document, but so far I am unable to determine how to construct the NodeSequence with my root element.
I have other Integrated Extension Functions that return a StringValue, and those work great, but I can't glean from the available class methods how to return anything other than simple (numeric/alpha/item) types from the ExtensionFunctionCall.
Thank you.
The class DocumentInfo implements Sequence, so if you return a DocumentInfo, that will satisfy the interface. You can construct a DocumentInfo using
context.getConfiguration().buildDocument()
If you want to construct your document using some external object model such as DOM or JDOM2, you will need to take the root node of that external document and wrap it in the appropriate kind of Saxon DocumentWrapper to make it into a DocumentInfo.
For anyone reading - following this I was able to get this working with Michael Kay's help - my solution is the following:
Source source = new StreamSource(new StringReader(myXMLparam));
DocumentInfo docInfo = context.getConfiguration().buildDocument(source);
return docInfo;
I have an issue trying to compare 2 XML documents in Java, using oracle.xml.differ.XMLDiff. The code is fully implemented and I expected it to be working fine, until I discovered an attribute change is not picked up in some instances. To demonstrate this, I have the following:
Setup:
DOMParser parser = new DOMParser();
parser.setPreserveWhitespace(false);
parser.parse(isCurrent);
XMLDocument currentXmlDoc = parser.getDocument();
parser.parse(isPrior);
XMLDocument priorXmlDoc = parser.getDocument();
XMLDiff xmlDiff = new XMLDiff();
xmlDiff.setDocuments(currentXmlDoc, priorXmlDoc);
In the first case, the attribute change in Strike is picked up fine. I have the following 2 XML files:
XML1
<Periods>
<Period Start="2011-03-28" End="2011-04-17" AverageStart="" AverageEnd="" Notional="6000000.0000" OptionType="Swap" Payment="2011-04-19" Strike="72.0934800" Underlying="ZA" ResetStrike="No" ResetNotional="No" QuotingDate="2011-04-17" Multiplier="1.000000" PlusConstant="0.000000" StopLossPercent="" StopLossLevel=""/>
</Periods>
XML2
<Periods>
<Period Start="2011-03-28" End="2011-04-17" AverageStart="" AverageEnd="" Notional="6000000.0000" OptionType="Swap" Payment="2011-04-19" Strike="0.0000000" Underlying="ZA" ResetStrike="No" ResetNotional="No" QuotingDate="2011-04-17" Multiplier="1.000000" PlusConstant="0.000000" StopLossPercent="" StopLossLevel=""/>
</Periods>
In the second case, the attribute change in Strike is not picked up. I have the following 2 XML files:
XML1
<Periods>
<Period Start="2011-03-28" End="2011-04-30" Payment="2011-05-02" Notional="5220000.000000" Strike="176.201900" StopLossPercent="" StopLossLevel=""/>
</Periods>
XML2
<Periods>
<Period Start="2011-03-28" End="2011-04-30" Payment="2011-05-02" Notional="5220000.000000" Strike="0.000000" StopLossPercent="" StopLossLevel=""/>
</Periods>
Does anyone know if I'm doing something wrong, or is there a bug in the XMLDiff package?
Alternatively, does anyone know a different tool that can be used in the same way, just identifying differences in nodes and attributes between XML files, regardless of the order?
Thanks,
Milena
UPDATE: As it's extremely time-consuming to get new external packages approved for use in our system, in the ideal case I'd like to find a solution to making oracle.xml.differ.XMLDiff work. Obviously if there really is a bug and this can't be bypassed I'll consider other tools.
UPDATE 2: Since nobody seems to know about the XMLDiff bug, I'll try implementing the suggested XMLUnit package, it should do the trick.
In a unit test I'm using org.custommonkey.xmlunit.Diff for comparing XML content. See http://xmlunit.sourceforge.net/api/org/custommonkey/xmlunit/Diff.html
I'm comparing XML strings, but you can also compare W3C XML documents. I hope you can convert your XMLDocument to either a String or an org.w3c.dom.Document.
My test case looks like this:
String actualXML = SomeClass.getElement().asXML();
String expectedXML = IOUtils.toString(this.getClass().getResourceAsStream("/expected.xml"));
org.custommonkey.xmlunit.Diff myDiff = new Diff(StringUtils.deleteWhitespace(expectedXML), StringUtils.deleteWhitespace(actualXML));
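assertTrue(MessageFormat.format("XML must be similar: {0}\nActual XML:\n{1}\n", myDiff, actualXML), myDiff.similar());
P.S. I also use the Apache Commons StringUtils.deleteWhitespace() method, because I'm not interested in whitespace differences. Note that deleteWhitespace() also strips the spaces that separate attributes, so it only suits XML without attributes; for XML like yours, XMLUnit.setIgnoreWhitespace(true) is the safer option.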
I am trying to retrieve the value of an attribute from an XML file using XPath, and I am not sure where I am going wrong.
This is the XML File
<soapenv:Envelope>
<soapenv:Header>
<common:TestInfo testID="PI1" />
</soapenv:Header>
</soapenv:Envelope>
And this is the code I am using to get the value. Both of these return nothing.
XPathBuilder getTestID = new XPathBuilder("local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])");
XPathBuilder getTestID2 = new XPathBuilder("Envelope/Header/TestInfo/@testID");
Object doc2 = getTestID.evaluate(context, sourceXML);
Object doc3 = getTestID2.evaluate(context, sourceXML);
How can I retrieve the value of testID?
However you're iterating within the Java code, your context node is probably not what you think, so remove the "." argument from local-name(.), like so:
/*[local-name()='Header']/*[local-name()='TestInfo']/@testID worked fine for me with your XML, although as akaIDIOT says, there isn't an <Envelope> tag to be seen.
The XML file you provided does not contain an <Envelope> element, so an expression that requires it will never match.
Post-edit edit
As can be seen from your XML snippet, the document uses a specific namespace for the elements you're trying to match. An XPath engine is namespace-aware, meaning you'll have to ask it exactly what you need. And keep in mind that a namespace is defined by its URI, not by its abbreviation (so /namespace:element doesn't do much unless you let the XPath engine know which URI the prefix namespace refers to).
Your first XPath has an extra local-name() wrapped around the whole thing:
local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']
/*[local-name(.)='TestInfo'])
The result of this XPath will either be the string value "TestInfo" if the TestInfo node is found, or a blank string if it is not.
If your XML is structured like you say it is, then this should work:
/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/@testID
But preferably, you should be working with namespaces properly instead of (ab)using local-name(). I have a post here that shows how to do this in Java.
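The difference between the two expressions can be seen in a short sketch with the JDK's javax.xml.xpath API (not the XPathBuilder from the question). The class name LocalNameDemo is made up, and because the snippet in the question omits its namespace declarations, the URIs here are assumptions: the soapenv URI is the usual SOAP 1.1 envelope namespace and the common URI is a pure placeholder.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing the wrapped local-name() call with the attribute path.
class LocalNameDemo {
    // Namespace URIs are assumed; the question's snippet does not show them.
    static final String XML =
            "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:common=\"http://example.org/common\">"
            + "<soapenv:Header><common:TestInfo testID=\"PI1\"/></soapenv:Header>"
            + "</soapenv:Envelope>";

    // The question's first expression: returns the element's name, not the attribute.
    static final String WRAPPED =
            "local-name(/*[local-name()='Envelope']/*[local-name()='Header']"
            + "/*[local-name()='TestInfo'])";

    // The corrected expression: selects the testID attribute itself.
    static final String ATTRIBUTE =
            "/*[local-name()='Envelope']/*[local-name()='Header']"
            + "/*[local-name()='TestInfo']/@testID";

    // Evaluates an XPath 1.0 expression against the sample and returns its string value.
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // so local-name() drops the prefix
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(WRAPPED));   // name of the matched element
        System.out.println(evaluate(ATTRIBUTE)); // value of the testID attribute
    }
}
```

No NamespaceContext is needed here because local-name() never mentions a prefix; that convenience is also why it matches elements from any namespace that happen to share a local name.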
If you don't care about the namespaces and use an XPath 2.0 compatible engine, use * as the prefix wildcard.
//*:Header/*:TestInfo/@testID
will return the desired result.
It will probably be more elegant to register the needed namespaces (not covered here, it depends on your XPath engine) and query using these:
//soapenv:Header/common:TestInfo/@testID
I am using Java. I have an XML file which looks like this:
<?xml version="1.0"?>
<personaldetails>
    <phno>1553294232</phno>
    <email>
        <official>xya@gmail.com</official>
        <personal>bk@yahoo.com</personal>
    </email>
</personaldetails>
Now, I need to check each of the tag values for its type using specific conditions, and put them in separate files.
For example, for the above file I write conditions like: 10 digits means a phone number,
something in the format xxx@yy.com is an email.
So what I need to do is extract the value in each tag; if it matches a certain condition, it is put in the first text file, and if not, in the second text file.
In that case, the first text file will contain:
1553294232
xya@gmail.com
bk@yahoo.com
and the rest of the values go in the second file.
I just don't know how to extract the tag values without using the tag name (or without using getElementsByTagName).
I mean this code should extract the email bk@yahoo.com even if I use a <mailing> tag instead of <personal>. It should not depend on the tag name.
I hope I am not being confusing. I am new to using XML in Java, so pardon me if my question is silly.
Please help.
Seems like a typical use case for XPath
XPath allows you to query XML in a very flexible way.
This tutorial could help:
http://www.javabeat.net/2009/03/how-to-query-xml-using-xpath/
If you're using JavaScript, which could be the case since you mention getElementsByTagName(), you could just use jQuery selectors. They give you consistent behavior across browsers, and the jQuery library is useful for a lot of other things, if you are not using it already... http://api.jquery.com/category/selectors/
Here for example is information on this:
http://www.switchonthecode.com/tutorials/xml-parsing-with-jquery
Since you don't know your element names, I would suggest creating a DOM tree and iterating through it. Whenever you reach an element's text, you would try to match it against your ruleset (I would suggest using regex for this purpose) and then write it to the appropriate file.
This is a sample structure to help you get started, but you would need to modify it based on your requirements:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public void parseXML(){
    try{
        DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc;
        doc = documentBuilder.parse(new File("test.xml"));
        // Start the recursive walk at the root element
        getData(null, doc.getDocumentElement());
    }catch(Exception exe){
        exe.printStackTrace();
    }
}

private void getData(Node parentNode, Node node){
    switch(node.getNodeType()){
        case Node.ELEMENT_NODE:{
            // Recurse into every child, passing this element as the parent
            if(node.hasChildNodes()){
                NodeList list = node.getChildNodes();
                int size = list.getLength();
                for(int index = 0; index < size; index++){
                    getData(node, list.item(index));
                }
            }
            break;
        }
        case Node.TEXT_NODE:{
            String data = node.getNodeValue();
            // Skip whitespace-only text between elements
            if(data.trim().length() > 0){
                /*
                 * Here you need to check the data against your ruleset and perform your operation
                 */
                System.out.println(parentNode.getNodeName()+" :: "+node.getNodeValue());
            }
            break;
        }
    }
}
You might want to look at the Chain of Responsibility design pattern to design your ruleset.
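As a rough sketch of the ruleset check, the following walks the DOM without looking at tag names and sorts each text value by regex. Everything named here is an assumption for illustration: the class ValueClassifier, the two patterns (a deliberately loose email pattern and a plain 10-digit phone pattern), and the extra <note> element added to the sample so that something lands in the "rest" group. The values are collected into lists rather than written to files; writing each list out with java.nio.file.Files.write would be the remaining step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: classifies text values by content, never by tag name.
class ValueClassifier {
    // Assumed rules: exactly 10 digits is a phone number; something@host.tld is an email.
    static final Pattern PHONE = Pattern.compile("\\d{10}");
    static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[A-Za-z]{2,}");

    // Sample from the question plus a made-up <note> element that matches no rule.
    static final String SAMPLE =
            "<personaldetails>"
            + "<phno>1553294232</phno>"
            + "<email><official>xya@gmail.com</official><personal>bk@yahoo.com</personal></email>"
            + "<note>call after 5pm</note>"
            + "</personaldetails>";

    static boolean matchesAnyRule(String value) {
        return PHONE.matcher(value).matches() || EMAIL.matcher(value).matches();
    }

    // Returns the values that match a rule (wantMatches = true) or the rest (false).
    static List<String> collect(String xml, boolean wantMatches) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), wantMatches, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first walk; only text nodes with non-blank content are classified.
    private static void walk(Node node, boolean wantMatches, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue().trim();
            if (!value.isEmpty() && matchesAnyRule(value) == wantMatches) {
                out.add(value);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), wantMatches, out);
        }
    }

    public static void main(String[] args) {
        System.out.println("first file:  " + collect(SAMPLE, true));
        System.out.println("second file: " + collect(SAMPLE, false));
    }
}
```

Renaming <personal> to <mailing> in the sample changes nothing in the result, since the walk never inspects element names.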