Default XML namespace, JDOM, and XPath - java

I want to use JDOM to read in an XML file, then use XPath to extract data from the JDOM Document. It creates the Document object fine, but when I use XPath to query the Document for a List of elements, I get nothing.
My XML document has a default namespace defined in the root element. The funny thing is, when I remove the default namespace, it successfully runs the XPath query and returns the elements I want. What else must I do to get my XPath query to return results?
XML:
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.foo.com">
<dvd id="A">
<title>Lord of the Rings: The Fellowship of the Ring</title>
<length>178</length>
<actor>Ian Holm</actor>
<actor>Elijah Wood</actor>
<actor>Ian McKellen</actor>
</dvd>
<dvd id="B">
<title>The Matrix</title>
<length>136</length>
<actor>Keanu Reeves</actor>
<actor>Laurence Fishburne</actor>
</dvd>
</collection>
Java:
public static void main(String args[]) throws Exception {
SAXBuilder builder = new SAXBuilder();
Document d = builder.build("xpath.xml");
XPath xpath = XPath.newInstance("collection/dvd");
xpath.addNamespace(d.getRootElement().getNamespace());
System.out.println(xpath.selectNodes(d));
}

XPath 1.0 doesn't support the concept of a default namespace (XPath 2.0 does).
Any unprefixed tag is always assumed to be part of the no-name namespace.
When using XPath 1.0 you need something like this:
public static void main(String args[]) throws Exception {
SAXBuilder builder = new SAXBuilder();
Document d = builder.build("xpath.xml");
XPath xpath = XPath.newInstance("x:collection/x:dvd");
xpath.addNamespace("x", d.getRootElement().getNamespaceURI());
System.out.println(xpath.selectNodes(d));
}

I had a similiar problem, but mine was that I had a mixture of XML inputs, some of which had a namespace defined and others that didn't. To simplify my problem I ran the following JDOM snippet after loading the document.
for (Element el : doc.getRootElement().getDescendants(new ElementFilter())) {
if (el.getNamespace() != null) el.setNamespace(null);
}
After removing all the namespaces I was able to use simple getChild("elname") style navigation or simple XPath queries.
I wouldn't recommend this technique as a general solution, but in my case it was definitely useful.

You can also do the following
/*[local-name() = 'collection']/*[local-name() = 'dvd']/
Here is list of useful xpath queries.

Related

Determining Java XPATH for multiple Nested Namespaces [duplicate]

How does XPath deal with XML namespaces?
If I use
/IntuitResponse/QueryResponse/Bill/Id
to parse the XML document below I get 0 nodes back.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
time="2016-10-14T10:48:39.109-07:00">
<QueryResponse startPosition="1" maxResults="79" totalCount="79">
<Bill domain="QBO" sparse="false">
<Id>=1</Id>
</Bill>
</QueryResponse>
</IntuitResponse>
However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.
XPath 1.0/2.0
Defining namespaces in XPath (recommended)
XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.
It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.
(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)
C#:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(#"/i:IntuitResponse/i:QueryResponse", nsmgr);
Google Docs:
Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.
Java (SAX):
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
Java (XPath):
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "i": return "http://schema.intuit.com/finance/v3";
// ...
}
});
Remember to call
DocumentBuilderFactory.setNamespaceAware(true).
See also:
Java XPath: Queries with default namespace xmlns
JavaScript:
See Implementing a User Defined Namespace Resolver:
function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );
Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a customer nsResolver().
Perl (LibXML):
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my #nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
Python (lxml):
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})
Python (ElementTree):
namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
root.findall('/i:IntuitResponse/i:QueryResponse', namespaces)
Python (Scrapy):
response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
PhP:
Adapted from #Tomalak's answer using DOMDocument:
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");
$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
See also #IMSoP's canonical Q/A on PHP SimpleXML namespaces.
Ruby (Nokogiri):
puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")
Note that Nokogiri supports removal of namespaces,
doc.remove_namespaces!
but see the below warnings discouraging the defeating of XML namespaces.
VBA:
xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement =doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
VB.NET:
xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(New XmlNameTable())
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
nsmgr)
SoapUI (doc):
declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse
xmlstarlet:
-N i="http://schema.intuit.com/finance/v3"
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...
Once you've declared a namespace prefix, your XPath can be written to use it:
/i:IntuitResponse/i:QueryResponse
Defeating namespaces in XPath (not recommended)
An alternative is to write predicates that test against local-name():
/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']
Or, in XPath 2.0:
/*:IntuitResponse/*:QueryResponse
Skirting namespaces in this manner works but is not recommended because it
Under-specifies the full element/attribute name.
Fails to differentiate between element/attribute names in different
namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='IntuitResponse']
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='QueryResponse']
Thanks to Daniel Haley for the namespace-uri() note.
Is excessively verbose.
XPath 3.0/3.1
Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:
/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse
While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is known as Clark Notation after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls it a BracedURILiteral.
Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
I use /*[name()='...'] in a google sheet to fetch some counts from Wikidata. I have a table like this
thes WD prop links items
NOM P7749 3925 3789
AAT P1014 21157 20224
and the formulas in cols links and items are
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
respectively. The SPARQL query happens not to have any spaces...
I saw name() used instead of local-name() in Xml Namespace breaking my xpath!, and for some reason //*:literal doesn't work.

XPath application on XML returns empty string [duplicate]

How does XPath deal with XML namespaces?
If I use
/IntuitResponse/QueryResponse/Bill/Id
to parse the XML document below I get 0 nodes back.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
time="2016-10-14T10:48:39.109-07:00">
<QueryResponse startPosition="1" maxResults="79" totalCount="79">
<Bill domain="QBO" sparse="false">
<Id>=1</Id>
</Bill>
</QueryResponse>
</IntuitResponse>
However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.
XPath 1.0/2.0
Defining namespaces in XPath (recommended)
XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.
It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.
(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)
C#:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(#"/i:IntuitResponse/i:QueryResponse", nsmgr);
Google Docs:
Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.
Java (SAX):
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
Java (XPath):
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "i": return "http://schema.intuit.com/finance/v3";
// ...
}
});
Remember to call
DocumentBuilderFactory.setNamespaceAware(true).
See also:
Java XPath: Queries with default namespace xmlns
JavaScript:
See Implementing a User Defined Namespace Resolver:
function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );
Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a customer nsResolver().
Perl (LibXML):
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my #nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
Python (lxml):
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})
Python (ElementTree):
namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
root.findall('/i:IntuitResponse/i:QueryResponse', namespaces)
Python (Scrapy):
response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
PhP:
Adapted from #Tomalak's answer using DOMDocument:
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");
$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
See also #IMSoP's canonical Q/A on PHP SimpleXML namespaces.
Ruby (Nokogiri):
puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")
Note that Nokogiri supports removal of namespaces,
doc.remove_namespaces!
but see the below warnings discouraging the defeating of XML namespaces.
VBA:
xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement =doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
VB.NET:
xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(New XmlNameTable())
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
nsmgr)
SoapUI (doc):
declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse
xmlstarlet:
-N i="http://schema.intuit.com/finance/v3"
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...
Once you've declared a namespace prefix, your XPath can be written to use it:
/i:IntuitResponse/i:QueryResponse
Defeating namespaces in XPath (not recommended)
An alternative is to write predicates that test against local-name():
/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']
Or, in XPath 2.0:
/*:IntuitResponse/*:QueryResponse
Skirting namespaces in this manner works but is not recommended because it
Under-specifies the full element/attribute name.
Fails to differentiate between element/attribute names in different
namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='IntuitResponse']
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='QueryResponse']
Thanks to Daniel Haley for the namespace-uri() note.
Is excessively verbose.
XPath 3.0/3.1
Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:
/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse
While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is known as Clark Notation after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls it a BracedURILiteral.
Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
I use /*[name()='...'] in a google sheet to fetch some counts from Wikidata. I have a table like this
thes WD prop links items
NOM P7749 3925 3789
AAT P1014 21157 20224
and the formulas in cols links and items are
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
respectively. The SPARQL query happens not to have any spaces...
I saw name() used instead of local-name() in Xml Namespace breaking my xpath!, and for some reason //*:literal doesn't work.

How to locate element using xpath -- java [duplicate]

How does XPath deal with XML namespaces?
If I use
/IntuitResponse/QueryResponse/Bill/Id
to parse the XML document below I get 0 nodes back.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
time="2016-10-14T10:48:39.109-07:00">
<QueryResponse startPosition="1" maxResults="79" totalCount="79">
<Bill domain="QBO" sparse="false">
<Id>=1</Id>
</Bill>
</QueryResponse>
</IntuitResponse>
However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.
XPath 1.0/2.0
Defining namespaces in XPath (recommended)
XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.
It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.
(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)
C#:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(#"/i:IntuitResponse/i:QueryResponse", nsmgr);
Google Docs:
Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.
Java (SAX):
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
Java (XPath):
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "i": return "http://schema.intuit.com/finance/v3";
// ...
}
});
Remember to call
DocumentBuilderFactory.setNamespaceAware(true).
See also:
Java XPath: Queries with default namespace xmlns
JavaScript:
See Implementing a User Defined Namespace Resolver:
function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );
Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a customer nsResolver().
Perl (LibXML):
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my #nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
Python (lxml):
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})
Python (ElementTree):
namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
root.findall('/i:IntuitResponse/i:QueryResponse', namespaces)
Python (Scrapy):
response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
PhP:
Adapted from #Tomalak's answer using DOMDocument:
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");
$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
See also #IMSoP's canonical Q/A on PHP SimpleXML namespaces.
Ruby (Nokogiri):
puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")
Note that Nokogiri supports removal of namespaces,
doc.remove_namespaces!
but see the below warnings discouraging the defeating of XML namespaces.
VBA:
xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement =doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
VB.NET:
xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(New XmlNameTable())
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
nsmgr)
SoapUI (doc):
declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse
xmlstarlet:
-N i="http://schema.intuit.com/finance/v3"
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...
Once you've declared a namespace prefix, your XPath can be written to use it:
/i:IntuitResponse/i:QueryResponse
Defeating namespaces in XPath (not recommended)
An alternative is to write predicates that test against local-name():
/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']
Or, in XPath 2.0:
/*:IntuitResponse/*:QueryResponse
Skirting namespaces in this manner works but is not recommended because it
Under-specifies the full element/attribute name.
Fails to differentiate between element/attribute names in different
namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='IntuitResponse']
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='QueryResponse']
Thanks to Daniel Haley for the namespace-uri() note.
Is excessively verbose.
XPath 3.0/3.1
Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:
/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse
While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is known as Clark Notation after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls it a BracedURILiteral.
Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
I use /*[name()='...'] in a google sheet to fetch some counts from Wikidata. I have a table like this
thes WD prop links items
NOM P7749 3925 3789
AAT P1014 21157 20224
and the formulas in cols links and items are
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
respectively. The SPARQL query happens not to have any spaces...
I saw name() used instead of local-name() in Xml Namespace breaking my xpath!, and for some reason //*:literal doesn't work.

Need help with xPath expressions with namespaces

I need some help with a xpath expression.
I have the following xml document:
<?xml version="1.0" encoding="utf-8"?>
<Response xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://schemas.test.com/search/local/ws/rest/v1">
<nodeA>text</nodeA>
<nodeB>
<nodeb1>
<nodeb1a>
<nodeTest>hello</nodeTest>
</nodeb1a>
</nodeb1>
</nodeB>
</Response>
I am using Springs rest template to retrieve the data.
Here is where I am using a xpath expression to get the node values:
xpathTemplate.evaluate("base:*", xmlSource, new NodeMapper() {
public Object mapNode(Node node, int i) throws DOMException {
Element response = (Element) node;
System.out.println("hi: " + response.getTextContent());
I added base to my namespace map like this:
HashMap<String, String> nameSpaces = new HashMap<String, String>();
nameSpaces.put("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nameSpaces.put("xsd", "http://www.w3.org/2001/XMLSchema");
nameSpaces.put("base", "http://schemas.test.com/search/local/ws/rest/v1");
((Jaxp13XPathTemplate) xpathTemplate).setNamespaces(nameSpaces);
Using the expression base:* gives me the Response node and I am able to drill down to the child nodes using .getChildNodes and creating nodelists for each child.
I would like to get the hello value directly with the xpath expression but I am not able to get it to work with anything but base:*. Can someone help me with my expression?
Thanks
This XPath expression:
/base:Response/base:nodeB/base:nodeb1/base:nodeb1a/base:nodeTest
Assuming base is bound to http://schemas.test.com/search/local/ws/rest/v1 namespace URI.
hello = xpathTemplate.evaluateAsString("//*[local-name()='nodeTest']", source);
Please try the following:
xpathTemplate.evaluate("/Response/nodeB/nodeb1/nodeb1a/nodeTest", ...
If you need more information on XPath, you can try the following
http://www.zvon.org/xxl/XPathTutorial/General/examples.html
My website also proposes a tool to test your xpath queries:
http://ap2cu.com/tools/xpath/index.php5

XPath: Is there a way to set a default namespace for queries?

Is there a way to set Java's XPath to have a default namespace prefix for expressons? For example, instead of: /html:html/html:head/html:title/text()", the query could be: /html/head/title/text()
While using the namespace prefix works, there has to be a more elegant way.
Sample code snippet of what I'm doing now:
Node node = ... // DOM of a HTML document
XPath xpath = XPathFactory.newInstance().newXPath();
// set to a NamespaceContext that simply returns the prefix "html"
// and namespace URI ""http://www.w3.org/1999/xhtml"
xpath.setNamespaceContext(new HTMLNameSpace());
String expression = "/html:html/html:head/html:title/text()";
String value = xpath.evaluate(query, expression);
Unfortunately, no. There was some talk about defining a default namespace for JxPath a few years ago, but a quick look at the latest docs don't indicate that anything happened. You might want to spends some more time looking through the docs, though.
One thing that you could do, if you really don't care about namespaces, is to parse the document without them. Simply omit the call that you're currently making to DocumentBuilderFactory.setNamespaceAware().
Also, note that your prefix can be anything you want; it doesn't have to match the prefix in the instance document. So you could use h rather than html, and minimize the visual clutter of the prefix.
I haven't actually tried this, but according to the NamespaceContext documentation, the namespace context with the prefix "" (emtpy string) is considered to be the default namespace.
I was a little bit too quick on that one. The XPath evaluator does not invoke the NamespaceContext to resolve the "" prefix, if no prefix is used at all in the XPath expression "/html/head/title/text()". I'm now going into XML details, which I am not 100% sure about, but using an expression like "/:html/:head/:title/text()" works with Sun JDK 1.6.0_16 and the NamespaceContext is asked to resolve an empty prefix (""). Is this really correct and expected behaviour or a bug in Xalan?
I know this question is old but I just spent 3 hours researching trying to solve this problem and #kdgregorys answer helped me out alot. I just wanted to put exactly what I did using kdgregorys answer as a guide.
The problem is that XPath in java doesnt even look for a namespace if you dont have a prefix on your query therefore to map a query to a specific namespace you have to add a prefix to the query. I used an arbitrary prefix to map to the schema name. For this example I will use OP's namespace and query and the prefix abc. Your new expression would look like this:
String expression = "/abc:html/abc:head/abc:title/text()";
Then do the following
1) Make sure your document is set to namespace aware.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
2) Implement a NamespaceContext that will resolve your prefix. This one I took from some other post on SO and modified a bit
.
public class NamespaceResolver implements NamespaceContext {
private final Document document;
public NamespaceResolver(Document document) {
this.document = document;
}
public String getNamespaceURI(String prefix) {
if(prefix.equals("abc")) {
// here is where you set your namespace
return "http://www.w3.org/1999/xhtml";
} else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
return document.lookupNamespaceURI(null);
} else {
return document.lookupNamespaceURI(prefix);
}
}
public String getPrefix(String namespaceURI) {
return document.lookupPrefix(namespaceURI);
}
#SuppressWarnings("rawtypes")
public Iterator getPrefixes(String namespaceURI) {
// not implemented
return null;
}
}
3) When creating your XPath object set your NamespaceContext.
xPath.setNamespaceContext(new NamespaceResolver(document));
Now no matter what the actual schema prefix is you can use your own prefix that will map to the proper schema. So your full code using the class above would look something like this.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document document = factory.newDocumentBuilder().parse(sourceDocFile);
XPathFactory xPFactory = XPathFactory.newInstance();
XPath xPath = xPFactory.newXPath();
xPath.setNamespaceContext(new NamespaceResolver(document));
String expression = "/abc:html/abc:head/abc:title/text()";
String value = xpath.evaluate(query, expression);

Categories

Resources