Always get null when querying XML with XPath - java

I am using the following code to query some XML with XPath I get from a stream.
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(false);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputStream);
inputStream.close();
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//FOO_ELEMENT");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
I have checked the stream for content by converting it to a string - and it's all there - so it's not as if there is no data in the stream.
This is just annoying me now - as I have tried various different bits of code and I still keep getting 'null' being printed at the "System.out.println" line - what am I missing here?
NOTE: I want to see the text inside the element.

In addition to what Brabster suggested, you may want to try
System.out.println(nodes.item(i).getTextContent());
or
System.out.println(nodes.item(i).getNodeName());
depending on what you're intending to display.
See http://java.sun.com/javase/6/docs/api/org/w3c/dom/Node.html

Not an expert in the Java XPath impl tbh, but this might help.
The javadocs say that the result of getNodeValue() will be null for most types of node.
It's not totally clear what you expect to see in the output; element name, attributes, text? I'll guess text. In any XPath impl I have used, if you want the text content of the node, you have to XPath to
//FOO_ELEMENT/text()
Then the node's value is the text content of the node.
The getTextContent() method will return the text content of the node you've selected with the XPath, and of any descendant nodes, as per the javadoc. The solution above selects exactly the text nodes of any FOO_ELEMENT elements in the document.
Java EE Docs for Node <-- old docs, see comments for current docs.
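To make the difference between the two suggestions concrete, here is a minimal sketch. The class name NodeValueDemo, its helper methods, and the sample XML are made up for illustration; only the DOM and XPath calls come from the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question.
final class NodeValueDemo {

    // Parses an XML string into a DOM document (checked exceptions wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the first node matched by the expression, or null if nothing matches.
    static Node first(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength() == 0 ? null : nodes.item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        Document doc = parse("<root><FOO_ELEMENT>hello</FOO_ELEMENT></root>");
        Node element = first(doc, "//FOO_ELEMENT");
        Node text = first(doc, "//FOO_ELEMENT/text()");

        System.out.println(element.getNodeValue());   // null: element nodes have no node value
        System.out.println(element.getTextContent()); // hello
        System.out.println(text.getNodeValue());      // hello: text nodes carry their text as value
    }
}
```

Either selecting the text node in the XPath or calling getTextContent() on the element should give the text; which one fits better depends on whether descendant text should be included.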

Related

XML with different namespaces drilling down to needed value

I am trying to figure out how to go about getting the value of jxdm:ID from the following XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<My:Message
xmlns:Abcd="http://...."
xmlns:box-1="http://...."
xmlns:bulb="http://...."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation="http://....stores.xsd">
<Abcd:StoreDataSection>
<Abcd:DataSection>
<Abcd:FirstStore>
<box-1:Response>
<box-1:DataSection>
<box-1:Release>
<box-1:Activity>
<bulb:Date>2017-04-29</bulb:Date>
<bulb:Store xsi:type="TPIR:Organization">
<bulb:StoreID>
<bulb:ID>D79G2102</bulb:ID>
</bulb:StoreID>
</bulb:Store>
</box-1:Activity>
</box-1:Release>
</box-1:DataSection>
</box-1:Response>
</Abcd:FirstStore>
</Abcd:DataSection>
</Abcd:StoreDataSection>
</My:Message>
I keep getting "null" as the value of node
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
This is my current Java code:
try {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new File("c:/temp/testingNamespace.xml"));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//My/Message//Abcd/StoreDataSection/DataSection/FirstStore//box-1/Response/DataSection/Release/Activity//bulb/Store/StoreID/ID";
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
node.setTextContent("changed ID");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(document), new StreamResult(new File("C:/temp/test-updated.xml")));
} catch (Exception e) {
System.out.println(e.getMessage());
}
How would the correct XPath be formatted in order for me to get that value and change it?
Update 1
So something like this?
String expression = "/My:Message/Abcd:StoreDataSection/Abcd:DataSection/Abcd:FirstStore/box-1:Response/box-1:DataSection/box-1:Release/box-1:Activity/bulb:Store/bulb:StoreID/bulb:ID";
The problem is that you should address nodes by prefix (if you want to), but in a different way, for example //bulb:StoreID if you want to access StoreID.
Then again, it would still not work, because you need to tell XPath how to resolve namespace prefixes.
You should check this answer: How to query XML using namespaces in Java with XPath?
for details on how to implement and use a NamespaceContext.
The bottom line is that you need to implement a javax.xml.namespace.NamespaceContext and set it to the XPath.
XPath xpath = XPathFactory.newInstance().newXPath();
NamespaceContext context = new MyNamespaceContext();
xpath.setNamespaceContext(context);
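As a rough sketch of what such an implementation could look like: the class names MapNamespaceContext and NamespaceContextDemo, the prefix b, and the URI urn:example:bulb are all made up for illustration (the real namespace URIs are elided in the question), and the XML must actually declare the namespace for this to work.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical map-backed stand-in for the MyNamespaceContext class mentioned above.
final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> uriByPrefix;

    MapNamespaceContext(Map<String, String> uriByPrefix) {
        this.uriByPrefix = new HashMap<>(uriByPrefix);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        // Unbound prefixes resolve to the empty "no namespace" URI.
        return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }
}

final class NamespaceContextDemo {
    // Returns the text of the first b:StoreID/b:ID element; "b" is bound to the assumed URI.
    static String storeId(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(
                    Collections.singletonMap("b", "urn:example:bulb")));
            return xpath.evaluate("//b:StoreID/b:ID", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<s:Store xmlns:s='urn:example:bulb'>"
                + "<s:StoreID><s:ID>D79G2102</s:ID></s:StoreID></s:Store>";
        System.out.println(storeId(xml)); // D79G2102
    }
}
```

Note that the prefix in the XPath (b) does not have to match the prefix in the document (s); only the namespace URI has to agree.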
Two things wrong here:
Your XML is not namespace-well-formed; it does not declare the used namespace prefixes.
Once namespace prefixes are properly declared in the XML and in your Java code, you use them in XPath via : not via /. So, it'd be not /Abcd/StoreDataSection but rather /Abcd:StoreDataSection (and so on for the rest of the steps in your XPath).
See also How does XPath deal with XML namespaces?
I am unable to change anything in the XML so I have to go with it as-is sadly.
Technically, you might be able to use some XML tools with undeclared namespace prefixes, because this omission only renders the XML not namespace-well-formed. However, many tools expect XML that is not only well-formed but also namespace-well-formed. (See Namespace-Well-Formed for the difference.)
Otherwise, see How to parse invalid (bad / not well-formed) XML? to repair your XML.

How to skip certain elements when parsing XML file to DOM with Java

I am attempting to parse some XML documents into DOM so that I can run XPath queries against them. My code is in Java, and I have been using the Xerces org.apache.xerces.parsers.DOMParser implementation.
I am only interested in certain portions of the XML, under element elementICareAbout and can ignore other elements.
<top>
<elementICareAbout>...</elementICareAbout>
<elementToIgnore>...</elementToIgnore>
</top>
The XML file can be quite large, and I would rather not hold elements in memory that I do not need for processing. I would expect an XPath query for /top/elementICareAbout to return data, while /top/elementToIgnore would simply return nothing (as I don't need it).
Looking over the Xerces DOMParser and the JAXP APIs, I don't see any way to explicitly ignore certain elements so that they are not part of the in-memory DOM tree after parsing.
Is there a good way to construct a partial DOM Document from an XML file tailored to the parts that I need?
You could write a SAX filter and insert it into the processing pipeline between the (SAX) parser and the document builder. Or with rather less coding you could write an XSLT 3.0 streaming transformation. Or you could write an XQuery to select the parts of the document you want, and run it using a query processor that supports document projection. It all depends how wedded you are to Java/DOM coding - my preference would be for higher-level languages than that.
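For the first option, a minimal sketch of a SAX filter feeding a DOM builder might look like this. The class names SkipElementsFilter and PartialDom are made up, elements are matched by qualified name only, and the identity Transformer is just one way (an assumption here, not the only one) to turn the filtered SAX events into a DOM Document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops the named elements and everything inside them.
final class SkipElementsFilter extends XMLFilterImpl {
    private final Set<String> skippedNames;
    private int skipDepth = 0; // greater than 0 while inside a skipped subtree

    SkipElementsFilter(Set<String> skippedNames) {
        this.skippedNames = skippedNames;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || skippedNames.contains(qName)) {
            skipDepth++; // swallow the event and remember how deep we are
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }
}

final class PartialDom {
    // Builds a DOM that never contains the skipped elements.
    static Document parseSkipping(String xml, Set<String> skippedNames) {
        try {
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader reader = parserFactory.newSAXParser().getXMLReader();

            SkipElementsFilter filter = new SkipElementsFilter(skippedNames);
            filter.setParent(reader); // parser -> filter -> DOM builder

            DOMResult result = new DOMResult();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new SAXSource(filter, new InputSource(new StringReader(xml))), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseSkipping(
                "<top><elementICareAbout>keep</elementICareAbout>"
                        + "<elementToIgnore><x>drop</x></elementToIgnore></top>",
                Collections.singleton("elementToIgnore"));
        System.out.println(doc.getElementsByTagName("elementToIgnore").getLength());
        System.out.println(doc.getElementsByTagName("elementICareAbout").item(0).getTextContent());
    }
}
```

The parser still reads the whole file, but the ignored subtrees are never materialised as DOM nodes, which is where most of the memory goes.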
You can also get the element by tagname.
For example, say you have an XML file called Question.xml.
Question.xml
In the java file, you can do the following:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(responseString));
Document doc = dBuilder.parse(is);
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("Question");
// Iterate over all <Question> elements
for (int temp = 0; temp < nList.getLength(); temp++) {
    Node nNode = nList.item(temp);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        // Look up a child element by tag name
        String q1 = eElement.getElementsByTagName("q1").item(0).getTextContent();
    }
}

Finding Element in NodeList XML

Is there a way I can get the first Element from a NodeList? I'm using org.w3c.dom to handle XML files. I have already written large parts of my program using org.w3c.dom and only recently discovered dom4j, which has a method for this, but I cannot use it because of backward-compatibility issues with my other methods.
It is critical that I can find and pass on the very first Element in my XML, and it must be of type org.w3c.dom.Element. However, not even calling doc.normalize() has helped, and neither did using dom4j methods to find the element and casting it to org.w3c.dom.Element, as that's not allowed.
File file = new File("myXML.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
doc = dBuilder.parse(file);
doc.normalize();
Element sourceElem = doc.getDocumentElement();
NodeList nodeList = sourceElem.getChildNodes();
Element elem;
if(nodeList.item(0).getNodeType() == Node.ELEMENT_NODE){
elem = (Element)nodeList.item(0);
}
I'm getting a NullPointerException from other methods because it can't find the element.
I need it to work in the same way this C# code does:
XmlDocument doc = new XmlDocument();
doc.Load("myXML.xml");
XmlElement elem = (XmlElement)doc.DocumentElement.ChildNodes[0];
EDIT: Or is there a way I can cast a dom4j Element back into org.w3c.dom.Element?
EDIT2: Sample XML I need to access: http://pastebin.com/C3nvxhwx
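One likely cause, assuming the XML is indented: the line break and spaces after the root start tag form a Text node, so item(0) is usually whitespace rather than an Element, and elem is never assigned. A minimal sketch of scanning for the first child that really is an element (the class and method names are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not taken from the question.
final class FirstElementChild {

    // Returns the first child of parent that is an Element, or null if there is none.
    static Element firstElementChild(Node parent) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) child;
            }
        }
        return null;
    }

    // Parses an XML string into a DOM document (checked exceptions wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample with indentation, so the first child node is whitespace text.
        Document doc = parse("<root>\n  <first/>\n  <second/>\n</root>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getChildNodes().item(0).getNodeType() == Node.TEXT_NODE); // true
        System.out.println(firstElementChild(root).getTagName()); // first
    }
}
```

The result is a plain org.w3c.dom.Element, so no dom4j conversion should be needed.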

XPath Java count of child nodes

I want to count some child nodes of a given xml. But it always returns me 0 and I can't figure out why.
Here's the xml:
<FirstOne xmlns:xxx="http://www.w3.org/2001/XMLSchema-instance">
<Formulas xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
<xxx:yyy>
<aa:bb>something</aa:bb>
<cc:dd>something</cc:dd>
</xxx:yyy>
</Formulas>
</FirstOne>
I want to count the number of "xxx:yyy". In this example 3.
I tried the following:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(fileArray[i].toString())));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
expression = "count(//Formulas/xxx:yyy)";
Double result = (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
It always gives me 0.0 ...
Thanks for your help!
The problems all stem from the namespaces.
Firstly, XPath evaluation is only defined over namespace-well-formed XML, so you need to ensure that the aa and cc prefixes are properly mapped to namespace URIs in the XML.
Secondly, you need to parse the XML into a DOM tree using a namespace-aware parser (for what I can only assume are historical reasons, DocumentBuilderFactory is not namespace-aware by default).
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(fileArray[i].toString())));
Now you have a proper namespace-well-formed DOM tree you need to handle the namespaces correctly in the XPath. You need to define a NamespaceContext telling the XPath how to relate prefixes and namespace URIs. Annoyingly there's no default implementation of this interface available in the core Java libraries but there are third-party implementations such as Spring's SimpleNamespaceContext, or it's only three methods to implement it yourself. With a SimpleNamespaceContext:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
xpath.setNamespaceContext(nsCtx);
nsCtx.bindNamespaceUri("x", "http://www.w3.org/2001/XMLSchema-instance");
With this context in place you can now select namespaced nodes in your XPath expression:
String expression = "count(//Formulas/x:yyy)";
(the prefixes you use are the ones in the NamespaceContext, not necessarily the ones in the original XML source).
While some DOM parsers and XPath implementations might let you get away with parsing non-namespace-aware and omitting the prefixes in the XPath expressions, this is an implementation detail and the behaviour is not defined by the specifications. It might work in one version but fail in another, or behave differently if you add additional JARs to your project that change the default parser, etc.
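If you would rather not pull in Spring, here is a sketch of the hand-written variant, put together end to end. The class name CountNamespacedNodes and the helper singlePrefix are made up for illustration, and the sample XML is a trimmed version of the question's document containing only declared prefixes.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: counts namespaced nodes without third-party libraries.
final class CountNamespacedNodes {

    // Minimal NamespaceContext that knows exactly one prefix/URI pair.
    static NamespaceContext singlePrefix(final String prefix, final String uri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String u) {
                return uri.equals(u) ? prefix : null;
            }

            @Override
            public Iterator<String> getPrefixes(String u) {
                return uri.equals(u)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyIterator();
            }
        };
    }

    // Parses namespace-aware and evaluates a numeric XPath expression.
    static double evaluateNumber(String xml, String expression, NamespaceContext context) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(context);
            return (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<FirstOne xmlns:xxx='http://www.w3.org/2001/XMLSchema-instance'>"
                + "<Formulas><xxx:yyy/><xxx:yyy/><xxx:yyy/></Formulas></FirstOne>";
        NamespaceContext ctx = singlePrefix("x", "http://www.w3.org/2001/XMLSchema-instance");
        System.out.println(evaluateNumber(xml, "count(//Formulas/x:yyy)", ctx)); // 3.0
    }
}
```

The prefix x in the expression is bound only in the NamespaceContext; the document itself uses xxx for the same URI.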
Since xxx is only the tag prefix, use just count(//Formulas/yyy).

Avoid repeated instantiation of InputSource with XPath in Java

Currently I am parsing XML messages with XPath Expression. It works very well. However I have the following problem:
I am parsing all of the data in the XML, so I instantiate a new InputSource for every call to xPath.evaluate.
StringReader xmlReader = new StringReader(xml);
InputSource source = new InputSource(xmlReader);
XPathExpression xpe = xpath.compile("msg/element/@attribute");
String attribute = (String) xpe.evaluate(source, XPathConstants.STRING);
Now I would like to go deeper into my XML message and evaluate more information. For this I found myself in the need to instantiate source another time. Is this required? If I don't do it, I get Stream closed Exceptions.
Parse the XML to a DOM and keep a reference to the node(s). Example:
XPath xpath = XPathFactory.newInstance()
.newXPath();
InputSource xml = new InputSource(new StringReader("<xml foo='bar' />"));
Node root = (Node) xpath.evaluate("/", xml, XPathConstants.NODE);
System.out.println(xpath.evaluate("/xml/@foo", root));
This avoids parsing the string more than once.
If you must reuse the InputSource for a different XML string, you can probably use the setters with a different reader instance.
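To illustrate the "go deeper" case, here is a sketch that parses once and then runs several expressions, including one relative to a previously selected node. The class name ParseOnce, its helpers, and the sample message are made up; it uses a DocumentBuilder instead of evaluating "/" through XPath, which should have the same effect.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: one parse, many XPath evaluations.
final class ParseOnce {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // The only place where the XML string is read; the reader is consumed here once.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression against an already parsed node and returns its string value.
    static String string(String expression, Node context) {
        try {
            return XPATH.evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression against an already parsed node and returns the first match.
    static Node node(String expression, Node context) {
        try {
            return (Node) XPATH.evaluate(expression, context, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample message.
        Document doc = parse("<msg><element attribute='a'><child>text</child></element></msg>");
        System.out.println(string("msg/element/@attribute", doc)); // a

        // Keep a reference to a node and query relative to it, with no re-parsing.
        Node element = node("msg/element", doc);
        System.out.println(string("child", element)); // text
    }
}
```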
