XML with different namespaces drilling down to needed value - java

I am trying to figure out how to go about getting the value of bulb:ID from the following XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<My:Message
xmlns:Abcd="http://...."
xmlns:box-1="http://...."
xmlns:bulb="http://...."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xsi:schemaLocation="http://....stores.xsd">
<Abcd:StoreDataSection>
<Abcd:DataSection>
<Abcd:FirstStore>
<box-1:Response>
<box-1:DataSection>
<box-1:Release>
<box-1:Activity>
<bulb:Date>2017-04-29</bulb:Date>
<bulb:Store xsi:type="TPIR:Organization">
<bulb:StoreID>
<bulb:ID>D79G2102</bulb:ID>
</bulb:StoreID>
</bulb:Store>
</box-1:Activity>
</box-1:Release>
</box-1:DataSection>
</box-1:Response>
</Abcd:FirstStore>
</Abcd:DataSection>
</Abcd:StoreDataSection>
</My:Message>
I keep getting "null" as the value of node
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
This is my current Java code:
try {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new File("c:/temp/testingNamespace.xml"));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//My/Message//Abcd/StoreDataSection/DataSection/FirstStore//box-1/Response/DataSection/Release/Activity//bulb/Store/StoreID/ID";
Node node = (Node) xPath.evaluate(expression, document, XPathConstants.NODE);
node.setTextContent("changed ID");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(document), new StreamResult(new File("C:/temp/test-updated.xml")));
} catch (Exception e) {
System.out.println(e.getMessage());
}
How would the correct XPath be formatted in order for me to get that value and change it?
Update 1
So something like this?
String expression = "/My:Message/Abcd:StoreDataSection/Abcd:DataSection/Abcd:FirstStore/box-1:Response/box-1:DataSection/box-1:Release/box-1:Activity/bulb:Store/bulb:StoreID/bulb:ID";

The problem is that you have to address a node by its prefix and local name joined with a colon, for example //bulb:StoreID if you want to select StoreID.
Even then it will still not work, because you need to tell XPath how to resolve the namespace prefixes.
You should check this answer : How to query XML using namespaces in Java with XPath?
for details on how to implement and use a NamespaceContext.
The bottom line is that you need to implement a javax.xml.namespace.NamespaceContext and set it to the XPath.
XPath xpath = XPathFactory.newInstance().newXPath();
NamespaceContext context = new MyNamespaceContext();
xpath.setNamespaceContext(context);
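MyNamespaceContext above is left undefined, so here is a minimal sketch of what such a class could look like. The class names PrefixedXPathSketch and MapNamespaceContext, the helper readStoreId and the URI urn:example:bulb are made up for illustration (the real namespace URIs are elided in the question), and the sample XML is a cut-down fragment with its prefix properly declared.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PrefixedXPathSketch {

    // Hypothetical namespace URI; the real one is elided in the question.
    static final String BULB_NS = "urn:example:bulb";

    /** Maps the prefixes used in XPath expressions to namespace URIs. */
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri;

        MapNamespaceContext(Map<String, String> prefixToUri) {
            this.prefixToUri = prefixToUri;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            // Unknown prefixes resolve to "no namespace".
            return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return null; // reverse lookup is not implemented in this sketch
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.<String>emptyIterator(); // not implemented either
        }
    }

    /** Parses the XML namespace-aware and returns the text of bulb:ID. */
    static String readStoreId(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // otherwise prefixed steps will not match
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(
                    new MapNamespaceContext(Collections.singletonMap("bulb", BULB_NS)));
            return xpath.evaluate("//bulb:StoreID/bulb:ID", document);
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    public static void main(String[] args) {
        String xml = "<bulb:Store xmlns:bulb='" + BULB_NS + "'>"
                + "<bulb:StoreID><bulb:ID>D79G2102</bulb:ID></bulb:StoreID>"
                + "</bulb:Store>";
        System.out.println(readStoreId(xml));
    }
}
```

The prefix in the XPath expression only has to map to the same URI as in the document; the document itself is free to use a different prefix for that namespace.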

Two things wrong here:
Your XML is not namespace-well-formed; it does not declare the used namespace prefixes.
Once namespace prefixes are properly declared in the XML and in your Java code, you use them in XPath via : not via /. So, it'd be not /Abcd/StoreDataSection but rather /Abcd:StoreDataSection (and so on for the rest of the steps in your XPath).
See also How does XPath deal with XML namespaces?
I am unable to change anything in the XML so I have to go with it as-is sadly.
Technically you might be able to use some XML tools with undeclared namespaces, because this omission only renders the XML namespace-not-well-formed. Many tools, however, expect XML that is not only well-formed but also namespace-well-formed. (See Namespace-Well-Formed for the difference.)
Otherwise, see How to parse invalid (bad / not well-formed) XML? to repair your XML.

Related

How to avoid xmlns="" to be added to a manipulated XML root element?

I'm changing the jta-data-source value of a persistence.xml as follows:
JavaArchive jarArchive = Maven.configureResolver().workOffline().resolve("richtercloud:project1-jar:jar:1.0-SNAPSHOT").withoutTransitivity().asSingle(JavaArchive.class);
Node persistenceXml = jarArchive.get("META-INF/persistence.xml");
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document persistenceXmlDocument = documentBuilder.parse(persistenceXml.getAsset().openStream());
//asked
//https://stackoverflow.com/questions/46771622/how-to-create-a-shrinkwrap-persistencedescriptor-from-an-existing-persistence-xm
//for how to manipulate persistence.xml more easily with
//ShrinkWrap's PersistenceDescriptor
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("//persistence-unit/jta-data-source");
org.w3c.dom.Node persistenceXmlDataSourceNode = (org.w3c.dom.Node) expr.evaluate(persistenceXmlDocument,
XPathConstants.NODE);
persistenceXmlDataSourceNode.setTextContent("jdbc/project1-test-db");
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
//transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
//was there before, but unclear why
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(persistenceXmlDocument), new StreamResult(writer));
String persistenceUnit = writer.toString();
(since How to create a ShrinkWrap PersistenceDescriptor from an existing persistence.xml? has not been answered, yet).
That works fine, except for an xmlns="" attribute added to the persistence-unit under the root persistence element which seems to cause:
java.io.IOException: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 108; Deployment descriptor file META-INF/persistence.xml in archive [project1-jar-1.0-SNAPSHOT.jar]. cvc-complex-type.2.4.a: Invalid content was found starting with element 'persistence-unit'. One of '{"http://xmlns.jcp.org/xml/ns/persistence":persistence-unit}' is expected.
I'm not wedded to the idea of using Transformer and related classes.
No idea why I can't reproduce this in Java SE, but the problem is that javax.xml.parsers.DocumentBuilder isn't namespace aware by default, so the namespace information gets lost during manipulation of the document and consequently an empty xmlns is added by Transformer. The fix is to call setNamespaceAware(true) on the DocumentBuilderFactory.
Once DocumentBuilder is namespace aware, the unprefixed XPath query no longer matches and returns null; see XPath returning null for "Node" when isNameSpaceAware and isValidating are "true" for a detailed description of the XPath namespace awareness issue (I couldn't get the suggested solution of building a custom NamespaceContext to work).
In order to avoid this I finally adjusted my XPath query to use the local-name function as described at How to ignore namespace when selecting XML nodes with XPath. i.e. //*[local-name()='jta-data-source'].
It's still necessary to use Document.createElementNS instead of createElement in order to avoid empty xmlns attribute on newly created elements, see Empty default XML namespace xmlns="" attribute being added? for an explanation.
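Put together, the approach from this answer could look roughly like the sketch below. The class name PersistenceXmlSketch, the method setJtaDataSource and the unit name "pu" are made up for illustration; the namespace URI is the one from the error message in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PersistenceXmlSketch {

    /** Replaces the text of the first jta-data-source element and re-serializes. */
    static String setJtaDataSource(String xml, String newValue) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespace information in the DOM
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // local-name() matches the element regardless of its namespace,
            // so no NamespaceContext is needed.
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(
                    "//*[local-name()='jta-data-source']", doc, XPathConstants.NODE);
            if (node == null) {
                throw new IllegalArgumentException("no jta-data-source element found");
            }
            node.setTextContent(newValue);

            StringWriter writer = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    public static void main(String[] args) {
        String xml = "<persistence xmlns='http://xmlns.jcp.org/xml/ns/persistence' version='2.1'>"
                + "<persistence-unit name='pu'>"
                + "<jta-data-source>jdbc/old</jta-data-source>"
                + "</persistence-unit></persistence>";
        System.out.println(setJtaDataSource(xml, "jdbc/project1-test-db"));
    }
}
```

The trade-off with local-name() is that it matches elements of that name in any namespace, which is usually acceptable for a descriptor with a single vocabulary.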

Creating namespace prefixed XML nodes in Java DOM

I am creating several XML files via Java and up to this point everything worked fine, but now I've run into a problem when trying to create a file with namespace prefixed nodes, i.e, stuff like <tns:node> ... </tns:node> using a refactored version of my code that's already working for normal xml files without namespaces.
The error getting thrown is:
org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: Ungültiges XML-Zeichen angegeben.
Sorry for the German in there; it says "An invalid XML character was specified".
The codeline where the error occurs:
Element mainRootElement = doc.createElement("tns:cmds xmlns:tns=\"http://abc.de/x/y/z\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://abc.de/x/y/z xyzschema.xsd\"");
To eliminate the possibility of the error resulting from escaping that rather long string or something along those lines, I also tried just using Element mainRootElement = doc.createElement("tns:cmds");, however, this results in the same error.
That's why I figure it has something to do with the namespace declaration, i.e., the : used to do it, as that's the only "invalid" character I could think of in that string.
Can anyone confirm this is the source of the problem? If so, is there an easy solution to it? Can Java DOM use namespaced tags at all?
Edit: Whole method for reference
private void generateScriptXML()
{
DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder icBuilder;
try
{
icBuilder = icFactory.newDocumentBuilder();
Document doc = icBuilder.newDocument();
Element mainRootElement = doc.createElement("tns:cmds xmlns:tns=\"http://abc.de/x/y/z\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://abc.de/x/y/z xyzschema.xsd\"");
doc.appendChild(mainRootElement);
mainRootElement.appendChild(getAttributes(doc,"xxx", "yyy", "zzz"));
mainRootElement.appendChild(getAttributes(doc,"aaa", "bbb", "ccc"));
mainRootElement.appendChild(getAttributes(doc,"ddd", "eee", "fff"));
...
...
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult streamResult = new StreamResult(new File(vfsPath));
transformer.transform(source, streamResult);
}
catch (Exception e)
{
e.printStackTrace();
}
}
Wrong method, try the *NS variants:
Element mainRootElement = doc.createElementNS(
"http://abc.de/x/y/z", // namespace
"tns:cmds" // node name including prefix
);
First argument is the namespace, second the node name including the prefix/alias. Namespace definitions will be added automatically for the namespace if needed. It works to set them as attributes, too.
The namespace in your original source is http://abc.de/x/y/z. With the attribute xmlns:tns="http://abc.de/x/y/z" the alias/prefix tns is defined for the namespace. The DOM api will implicitly add namespaces for nodes created with the *NS methods.
xmlns and xml are reserved/default namespace prefixes for specific namespaces. The namespace for xmlns (namespace definitions) is http://www.w3.org/2000/xmlns/.
To add an xmlns:* attribute with setAttributeNS() use the xmlns namespace:
mainRootElement.setAttributeNS(
"http://www.w3.org/2000/xmlns/", // namespace
"xmlns:xsi", // node name including prefix
"http://www.w3.org/2001/XMLSchema-instance" // value
);
But even that is not needed. Just like for elements, the namespace definition will be added implicitly if you add an attribute node using it.
mainRootElement.setAttributeNS(
"http://www.w3.org/2001/XMLSchema-instance", // namespace
"xsi:schemaLocation", // node name including prefix
"http://abc.de/x/y/z xyzschema.xsd" // value
);
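For reference, the two calls can be combined into a small self-contained sketch. The class name NamespacedRootSketch and its helper methods are made up for illustration; the namespace URIs and the schemaLocation value come from the question. The serialized output is printed rather than asserted, because the exact formatting depends on the Transformer implementation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NamespacedRootSketch {

    static final String TNS = "http://abc.de/x/y/z";
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Builds a document whose root is tns:cmds with an xsi:schemaLocation attribute. */
    static Document buildDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            // Namespace URI first, then the qualified name (prefix:localName).
            Element root = doc.createElementNS(TNS, "tns:cmds");
            root.setAttributeNS(XSI, "xsi:schemaLocation", TNS + " xyzschema.xsd");
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    /** Serializes the document with the default identity Transformer. */
    static String serialize(Document doc) {
        try {
            StringWriter writer = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildDocument()));
    }
}
```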
Namespace Prefixes
If you see a node name like xsi:schemaLocation you can resolve it by looking for the xmlns:xsi attribute. This attribute is the namespace definition. The value is the actual namespace. So if you have an attribute xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" the node name can be resolved to {http://www.w3.org/2001/XMLSchema-instance}schemaLocation (Clark notation).
If you want to create the node you need 3 values:
the namespace: http://www.w3.org/2001/XMLSchema-instance
the local node name: schemaLocation
the prefix: xsi
The prefix is optional for element nodes, but mandatory for attribute nodes. The following three XMLs all resolve to the element node name {http://abc.de/x/y/z}cmds:
<tns:cmds xmlns:tns="http://abc.de/x/y/z"/>
<cmds xmlns="http://abc.de/x/y/z"/>
<other:cmds xmlns:other="http://abc.de/x/y/z"/>
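A quick way to check this is to parse each variant namespace-aware and print the expanded name of the root element. This is only a sketch; the class name ExpandedNameSketch and the helper expandedRootName are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ExpandedNameSketch {

    /** Returns the root element name in Clark notation: {namespace}localName. */
    static String expandedRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI/getLocalName
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    public static void main(String[] args) {
        String[] variants = {
            "<tns:cmds xmlns:tns=\"http://abc.de/x/y/z\"/>",
            "<cmds xmlns=\"http://abc.de/x/y/z\"/>",
            "<other:cmds xmlns:other=\"http://abc.de/x/y/z\"/>"
        };
        for (String xml : variants) {
            // Each line prints {http://abc.de/x/y/z}cmds
            System.out.println(expandedRootName(xml));
        }
    }
}
```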

java DOM lookupNamespaceURI is not able to locate namespace URI

I'm trying to follow http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html
UniversalNamespaceResolver
example for resolving namespaces when evaluating XPath against an XML document. The problem I encountered is that the lookupNamespaceURI call below returns null for the XML given below:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse(new InputSource(new StringReader(xml)));
String nsURI = dDoc.lookupNamespaceURI("h");
the XML:
<?xml version="1.0"?>
<h:root xmlns:h="http://www.w3.org/TR/html4/">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
</h:root>
while I'd expect it to return "http://www.w3.org/TR/html4/".
When configuring a DocumentBuilder, you have to explicitly make it namespace aware (a silly relic from the first days of xml when there were no namespaces):
domFactory.setNamespaceAware(true);
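To see the effect of that flag, the lookup can be run with the factory configured both ways. This is a sketch only; the class name LookupNamespaceSketch and the helper lookup are made up for illustration, and the XML is a shortened version of the one in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LookupNamespaceSketch {

    /** Parses the XML and looks up the namespace URI bound to the given prefix. */
    static String lookup(String xml, String prefix, boolean namespaceAware) {
        try {
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(namespaceAware);
            Document dDoc = domFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return dDoc.lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<h:root xmlns:h=\"http://www.w3.org/TR/html4/\">"
                + "<h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>"
                + "</h:root>";
        System.out.println("namespace aware:     " + lookup(xml, "h", true));
        System.out.println("not namespace aware: " + lookup(xml, "h", false));
    }
}
```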
As a side note, the advice in that article is not very good. It fundamentally misses the point that you don't care what the namespace prefixes are in the actual document; they are irrelevant. You need the XPath namespace resolver to match the prefixes in the XPath expressions that you are using, and that is all. If you do what the article suggests, you will have to change your XPath code whenever the document's prefixes change, which is a horrible idea.
Note that they sort of concede this point in their last bullet, but the rest of the article seems to miss that this is the fundamental idea when using XPath.
But if you don't have control over the XML file, and someone can send you any prefixes they wish, it might be better to be independent of their choices. You can code your own namespace resolution as in Example 1 (HardcodedNamespaceResolver), and use them in your XPath expressions.

Avoid repeated instantiation of InputSource with XPath in Java

Currently I am parsing XML messages with XPath Expression. It works very well. However I have the following problem:
I am parsing all of the data in the XML, so I instantiate a new InputSource for every call made to xPath.evaluate.
StringReader xmlReader = new StringReader(xml);
InputSource source = new InputSource(xmlReader);
XPathExpression xpe = xpath.compile("msg/element/@attribute");
String attribute = (String) xpe.evaluate(source, XPathConstants.STRING);
Now I would like to go deeper into my XML message and evaluate more information. For this I found myself in the need to instantiate source another time. Is this required? If I don't do it, I get Stream closed Exceptions.
Parse the XML to a DOM and keep a reference to the node(s). Example:
XPath xpath = XPathFactory.newInstance()
.newXPath();
InputSource xml = new InputSource(new StringReader("<xml foo='bar' />"));
Node root = (Node) xpath.evaluate("/", xml, XPathConstants.NODE);
System.out.println(xpath.evaluate("/xml/@foo", root));
This avoids parsing the string more than once.
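An alternative sketch of the same idea parses the string once with a DocumentBuilder and then runs as many expressions as needed against the resulting Document. The class name ParseOnceSketch, the helpers parse and eval, and the sample message are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseOnceSketch {

    /** Parses the XML string once; the returned Document can be queried repeatedly. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    /** Evaluates an XPath expression against an already parsed document. */
    static String eval(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical message shaped like the expression in the question.
        String xml = "<msg><element attribute='a1'><child>text</child></element></msg>";
        Document doc = parse(xml); // the reader is consumed here, once

        System.out.println(eval(doc, "/msg/element/@attribute"));
        System.out.println(eval(doc, "/msg/element/child"));
    }
}
```

Because the DOM lives in memory, no stream is read during evaluation, so the "Stream closed" problem does not come up.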
If you must reuse the InputSource for a different XML string, you can probably use the setters with a different reader instance.

Always get null when querying XML with XPath

I am using the following code to query some XML with XPath I get from a stream.
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(false);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputStream);
inputStream.close();
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//FOO_ELEMENT");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
I have checked the stream for content by converting it to a string - and it's all there - so it's not as if there is no data in the stream.
This is just annoying me now - as I have tried various different bits of code and I still keep getting 'null' being printed at the "System.out.println" line - what am I missing here?
NOTE: I want to see the text inside the element.
In addition to what Brabster suggested, you may want to try
System.out.println(nodes.item(i).getTextContent());
or
System.out.println(nodes.item(i).getNodeName());
depending on what you're intending to display.
See http://java.sun.com/javase/6/docs/api/org/w3c/dom/Node.html
Not an expert in the Java XPath impl tbh, but this might help.
The Javadoc says that the result of getNodeValue() will be null for most types of node, including element nodes.
It's not totally clear what you expect to see in the output; element name, attributes, text? I'll guess text. In any XPath impl I have used, if you want the text content of the node, you have to XPath to
//FOO_ELEMENT/text()
Then the node's value is the text content of the node.
The getTextContent() method will return the text content of the node you've selected with the XPath, and of any descendant nodes, as per the Javadoc. The solution above selects exactly the text nodes of any FOO_ELEMENT elements in the document.
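The difference between the three options can be checked with a small sketch. The class name NodeTextSketch, its helper methods and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeTextSketch {

    /** Returns the first node matched by the expression, or null if none matches. */
    static Node first(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e); // checked exceptions wrapped for brevity
        }
    }

    static String nodeValue(String xml, String expression) {
        return first(xml, expression).getNodeValue();
    }

    static String textContent(String xml, String expression) {
        return first(xml, expression).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<root><FOO_ELEMENT>hello</FOO_ELEMENT></root>";

        // Element node: getNodeValue() is null, getTextContent() is the text.
        System.out.println(nodeValue(xml, "//FOO_ELEMENT"));
        System.out.println(textContent(xml, "//FOO_ELEMENT"));

        // Text node selected with text(): getNodeValue() is the text.
        System.out.println(nodeValue(xml, "//FOO_ELEMENT/text()"));
    }
}
```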
Java EE Docs for Node <-- old docs, see comments for current docs.
