Java/DOM: Get the XML content of a node - java

I am parsing a XML file in Java using the W3C DOM.
I am stuck at a specific problem, I can't figure out how to get the whole inner XML of a node.
The node looks like that:
<td><b>this</b> is a <b>test</b></td>
What function do I have to use to get that:
"<b>this</b> is a <b>test</b>"

I know this was asked long ago but for the next person searching (was me today), this works with JDOM:
JDOMXPath xpath = new JDOMXPath("/td");
String innerXml = (new XMLOutputter()).outputString(xpath.selectNodes(document));
This passes a list of all child nodes into outputString, which will serialize them out in order.

You have to use the transform/xslt API using your <b> node as the node to be transformed and put the result into a new StreamResult(new StringWriter());
. See how-to-pretty-print-xml-from-java

What do you say about this ?
I had same problem today on android, but i managed to make simple "serializator"
private String innerXml(Node node){
String s = "";
NodeList childs = node.getChildNodes();
for( int i = 0;i<childs.getLength();i++ ){
s+= serializeNode(childs.item(i));
}
return s;
}
private String serializeNode(Node node){
String s = "";
if( node.getNodeName().equals("#text") ) return node.getTextContent();
s+= "<" + node.getNodeName()+" ";
NamedNodeMap attributes = node.getAttributes();
if( attributes!= null ){
for( int i = 0;i<attributes.getLength();i++ ){
s+=attributes.item(i).getNodeName()+"=\""+attributes.item(i).getNodeValue()+"\"";
}
}
NodeList childs = node.getChildNodes();
if( childs == null || childs.getLength() == 0 ){
s+= "/>";
return s;
}
s+=">";
for( int i = 0;i<childs.getLength();i++ )
s+=serializeNode(childs.item(i));
s+= "</"+node.getNodeName()+">";
return s;
}

er... you could also call toString() and just chop off the beginning and end tags, either manually or using regexps.
edit: toString() doesn't do what I expected. Pulling out the O'Reilly Java & XML book talks about the Load and Save module of Java DOM.
See in particular the LSSerializer which looks very promising. You could either call writeToString(node) and chop off the beginning and end tags, as I suggested, or try to use LSSerializerFilter to not print the top node tags (not sure if that would work; I admit I've never used LSSerializer before.)
Reading the O'Reilly book seems to indicate doing something like this:
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS lsImpl =
(DOMImplementationLS)registry.getDOMImplementation("LS");
LSSerializer serializer = lsImpl.createLSSerializer();
String nodeString = serializer.writeToString(node);

node.getTextContent();
You ought to be using JDom of Dom4J to handle nodes, if for no other reasons, to handle whitespace correctly.

To remove unneccesary tags probably such code can be used:
DOMConfiguration config = serializer.getDomConfig();
config.setParameter("canonical-form", true);
But it will not always work, because "canonical-form=true" is optional

Related

Attempting to use Java & Xpath to updated the value of any selected node inside an XML doc

I have developed GUI tool the displays an XML document as an editable JTree, and the user can select a node in the JTree and attempt to change the actual nodes value in the XML document.
The problem that I'm having is with constructing the correct Xpath query that attempts the actual update.
Here is GUI of the JTree showing which element was selected & should be edited:
Its a very large XMl, so here the collapsed snippet of the XML:
UPDATE (IGNORE ATTEMPT 1 & 2, 1ST ISSUE WAS RESOLVED, GO TO ATTEMPTS 3 & 4)
Attempt 1 # (relevant Java method that attempts to create XPath query to update a nodes value):
public void updateXmlData(JTree jTree, org.w3c.dom.Document doc, TreeNode parentNode, String oldValue, String newValue) throws XPathExpressionException {
System.out.println("Selected path=" + jTree.getSelectionPath().toString());
String[] pathTockens = jTree.getSelectionPath().toString().split(",");
StringBuilder sb = new StringBuilder();
//for loop to construct xpath query
for (int i = 0; i < pathTockens.length - 1; i++) {
if (i == 0) {
sb.append("//");
} else {
sb.append(pathTockens[i].trim());
sb.append("/");
}
}//end for loop
sb.append("text()");
System.out.println("Constructed XPath Query:" + sb.toString());
//new xpath
XPath xpath = XPathFactory.newInstance().newXPath();
//compile query
NodeList nodes = (NodeList) xpath.compile(sb.toString()).evaluate(doc, XPathConstants.NODESET);
//Make the change on the selected nodes
for (int idx = 0; idx < nodes.getLength(); idx++) {
Node value = nodes.item(idx).getAttributes().getNamedItem("value");
String val = value.getNodeValue();
value.setNodeValue(val.replaceAll(oldValue, newValue));
}
//set the new updated xml doc
SingleTask.currentTask.setDoc(doc);
}
Console logs:
Selected path=[C:\Users\xyz\Documents\XsdToXmlFiles\sampleIngest.xml, Ingest, Property_Maps, identifier, identifieXYZ]
Constructed XPath Query://Ingest/Property_Maps/identifier/text()
Jan 26, 2021 2:04:16 PM com.xyz.XmlToXsdValidator.Views.EditXmlTreeNodeDialogJFrame jButtonOkEditActionPerformed
SEVERE: null
javax.xml.transform.TransformerException: Unable to evaluate expression using this context
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:368)
As you can see in the logs:
Selected path=[C:\Users\xyz\Documents\XsdToXmlFiles\sampleIngest.xml, Ingest, Property_Maps, identifier, identifieXYZ]
Constructed XPath Query://Ingest/Property_Maps/identifier/text()
The paths are correct, basically Ingest->Property_Maps->identifier->text()
But Im getting:
javax.xml.transform.TransformerException: Unable to evaluate expression using this context
Attempt 2 # (relevant Java method that attempts to create XPath query to update a nodes value):
public void updateXmlData(JTree jTree, org.w3c.dom.Document doc, TreeNode parentNode, String oldValue, String newValue) throws XPathExpressionException {
// Locate the node(s) with xpath
System.out.println("Selected path=" + jTree.getSelectionPath().toString());
String[] pathTockens = jTree.getSelectionPath().toString().split(",");
StringBuilder sb = new StringBuilder();
//loop to construct xpath query
for (int i = 0; i < pathTockens.length - 1; i++) {
if (i == 0) {
sb.append("//");
} else {
sb.append(pathTockens[i].trim());
sb.append("/");
}
}//end loop
sb.append("[text()=");
sb.append("'");
sb.append(oldValue);
sb.append("']");
int lastIndexOfPathChar = sb.lastIndexOf("/");
sb.replace(lastIndexOfPathChar, lastIndexOfPathChar + 1, "");
System.out.println("Constructed XPath Query:" + sb.toString());
//new xpath instance
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate(sb.toString(), doc, XPathConstants.NODESET);
//Make the change on the selected nodes
for (int idx = 0; idx < nodes.getLength(); idx++) {
Node value = nodes.item(idx).getAttributes().getNamedItem("value");
String val = value.getNodeValue();
value.setNodeValue(val.replaceAll(oldValue, newValue));
}
SingleTask.currentTask.setDoc(doc);
}
I was able to resolve the exception based Andreas comment, and there are no more exceptions/errors, however the XPath query does not find selected nodes. Returns empty
New updated code:
Attempt # 3 Using custom namespace resolver. References: https://www.kdgregory.com/index.php?page=xml.xpath
public boolean updateXmlData(JTree jTree, org.w3c.dom.Document doc, TreeNode parentNode, String oldValue, String newValue) throws XPathExpressionException {
System.out.println("Selected path=" + jTree.getSelectionPath().toString());
boolean changed = false;
// Locate the node(s) with xpath
String[] pathTockens = jTree.getSelectionPath().toString().split(",");
StringBuilder sb = new StringBuilder();
//loop to construct xpath query
for (int i = 0; i < pathTockens.length - 1; i++) {
if (i == 0) {
//do nothing
} else if (i == 1) {
sb.append("/ns:" + pathTockens[i].trim());
} else if (i > 1 && i != pathTockens.length - 1) {
sb.append("/ns:" + pathTockens[i].trim());
} else {
//sb.append("/" + pathTockens[i].trim());
}
}//end loop
sb.append("[text()=");
sb.append("'");
sb.append(oldValue);
sb.append("']");
System.out.println("Constructed XPath Query:" + sb.toString());
//new xpath instance
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
xpath.setNamespaceContext(new UniversalNamespaceResolver(SingleTask.currentTask.getXsdFile().getXsdNameSpace()));
NodeList nodes = (NodeList) xpath.evaluate(sb.toString(), doc, XPathConstants.NODESET);
//start for
Node node;
String val = null;
for (int idx = 0; idx < nodes.getLength(); idx++) {
if (nodes.item(idx).getAttributes() != null) {
node = nodes.item(idx).getAttributes().getNamedItem("value");
if (node != null) {
val = node.getNodeValue();
node.setNodeValue(val.replaceAll(oldValue, newValue));
changed = true;
break;
}//end if node is found
}
}//end for
//set the new updated xml doc
SingleTask.currentTask.setDoc(doc);
return changed;
}
Class that implements custom namespace resolver:
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import org.w3c.dom.Document;
/**
*
* References:https://www.kdgregory.com/index.php?page=xml.xpath
*/
//custom NamespaceContext clss implementation
public class UniversalNamespaceResolver implements NamespaceContext
{
private String _prefix = "ns";
private String _namespaceUri=null;
private List<String> _prefixes = Arrays.asList(_prefix);
public UniversalNamespaceResolver(String namespaceResolver)
{
_namespaceUri = namespaceResolver;
}
#Override
#SuppressWarnings("rawtypes")
public Iterator getPrefixes(String uri)
{
if (uri == null)
throw new IllegalArgumentException("UniversalNamespaceResolver getPrefixes() URI may not be null");
else if (_namespaceUri.equals(uri))
return _prefixes.iterator();
else if (XMLConstants.XML_NS_URI.equals(uri))
return Arrays.asList(XMLConstants.XML_NS_PREFIX).iterator();
else if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(uri))
return Arrays.asList(XMLConstants.XMLNS_ATTRIBUTE).iterator();
else
return Collections.emptyList().iterator();
}
#Override
public String getPrefix(String uri)
{
if (uri == null)
throw new IllegalArgumentException("nsURI may not be null");
else if (_namespaceUri.equals(uri))
return _prefix;
else if (XMLConstants.XML_NS_URI.equals(uri))
return XMLConstants.XML_NS_PREFIX;
else if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(uri))
return XMLConstants.XMLNS_ATTRIBUTE;
else
return null;
}
#Override
public String getNamespaceURI(String prefix)
{
if (prefix == null)
throw new IllegalArgumentException("prefix may not be null");
else if (_prefix.equals(prefix))
return _namespaceUri;
else if (XMLConstants.XML_NS_PREFIX.equals(prefix))
return XMLConstants.XML_NS_URI;
else if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix))
return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
else
return null;
}
}
Console Output:
Selected path=[C:\Users\xyz\DocumentsIngest_LDD.xml, Ingest_LDD, Property_Maps, identifier, identifier1]
Constructed XPath: Query:/ns:Ingest_LDD/ns:Property_Maps/ns:identifier[text()='identifier1']
Attempt #4 (Without custom namespace resolver):
public boolean updateXmlData(JTree jTree, org.w3c.dom.Document doc, TreeNode parentNode, String oldValue, String newValue) throws XPathExpressionException {
System.out.println("Selected path=" + jTree.getSelectionPath().toString());
boolean changed = false;
// Locate the node(s) with xpath
String[] pathTockens = jTree.getSelectionPath().toString().split(",");
StringBuilder sb = new StringBuilder();
//loop to construct xpath query
for (int i = 0; i < pathTockens.length - 1; i++) {
if (i == 0) {
//do nothing
} else if (i == 1) {
sb.append("/" + pathTockens[i].trim());
} else if (i > 1 && i != pathTockens.length - 1) {
sb.append("/" + pathTockens[i].trim());
} else {
//sb.append("/" + pathTockens[i].trim());
}
}//end loop
sb.append("[text()=");
sb.append("'");
sb.append(oldValue);
sb.append("']");
System.out.println("Constructed XPath Query:" + sb.toString());
//new xpath instance
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
//WITHOUT CUSTOM NAMESPACE CONTEXT xpath.setNamespaceContext(new UniversalNamespaceResolver(SingleTask.currentTask.getXsdFile().getXsdNameSpace()));
NodeList nodes = (NodeList) xpath.evaluate(sb.toString(), doc, XPathConstants.NODESET);
//start for
Node node;
String val = null;
for (int idx = 0; idx < nodes.getLength(); idx++) {
if (nodes.item(idx).getAttributes() != null) {
node = nodes.item(idx).getAttributes().getNamedItem("value");
if (node != null) {
val = node.getNodeValue();
node.setNodeValue(val.replaceAll(oldValue, newValue));
changed = true;
break;
}//end if node is found
}
}//end for
//set the new updated xml doc
SingleTask.currentTask.setDoc(doc);
return changed;
}
Console Output:
Selected path=[C:\Users\anaim\Documents\XsdToXmlFiles\sampleIngest_LDD.xml, Ingest_LDD, Property_Maps, identifier, identifier1]
Constructed XPath Query:/Ingest_LDD/Property_Maps/identifier[text()='identifier1']
I actually manually wrote the XPath query online using (https://www.freeformatter.com/xpath-tester.html#ad-output)
Sorry, I cant provide the sample XMl, its way too large.
The manual XPath query was:
/Ingest_LDD/Property_Maps/identifier[text()='identifier1']
And the online tool successfully found the text & outputted:
Element='<identifier xmlns="http://pds.nasa.gov/pds4/pds/v1">identifier1</identifier>'
Therefore my code under attempt #4 & the query should work?
UPDATED ATTEMPTS AFTER USER INPUT:
Attempt #5 (based on response from user, namespace aware = TRUE ), relevant code is below
factory.setNamespaceAware(true);
doc = dBuilder.parse(xmlFile);
if (doc!=null)
{
//***NOTE program comes meaning doc is NOT null, however inspecting it shows [#document: null]
doc.getDocumentElement().normalize();
}
xpath.setNamespaceContext(new UniversalNamespaceResolver(SingleTask.currentTask.getXsdFile().getXsdNameSpace()));
Node node = (Node) xpath.evaluate(sb.toString(), doc, XPathConstants.NODE);
if (node!=null)
{
// See https://docs.oracle.com/javase/9/docs/api/org/w3c/dom/Node.html#setTextContent-java.lang.String-
node.setTextContent(newValue);
SingleTask.currentTask.setDoc(doc);
}
Output (again unable to find the node):
Selected path=[C:\Users\xyz\Documents\XsdToXmlFiles\sampleIngest_LDD.xml, Ingest_LDD, name, name1]
Constructed XPath Query:/Ingest_LDD/name[text()='name1']
Error changing value!
Attempt #6 (based on response from user, namespace aware = FALSE )
factory.setNamespaceAware(false);
doc = dBuilder.parse(xmlFile);
if (doc!=null)
{
//***NOTE program comes meaning doc is NOT null, however inspecting it shows [#document: null]
doc.getDocumentElement().normalize();
}
//COMMENTED OUT , SINCE NAMESPACE AWARE FALSE xpath.setNamespaceContext(new UniversalNamespaceResolver(SingleTask.currentTask.getXsdFile().getXsdNameSpace()));
Node node = (Node) xpath.evaluate(sb.toString(), doc, XPathConstants.NODE);
if (node!=null)
{
// See https://docs.oracle.com/javase/9/docs/api/org/w3c/dom/Node.html#setTextContent-java.lang.String-
node.setTextContent(newValue);
SingleTask.currentTask.setDoc(doc);
}
Output (again unable to find the node):
Selected path=[C:\Users\xyz\Documents\XsdToXmlFiles\sampleIngest_LDD.xml, Ingest_LDD, name, name1]
Constructed XPath Query:/Ingest_LDD/name[text()='name1']
Error changing value!
The document that is being returned as [#document: null] may not actually be the problem according to(DocumentBuilder.parse(InputStream) returns null)???
Attempt # 7 (namespace aware FALSE)
Also NamedNodeMap namedNodeMap = doc.getAttributes(); returns NULL.
However, Node firstChild = doc.getFirstChild() actually returns valid element!
I passed firstChild to xpath.evaluate(sb.toString(), firstChild , XPathConstants.NODE); but again the node desired node was not found.
Output (again unable to find the node):
Selected path=[C:\Users\xyz\Documents\XsdToXmlFiles\sampleIngest_LDD.xml, Ingest_LDD, name, name1]
Constructed XPath Query:/Ingest_LDD/name[text()='name1']
Error changing value!
Attempt # 8 (namespace aware false)
I also attemped to pass in doc.getChildNodes() to xpath.evaluate() rather than doc object as final desperate atteempt, see snippet below.
if (doc != null) {
NodeList nodes = (NodeList) xpath.evaluate(sb.toString(), doc.getChildNodes(), XPathConstants.NODESET);
String val = null;
Node node;
for (int idx = 0; idx < nodes.getLength(); idx++) {
if (nodes.item(idx).getAttributes() != null) {
node = nodes.item(idx).getAttributes().getNamedItem("value");
if (node != null) {
val = node.getNodeValue();
node.setNodeValue(val.replaceAll(oldValue, newValue));
changed = true;
break;
}//end if node is found
}
}//end for
}
Output (again unable to find the node):
Selected path=[C:\Users\xyz\Documents\XsdToXmlFiles\sampleIngest_LDD.xml, Ingest_LDD, name, name1]
Constructed XPath Query:/Ingest_LDD/name[text()='name1']
Error changing value!
For the test you performed online it seems your XML file contains namespace information.
With that information in mind, probably both of your examples of XPath evaluation would work, or not, dependent on several things.
For example, you probably can use the attempt #4, and the XPath evaluation will be adequate, if you are using a non namespace aware (the default) DocumentBuilderFactory and you o not provide any namespace information in your XPath expression.
But the XPath evaluation in attempt #3 can also be adequate if the inverse conditions apply, i.e., you are using a namespace aware DocumentBuilderFactory:
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setNamespaceAware(true);
and you provide namespace information in your XPath expression and a convenient NamespaceContext implementation. Please, see this related SO question and this great IBM article.
Be aware that you do not need to provide the same namespace prefixes in both your XML file an XPath expression, the only requirement is namespace awareness in XML (XPath is always namespace aware).
Given that conditions, I think you can apply both approaches.
In any case, I think the problem may have to do with the way you are dealing with the actual text replacement: you are looking for a node with a value attribute, and reviewing the associated XML Schema this attribute does not exist.
Please, consider instead the following approach:
// You can get here following both attempts #3 an #4
Node node = (Node) xpath.evaluate(sb.toString(), doc, XPathConstants.NODE);
boolean changed = node != null;
if (changed) {
// See https://docs.oracle.com/javase/9/docs/api/org/w3c/dom/Node.html#setTextContent-java.lang.String-
node.setTextContent(newValue);
SingleTask.currentTask.setDoc(doc);
}
return changed;
This code assumes that the selected node will be unique to work properly.
Although probably unlike, please, be aware that the way in which you are constructing the XPath selector from the JTree model can provide duplicates if you define the same value for repeated elements in your XML. Consider the elements external_id_property_maps in your screenshot, for instance.
In order to avoid that, you can take a different approach when constructing the XPath selector.
It is unclear for your code snippet, but probably you are using DefaultMutableTreeNode as the base JTree node type. If that is the case, you can associate with every node the arbitrary information you need to.
Consider for example the creation of a simple POJO with two fields, the name of the Element that the node represents, and some kind of unique, generated, id, let's name it uid or uuid to avoid confusion with the id attribute, most likely included in the original XML document.
This uid should be associated with every node. Maybe you can take advantage of the JTree creation process and, while processing every node of your XML file, include this attribute as well, generated using the UUID class, for example.
Or you can apply a XSLT transform to the original XML document prior to representation:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:attribute name="uid">
<xsl:value-of select="generate-id(.)"/>
</xsl:attribute>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
With this changes, your XPath query should looks like:
/ns:Ingest_LDD[#uid='w1ab1']/ns:Property_Maps[#uid='w1ab1a']/ns:identifier[#uid='w1ab1aq']
Of course, it will be necessary to modify the code devoted to the construction of this expression from the selected path of the JTree to take the custom object into account.
You can take this approach to the limit and use a single selector based solely in this uid attribute, although I think that for performance reasons it will be not appropriate:
//*[#uid='w1ab1']
Putting it all together, you can try something like the following.
Please, consider this XML file:
<?xml version="1.0" encoding="utf-8" ?>
<Ingest_LDD xmlns="http://pds.nasa.gov/pds4/pds/v1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1700.xsd">
<!-- Please, forgive me, I am aware that the document is not XML Schema conformant,
only for exemplification of the default namespace -->
<Property_Maps>
<identifier>identifier1</identifier>
</Property_Maps>
</Ingest_LDD>
First, let's parse the document:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
// As the XML contains namespace, let's configure the parser namespace aware
// This will be of relevance when evaluating XPath
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// Parse the document from some source
Document document = builder.parse(...);
// See http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
document.getDocumentElement().normalize();
Now, create a JTree structure corresponding to the input XML file. First, let's create a convenient POJO to store the required tree node information:
public class NodeInformation {
// Node name
private String name;
// Node uid
private String did;
// Node value
private String value;
// Setters and getters
// Will be reused by DefaultMutableTreeNode
#Override
public String toString() {
return this.name;
}
}
Convert the XML file to its JTree counterpart:
// Get a reference to root element
Element rootElement = document.getDocumentElement();
// Create root tree node
DefaultMutableTreeNode rootTreeNode = getNodeInformation(rootElement);
// Traverse DOM
traverse(rootTreeNode, rootElement);
// Create tree and tree model based on the computed root tree node
DefaultTreeModel treeModel = new DefaultTreeModel(rootTreeNode);
JTree tree = new JTree(treeModel);
Where:
private NodeInformation getNodeInformation(Node childElement) {
NodeInformation nodeInformation = new NodeInformation();
String name = childElement.getNodeName();
nodeInformation.setName(name);
// Provide a new unique identifier for every node
String uid = UUID.randomUUID().toString();
nodeInformation.setUid(uid);
// Uhnn.... We need to associate the new uid with the DOM node as well.
// There is nothing wrong with it but mutating the DOM in this way in
// a method that should be "read-only" is not the best solution.
// It would be interesting to study the above-mentioned XSLT approach
chilElement.setAttribute("uid", uid);
// Compute node value
StringBuffer buffer = new StringBuffer();
NodeList childNodes = childElement.getChildNodes();
boolean found = false;
for (int i = 0; i < childNodes.getLength(); i++) {
Node node = childNodes.item(i);
if (node.getNodeType() == Node.TEXT_NODE) {
String value = node.getNodeValue();
buffer.append(value);
found = true;
}
}
if (found) {
nodeInformation.setValue(buffer.toString());
}
}
And:
// Finds all the child elements and adds them to the parent node recursively
private void traverse(DefaultMutableTreeNode parentTreeNode, Node parentXMLElement) {
NodeList childElements = parentXMLElement.getChildNodes();
for(int i=0; i<childElements.getLength(); i++) {
Node childElement = childElements.item(i);
if (childElement.getNodeType() == Node.ELEMENT_NODE) {
DefaultMutableTreeNode childTreeNode =
new DefaultMutableTreeNode
(getNodeInformation(childElement));
parentTreeNode.add(childTreeNode);
traverse(childTreeNode, childElement);
}
}
}
Although the NamespaceContext implementation you provided looks fine, please, at a first step, try something simpler, to minimize the possibility of error. See the provided implementation below.
Then, your updateXMLData method should looks like:
public boolean updateXmlData(JTree tree, org.w3c.dom.Document doc, TreeNode parentNode, String oldValue, String newValue) throws XPathExpressionException {
boolean changed = false;
TreePath selectedPath = tree.getSelectionPath();
int count = getPathCount();
StringBuilder sb = new StringBuilder();
NodeInformation lastNodeInformation;
if (count > 0) {
for (int i = 1; i < trp.getPathCount(); i++) {
DefaultMutableTreeNode treeNode = (DefaultMutableTreeNode) trp.getPathComponent(i);
NodeInformation nodeInformation = (NodeInformation) treeNode.getUserObject();
sb.append(String.format("/ns:%s[#uid='%s']", nodeInformation.getName(), nodeInformation.getUid());
lastNodeInformation = nodeInformation;
}
}
System.out.println("Constructed XPath Query:" + sb.toString());
// Although the `NamespaceContext` implementation you provided looks
// fine, please, at a first step, try something simpler, to minimize the
// possibility of error. For example:
NamespaceContext nsContext = new NamespaceContext() {
public String getNamespaceURI(String prefix) {
if (prefix == null) {
throw new IllegalArgumentException("No prefix provided!");
} else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
return "http://pds.nasa.gov/pds4/pds/v1";
} else if (prefix.equals("ns")) {
return "http://pds.nasa.gov/pds4/pds/v1";
} else {
return XMLConstants.NULL_NS_URI;
}
}
public String getPrefix(String namespaceURI) {
// Not needed in this context.
return null;
}
public Iterator getPrefixes(String namespaceURI) {
// Not needed in this context.
return null;
}
};
//new xpath instance
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
// As the parser is namespace aware, we can safely use XPath namespaces
xpath.setNamespaceContext(nsContext);
Node node = (Node) xpath.evaluate(sb.toString(), doc, XPathConstants.NODE);
boolean changed = node != null;
if (changed) {
// See https://docs.oracle.com/javase/9/docs/api/org/w3c/dom/Node.html#setTextContent-java.lang.String-
node.setTextContent(newValue);
SingleTask.currentTask.setDoc(doc);
// Probably the information has been updated in the node, but just in case:
lastNodeInformation.setValue(newValue);
}
return changed;
}
The generated XPath expression will look like:
/ns:Ingest_LDD[#uid='w1ab1']/ns:Property_Maps[#uid='w1ab1a']/ns:identifier[#uid='w1ab1aq']
If you want to use the default namespace, you can also try with:
/:Ingest_LDD[#uid='w1ab1']/:Property_Maps[#uid='w1ab1a']/:identifier[#uid='w1ab1aq']
Please, be aware that I haven't tested the code, but I hope you get the idea.
Just for clarification, in order to give you a proper answer, as mentioned before, if you now remove or comment this line of code:
builderFactory.setNamespaceAware(true);
Then, the XPath expression:
/ns:Ingest_LDD[#uid='w1ab1']/ns:Property_Maps[#uid='w1ab1a']/ns:identifier[#uid='w1ab1aq']
will no longer find the required node. Now, if you remove the namespace information from the XPath expression:
/Ingest_LDD[#uid='w1ab1']/Property_Maps[#uid='w1ab1a']/identifier[#uid='w1ab1aq']
It will find the right node again.

Reading comment from XML using DOM parser

When i try to read Comment's from XML file, Comment's from both the element are printing twice, when it pass through the loop. it should print first element comment in first iteration and second element comment in next iteration. If it is not clear, I have attached expected Output and Actual output for reference.
XML Code:
<shipments>
<shipment id="011">
<department>XXXX</department>
<!-- Product: XXXXX-->
</shipment>
</shipments>
Code:
public class Main {
public static void main(String[] args) throws SAXException,
IOException, ParserConfigurationException, XMLStreamException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Ignores all the comments described in the XML File
factory.setIgnoringComments(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("Details.xml"));
doc.getDocumentElement().normalize();
NodeList ShipmentList = doc.getElementsByTagName("shipment");
for (int i = 0; i < ShipmentList.getLength(); i++)
{
Node node = ShipmentList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE)
{
Element eElement = (Element) node;
XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream("shipmentDetails_1.xml"));
while (xr.hasNext()) {
if (xr.next() == XMLStreamConstants.COMMENT) {
String comment = xr.getText();
System.out.print("Comments: ");
System.out.println(comment);
} }
}
}
}
}
Expected Output:
COMMENTS :
Product : Laptop
COMMENTS :
Product : Mobile Phone
Output What i am getting:
Comments: Product:Laptop
Comments: Product:Mobile Phone
Comments: Product:Laptop
Comments: Product:Mobile Phone
To get the values from the XML declaration, call the following methods on the Document:
getXmlEncoding() - An attribute specifying, as part of the XML declaration, the encoding of this document. This is null when unspecified or when it is not known, such as when the Document was created in memory.
getXmlStandalone() - An attribute specifying, as part of the XML declaration, whether this document is standalone. This is false when unspecified.
getXmlVersion() - An attribute specifying, as part of the XML declaration, the version number of this document. If there is no declaration and if this document supports the "XML" feature, the value is "1.0".
UPDATED
To find and print comments inside the <shipment> element, iterate the child nodes of the element and look for nodes of type COMMENT_NODE, cast it to a Comment, and print the value of getData().
for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
if (child.getNodeType() == Node.COMMENT_NODE) {
Comment comment = (Comment) child;
System.out.println("COMMENTS : " + comment.getData());
}
}
To clarify: The node used here is from the question code. You can also use eElement instead of node. Makes no difference.
To obtain the XML Declaration and comments, I would suggest loading the file as a text file and parsing it via regular expressions. For example:
String file = new String(Files.readAllBytes(Paths.get("shipmentDetails_1.xml")), StandardCharsets.UTF_8);
Pattern pattern = Pattern.compile("<!--([\\s\\S]*?)-->");
Matcher matcher = pattern.matcher(file);
while (matcher.find()) {
System.out.println("COMMENTS: " + matcher.group(1));
}
Pattern pattern2 = Pattern.compile("<\\?xml([\\s\\S]*?)\\?>");
Matcher matcher2 = pattern2.matcher(file);
while (matcher2.find()) {
System.out.println("DECLARATION: " + matcher2.group(1));
}

Match specific html attribute values

I would like to match all attribute values for id, class, name and for! I created a simple function for that task.
private Collection<String> getAttributes(final String htmlContent) {
final Set<String> attributes = new HashSet<>();
final Pattern pattern = Pattern.compile("(class|id|for|name)=\\\"(.*?)\\\"");
final Matcher matcher = pattern.matcher(htmlContent);
while (matcher.find()) {
attributes.add(matcher.group(2));
}
return attributes;
}
Example html content:
<input id="test" name="testName" class="aClass bClass" type="input" />
How can I split html classes via regular expression, so that I get the following result set:
test
testName
aClass
bClass
And is there any way to improve my code? I really don't like the loop.
If you take a look at the JSoup library you can find useful tools for html parsing and manipulation.
For example:
Document doc = ...//create HTML document
Elements htmlElements = doc.children();
htmlElements.traverse(new MyHtmlElementVisitor());
The class MyHtmlElementVisitor simply has to implement NodeVisitor and can access the Node attributes.
Though you might find a good regex for the same job, it has several drawbacks. Just to name a few:
hard to find a failsafe regex for every possible html document
hard to read, therefore difficult to find bugs and implement changes
the regex usually isn't reusable
Don't use regular expressions for parsing HTML. Seriously, it's more complicated than you think.
If your document is actually XHTML, you can use XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate(
"//#*["
+ "local-name()='class'"
+ " or local-name()='id'"
+ " or local-name()='for'"
+ " or local-name()='name'"
+ "]",
new InputSource(new StringReader(htmlContent)),
XPathConstants.NODESET);
int count = nodes.getLength();
for (int i = 0; i < count; i++) {
Collections.addAll(attributes,
nodes.item(i).getNodeValue().split("\\s+"));
}
If it's not XHTML, you can use Swing's HTML parsing:
HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
private final Object[] attributesOfInterest = {
HTML.Attribute.CLASS,
HTML.Attribute.ID,
"for",
HTML.Attribute.NAME,
};
private void addAttributes(AttributeSet attr) {
for (Object a : attributesOfInterest) {
Object value = attr.getAttribute(a);
if (value != null) {
Collections.addAll(attributes,
value.toString().split("\\s+"));
}
}
}
#Override
public void handleStartTag(HTML.Tag tag,
MutableAttributeSet attr,
int pos) {
addAttributes(attr);
super.handleStartTag(tag, attr, pos);
}
#Override
public void handleSimpleTag(HTML.Tag tag,
MutableAttributeSet attr,
int pos) {
addAttributes(attr);
super.handleSimpleTag(tag, attr, pos);
}
};
HTMLDocument doc = (HTMLDocument)
new HTMLEditorKit().createDefaultDocument();
doc.getParser().parse(new StringReader(htmlContent), callback, true);
As for doing it without a loop, I don't think that's possible. But any implementation is going to use one or more loops internally anyway.

DOM4J Parse not returning any child nodes

I am attempting to begin writing a program which uses DOM4j with which I wish to parse a XML file, save it to some tables and finally allow the user to manipulate the data.
Unfortunately I am stuck on the most basic step, the parsing.
Here is the portion of my XML I am attempting to include:
<?xml version="1.0"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.04">
<BkToCstmrDbtCdtNtfctn>
<GrpHdr>
<MsgId>000022222</MsgId>
When I attempt to find the root of my XML it does return the root correctly as "Document". When I attempt to get the child node from Document it also correctly gives me "BkToCstmrDbtCdtNtfctn". The problem is that when I try to go any further and get the child nodes from "Bk" I can't. I get this in the console:
org.dom4j.tree.DefaultElement#2b05039f [Element: <BkToCstmrDbtCdtNtfctn uri: urn:iso:std:iso:20022:tech:xsd:camt.054.001.04 attributes: []/>]
Here is my code, I would appreciate any feedback. Ultimately I want to get the "MsgId" attribute back but in general I just want to figure how to parse deeper into the XML because in reality it probably has about 25 layers.
public static Document getDocument(final String xmlFileName){
Document document = null;
SAXReader reader = new SAXReader();
try{
document = reader.read(xmlFileName);
}
catch (DocumentException e)
{
e.printStackTrace();
}
return document;
}
public static void main(String args[]){
String xmlFileName = "C:\\Users\\jhamric\\Desktop\\Camt54.xml";
String xPath = "//Document";
Document document = getDocument(xmlFileName);
Element root = document.getRootElement();
List<Node> nodes = document.selectNodes(xPath);
for(Iterator i = root.elementIterator(); i.hasNext();){
Element element = (Element) i.next();
System.out.println(element);
}
for(Iterator i = root.elementIterator("BkToCstmrDbtCdtNtfctn");i.hasNext();){
Element bk = (Element) i.next();
System.out.println(bk);
}
}
}
The best approach is probably to use XPath, but since the XML document uses namespaces, you cannot use the "simple" selectNodes methods in the API. I would create a helper method to easily evaluate any XPath expression on either the Document or the Element level:
public static void main(String[] args) throws Exception {
Document doc = getDocument(...);
Map<String, String> namespaceContext = new HashMap<>();
namespaceContext.put("ns", "urn:iso:std:iso:20022:tech:xsd:camt.054.001.04");
// Select the first GrpHdr element in document order
Element element = (Element) select("//ns:GrpHdr[1]", doc, namespaceContext);
System.out.println(element.asXML());
// Select the text content of the MsgId element
Text msgId = (Text) select("./ns:MsgId/text()", element, namespaceContext);
System.out.println(msgId.getText());
}
static Object select(String expression, Branch contextNode, Map<String, String> namespaceContext) {
XPath xp = contextNode.createXPath(expression);
xp.setNamespaceURIs(namespaceContext);
return xp.evaluate(contextNode);
}
Note that the XPath expression must use namespace prefixes that is mapped to the namespace URIs used in the input document, but that the actual value of the prefix doesn't matter.

using xpath in java to go through this list?

i have an xml file that contains lots of different nodes. some in particularly are nested like this:
<emailAddresses>
<emailAddress>
<value>sambj1981#gmail.com</value>
<typeSource>WORK</typeSource>
<typeUser></typeUser>
<primary>false</primary>
</emailAddress>
<emailAddress>
<value>sambj#hotmail.co.uk</value>
<typeSource>HOME</typeSource>
<typeUser></typeUser>
<primary>true</primary>
</emailAddress>
</emailAddresses>
From the above node, what i want to do is go through each and get the values inside it(value, typeSource, typeUser etc) and put them in a POJO.
i tried to see if i can use this xpath expression "//emailAddress" but it doesnt return me the tags inside inside it. maybe i am doing it wrong. i am pretty new to using xpath.
i could do something like this:
//emailAddress/value | //emailAddress/typeSource | .. but doing that will list all elements values together if im not mistaken leaving me to work out when i have finished reading from a specific emailAddress tag and going to the next emailAddress tag.
well to sum up my needs i basically want this to be returned similar to how you would return results from a bog standard sql query that returns results in a row. i.e. if your sql query produces 10 emailAddress, it will return each emailAddress in a row and i can simply iterate over "each emailAddress" and get the appropriate value based on the colunm name or index.
No,
//emailAddress
doesn't return the tags inside, that is correct. What it does return is a NodeList/NodeSet. To actually get the values you can do something like this:
String emailpath = "//emailAddress";
String emailvalue = ".//value";
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
Document document;
public XpathStuff(String file) throws ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = docFactory.newDocumentBuilder();
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
document = builder.parse(bis);
NodeList nodeList = getNodeList(document, emailpath);
for(int i = 0; i < nodeList.getLength(); i++){
System.out.println(getValue(nodeList.item(i), emailvalue));
}
bis.close();
}
public NodeList getNodeList(Document doc, String expr) {
try {
XPathExpression pathExpr = xpath.compile(expr);
return (NodeList) pathExpr.evaluate(doc, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return null;
}
//extracts the String value for the given expression
private String getValue(Node n, String expr) {
try {
XPathExpression pathExpr = xpath.compile(expr);
return (String) pathExpr.evaluate(n,
XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return null;
}
Maybe I should point out that when iterating over the Nodelist, in .//values the first dot means the current context. Without the dot you would get the first node all the time.
//emailAddress/*
will get these nodes in the document order.
It depends on how you want to iterate through the nodes. We do all our XML using XOM (http://www.xom.nu/) which is an easy reliable Java package. It's possible to write your own strategy using XOM calls.
If you use XStream you can set it up quite easily. Like so:
#XStreamAlias( "EmailAddress" )
public class EmailAddress {
#XStreamAlias()
private String value;
#XStreamAlias()
private String typeSource;
#XStreamAlias()
private String typeUser;
#XStreamAlias()
private boolean primary;
// ... the rest omitted for brevity
}
You then marshal & unmarshal quite simply like so:
XStream xstream = new XStream();
xstream.processAnnotations( EmailAddress.class );
xstream.toXML( /* Object value here */ emailAddress );
xstream.fromXML( /* String xml value here */ "" );
IDK if you have to use XPath or not, but if not I'd consider an out of the box solution like this.
I am totally aware this is not what you were asking for, but may consider using jibx. This is a tool for human-readable XML to POJO mapping.
So I believe you could generate mapping for your email structure in a quick way and let the jibx do the work for you.

Categories

Resources