I am trying to find the relative depth of a given XML element from a specific element in an XML file. I tried to use XPath, but I'm not very familiar with XML parsing and I'm not getting the desired result. I also need to ignore the data elements while counting.
Below are the code I have written and the sample XML file.
E.g. the depth of NM109_BillingProviderIdentifier from the TS837_2000A_Loop element is 4.
The parent nodes are: TS837_2000A_Loop < NM1_SubLoop_2 < TS837_2010AA_Loop < NM1_BillingProviderName
NM109_BillingProviderIdentifier is a child of NM1_BillingProviderName, so its relative depth from TS837_2000A_Loop is 4 (including TS837_2000A_Loop).
package com.xmlexamples;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
public class XmlParser {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("D://sample.xml")));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
expression = "count(NM109_BillingProviderIdentifier/preceding-sibling::TS837_2000A_Loop)+1";
Double d = (Double) xpath.compile(expression).evaluate(doc, XPathConstants.NUMBER);
System.out.println("position from TS837_2000A_Loop " + d);
}
}
<?xml version='1.0' encoding='UTF-8'?>
<X12_00501_837_P xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<TS837_2000A_Loop>
<NM1_SubLoop_2>
<TS837_2010AA_Loop>
<NM1_BillingProviderName>
<NM103_BillingProviderLastorOrganizationalName>VNA of Cape Cod</NM103_BillingProviderLastorOrganizationalName>
<NM109_BillingProviderIdentifier>1487651915</NM109_BillingProviderIdentifier>
</NM1_BillingProviderName>
<N3_BillingProviderAddress>
<N301_BillingProviderAddressLine>8669 NORTHWEST 36TH ST </N301_BillingProviderAddressLine>
</N3_BillingProviderAddress>
</TS837_2010AA_Loop>
</NM1_SubLoop_2>
</TS837_2000A_Loop>
</X12_00501_837_P>
The key to getting the depth of any node is to count its ancestors (the parent, the parent of the parent, and so on):
count(NM109_BillingProviderIdentifier/ancestor-or-self::*)
This will give you the count up to the root. To get the relative count, i.e. from anything other than the root, assuming the names do not overlap, you can do this:
count(NM109_BillingProviderIdentifier/ancestor-or-self::*)
- count(NM109_BillingProviderIdentifier/ancestor::TS837_2000A_Loop/ancestor::*)
Depending on whether the current, or the base element should be included in the count, use the ancestor-or-self or ancestor axis.
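As a rough sketch of how the subtraction above could be run from Java: the class name RelativeDepth and the helper depthBelow are made up for this example, the XML is a trimmed copy of the sample, and the expression uses // plus the ancestor axis so that the target itself is not counted but the base element is. It assumes each element name occurs only once in the document.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class RelativeDepth {
    // Trimmed copy of the sample document from the question.
    static final String XML =
        "<X12_00501_837_P><TS837_2000A_Loop><NM1_SubLoop_2><TS837_2010AA_Loop>"
      + "<NM1_BillingProviderName>"
      + "<NM109_BillingProviderIdentifier>1487651915</NM109_BillingProviderIdentifier>"
      + "</NM1_BillingProviderName>"
      + "</TS837_2010AA_Loop></NM1_SubLoop_2></TS837_2000A_Loop></X12_00501_837_P>";

    // Hypothetical helper: counts the ancestors of `target` up to and including `base`.
    // The target element itself is not counted (ancestor, not ancestor-or-self).
    static int depthBelow(String xml, String target, String base) {
        String expr = "count(//" + target + "/ancestor::*)"
                    + " - count(//" + target + "/ancestor::" + base + "/ancestor::*)";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double d = (Double) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
            return d.intValue();
        } catch (XPathExpressionException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int depth = depthBelow(XML, "NM109_BillingProviderIdentifier", "TS837_2000A_Loop");
        System.out.println("relative depth: " + depth);
    }
}
```

For the sample document this yields 4, the value the question asks for: five ancestors of NM109_BillingProviderIdentifier minus the one ancestor of TS837_2000A_Loop.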
PS: you should probably thank Pietro Saccardi for so kindly making your post and your huge (4kB on one line..) sample XML readable.
Currently, I need to get the element of XML without escaping.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<Header>H001</Header>
<Body>
<Item>ABC&amp;ABC&quot;</Item>
</Body>
</Message>
I need to get the value of "Item" element via XPath.
However, it is escaped automatically.
My Result = ABC&ABC"
Expected = ABC&amp;ABC&quot;
How can I get the expected value?
XPath will always return the values of nodes that result from XML parsing. The string value of the Item element in your XML, after parsing, is ABC&ABC", so that's what XPath gives you. If you want ABC&amp;ABC&quot; then you will have to reverse the action of the XML parser - this is known as serialization. Parsing "unescapes" entity and character references (it turns &amp; into &). Serialization escapes special characters such as "&" (it turns & into &amp;).
Wrap the content in a CDATA section.
Note: a character data (CDATA) section tells the parser to treat its content as plain text (no markup) rather than parsing it.
For example:
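To make the "reverse the parser" step concrete, here is a minimal sketch that reads the Item value with XPath and then re-escapes it by hand. The class EscapedValue and the helpers itemValue and escapeXml are invented for this example; a real serializer may choose to escape fewer characters (for instance, leaving the double quote alone in text content), so treat this only as an illustration of the idea.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class EscapedValue {
    // Hypothetical helper: re-escapes the characters that the parser unescaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first so the entities added below are not escaped again
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Returns the parsed (unescaped) string value of /Message/Body/Item.
    static String itemValue(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/Message/Body/Item", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Message><Header>H001</Header><Body>"
                   + "<Item>ABC&amp;ABC&quot;</Item></Body></Message>";
        String parsed = itemValue(xml);
        System.out.println(parsed);            // the value as XPath sees it
        System.out.println(escapeXml(parsed)); // the value with entities restored
    }
}
```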
abc.xml
<?xml version="1.0" encoding="UTF-8"?>
<Messages>
<Message>
<Header>H001</Header>
<Body>
<Item><![CDATA[ABC&&ABC"]]></Item>
</Body>
</Message>
</Messages>
Java code :
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Test {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("abc.xml");
Document doc = builder.parse(input);
doc.getDocumentElement().normalize();
NodeList list = doc.getElementsByTagName("Message");
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
NodeList children = node.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
node = children.item(j);
System.out.println(node.getTextContent().trim());
}
}
}
}
Output :
H001
ABC&&ABC"
Please, I need to find any occurrence of a div tag in a String value in Java.
The location of this div tag in the String is not known ahead of time.
Can anyone please help with any hint or link that could be helpful?
Thanks.
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<div><![CDATA[<p>hellowrold</p>]]></div>";
String xpath = "/div/text()";
XPath xPath = XPathFactory.newInstance().newXPath();
String childHtml = xPath.evaluate(xpath, new InputSource(new StringReader(xml)));
System.out.println(childHtml);
xpath = "/p";
String result = xPath.evaluate(xpath, new InputSource(new StringReader(childHtml)));
System.out.println(result);
}
}
XPath is used to traverse XML, and you can do much more with it. In your case the problem is that the CDATA section hides the elements you want to extract. The code above works around that: it first extracts the content of the CDATA section as a separate XML document and then applies XPath to it.
Hello everyone, I have a question about XPath.
/abcd/nsanity/component_details[@component="ucs"]/command_details[<*configScope inHierarchical="true" cookie="{COOKIE}" dn="org-root" */>]/collected_data
I want to retrieve the string selected by the XPath statement above, but when I pass it to the XPath expression for evaluation it throws an exception like
Caused by: javax.xml.transform.TransformerException: A location path was expected, but the following token was encountered: <configScope
The bold part of your XPath expression is not a valid predicate expression. I can only guess what you want to achieve. If you want only the <command_details/> elements that have a <configScope/> child element with the attributes inHierarchical="true", cookie="{COOKIE}" and dn="org-root", then the XPath expression should be:
/abcd/nsanity/component_details[@component='ucs']/command_details[configScope[@inHierarchical='true' and @cookie='{COOKIE}' and @dn='org-root']]/collected_data
Here is an example XML:
<abcd>
<nsanity>
<component_details component="ucs">
<command_details>
<configScope inHierarchical="true" cookie="{COOKIE}" dn="org-root" />
<collected_data>Yes</collected_data>
</command_details>
<command_details>
<configScope inHierarchical="true" cookie="{COOKIE}" dn="XXX"/>
<collected_data>No</collected_data>
</command_details>
</component_details>
</nsanity>
</abcd>
The following Java program reads the XML file test.xml, evaluates the XPath expression and prints the text content of the <collected_data/> element.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class Test {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document = dbf.newDocumentBuilder().parse("test.xml");
XPath xpath = XPathFactory.newInstance().newXPath() ;
NodeList nl = (NodeList) xpath.evaluate("/abcd/nsanity/component_details[@component='ucs']/command_details[configScope[@inHierarchical='true' and @cookie='{COOKIE}' and @dn='org-root']]/collected_data", document, XPathConstants.NODESET);
for(int i = 0; i < nl.getLength(); i++) {
Element el = (Element) nl.item(i);
System.out.println(el.getTextContent());
}
}
}
I want to modify an existing XML file using XPath. If the node doesn't exist, it should be created (along with its parents if necessary). An example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<param0>true</param0>
<param1>1.0</param1>
</configuration>
And here are a couple of xPaths I want to insert/modify:
/configuration/param1/text() -> 4.0
/configuration/param2/text() -> "asdf"
/configuration/test/param3/text() -> true
The XML file should look like this afterwards:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<param0>true</param0>
<param1>4.0</param1>
<param2>asdf</param2>
<test>
<param3>true</param3>
</test>
</configuration>
I tried this:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
try {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
Document doc = domFactory.newDocumentBuilder().parse(file.getAbsolutePath());
XPath xpath = XPathFactory.newInstance().newXPath();
String xPathStr = "/configuration/param1/text()";
Node node = ((NodeList) xpath.compile(xPathStr).evaluate(doc, XPathConstants.NODESET)).item(0);
System.out.printf("node value: %s\n", node.getNodeValue());
node.setNodeValue("4.0");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(new DOMSource(doc), new StreamResult(file));
} catch (Exception e) {
e.printStackTrace();
}
The node is changed in the file after running this code. Exactly what I wanted. But if I use one of the below paths, node is null (and therefore a NullPointerException is thrown):
/configuration/param2/text()
/configuration/test/param3/text()
How can I change this code so that the node (and non existing parent nodes as well) are created?
EDIT: Ok, to clarify: I have a set of parameters that I want to save to XML. During development, this set can change (some parameters get added, some get moved, some get removed). So I basically want to have a function to write the current set of parameters to an already existing file. It should override the parameters that already exist in the file, add new parameters and leave old parameters in there.
The same for reading, I could just have the xPath or some other coordinates and get the value from the XML. If it doesn't exist, it returns the empty string.
I don't have any constraints on how to implement it, xPath, DOM, SAX, XSLT... It should just be easy to use once the functionality is written (like BeniBela's solution).
So if I have the following parameters to set:
/configuration/param1/text() -> 4.0
/configuration/param2/text() -> "asdf"
/configuration/test/param3/text() -> true
the result should be the starting XML + those parameters. If they already exist at that xPath, they get replaced, otherwise they get inserted at that point.
If you want a solution without dependencies, you can do it with just DOM and without XPath/XSLT.
Node.getChildNodes|getNodeName / NodeList.* can be used to find the nodes, and Document.createElement|createTextNode, Node.appendChild to create new ones.
Then you can write your own simple "XPath" interpreter that creates the missing nodes along the path, like this:
public static void update(Document doc, String path, String def){
String p[] = path.split("/");
//search nodes or create them if they do not exist
Node n = doc;
for (int i=0;i < p.length;i++){
NodeList kids = n.getChildNodes();
Node nfound = null;
for (int j=0;j<kids.getLength();j++)
if (kids.item(j).getNodeName().equals(p[i])) {
nfound = kids.item(j);
break;
}
if (nfound == null) {
nfound = doc.createElement(p[i]);
n.appendChild(nfound);
n.appendChild(doc.createTextNode("\n")); //add whitespace, so the result looks nicer. Not really needed
}
n = nfound;
}
NodeList kids = n.getChildNodes();
for (int i=0;i<kids.getLength();i++)
if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
//text node exists
kids.item(i).setNodeValue(def); //override
return;
}
n.appendChild(doc.createTextNode(def));
}
Then, if you only want to update text() nodes, you can use it as:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
Document doc = domFactory.newDocumentBuilder().parse(file.getAbsolutePath());
update(doc, "configuration/param1", "4.0");
update(doc, "configuration/param2", "asdf");
update(doc, "configuration/test/param3", "true");
Here is a simple XSLT solution:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="param1/text()">4.0</xsl:template>
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
<param2>asdf</param2>
<test><param3>true</param3></test>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<configuration>
<param0>true</param0>
<param1>1.0</param1>
</configuration>
the wanted, correct result is produced:
<configuration>
<param0>true</param0>
<param1>4.0</param1>
<param2>asdf</param2>
<test><param3>true</param3></test>
</configuration>
Do Note:
An XSLT transformation never "updates in-place". It always creates a new result tree. Therefore, if one wants to modify the same file, typically the result of the transformation is saved under another name, then the original file is deleted and the result is renamed to have the original name.
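The save-then-rename step described in the note can be sketched in Java roughly as follows. The class TransformInPlace, its methods and the shortened stylesheet (an identity transform plus the param1 override) are assumptions made for this illustration, not part of the answer above; the javax.xml.transform and java.nio.file calls are standard JDK APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformInPlace {
    // Shortened stand-in stylesheet: identity transform plus an override for param1's text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='node()|@*'><xsl:copy>"
      + "<xsl:apply-templates select='node()|@*'/></xsl:copy></xsl:template>"
      + "<xsl:template match='param1/text()'>4.0</xsl:template>"
      + "</xsl:stylesheet>";

    // Writes the result to a temporary file next to the original, then replaces the original.
    static void transformInPlace(Path xmlFile, String xslt) {
        try {
            Path tmp = Files.createTempFile(xmlFile.toAbsolutePath().getParent(), "xslt-", ".tmp");
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            try (InputStream in = Files.newInputStream(xmlFile);
                 OutputStream out = Files.newOutputStream(tmp)) {
                t.transform(new StreamSource(in), new StreamResult(out));
            }
            Files.move(tmp, xmlFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small demo on a throwaway file; returns the file content after the transformation.
    static String demo() {
        try {
            Path file = Files.createTempFile("config-", ".xml");
            Files.write(file, "<configuration><param1>1.0</param1></configuration>"
                .getBytes(StandardCharsets.UTF_8));
            transformInPlace(file, XSLT);
            String result = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Creating the temporary file in the same directory as the original keeps the final move on one file system, which is usually what makes the replacement cheap.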
I've created a small project for using XPath to create/update XML: https://github.com/shenghai/xmodifier
The code to change your XML looks like this:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(xmlfile);
XModifier modifier = new XModifier(document);
modifier.addModify("/configuration/param1", "asdf");
modifier.addModify("/configuration/param2", "asdf");
modifier.addModify("/configuration/test/param3", "true");
modifier.modify();
I would point you to a new/novel way of doing what you described, by using VTD-XML... there are numerous reasons why VTD-XML is far better than all other solutions provided for this question... here are a few links ...
Simplify XML processing with vtd-xml
Manipulate XML the Ximple Way
Processing XML with Java – A Performance Benchmark
import com.ximpleware.*;
import java.io.*;
public class modifyXML {
public static void main(String[] s) throws VTDException, IOException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/configuration/param1/text()");
XMLModifier xm = new XMLModifier(vn);
// using XPath
int i=ap.evalXPath();
if(i!=-1){
xm.updateToken(i, "4.0");
}
String s1 ="<param2>asdf</param2>\n<test>\n<param3>true</param3>\n</test>"; // "\n" is a newline; "/n" would be inserted literally
xm.insertAfterElement(s1);
xm.output("output.xml");
}
}
I have the following xml file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<config>
<a>
<b>
<param>p1</param>
<param>p2</param>
</b>
</a>
</config>
and the xpath code to get my node params:
Document doc = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/config/a/b");
Object o = expr.evaluate(doc, XPathConstants.NODESET);
NodeList list = (NodeList) o;
but it turns out that the nodes list (list) has 5 children, including "\t\n", instead of just two. Is there something wrong with my code? How can I just get my two nodes?
Thank you!
The expression /config/a/b selects the b element itself, so the five nodes you are seeing are the children of b: three text nodes and two elements. That is, given your XML above and only showing the fragment in question:
<b>
<param>p1</param>
<param>p2</param>
</b>
the first child is the text (whitespace) following <b> and preceding <param>p1. The second child is the first param element. The third child is the text (whitespace) between the two param elements. And so on. The whitespace isn't ignored in XML, although many forms of XML processing ignore it.
You have a couple of choices:
Change your xpath expression so it will only select element nodes, as suggested by Ted Dziuba, or
Loop over the five nodes returned and only select the non-text nodes.
You could do something like this:
for (int i = 0; i < nodes.getLength(); i++) {
if (nodes.item(i).getNodeType() != Node.TEXT_NODE) {
// getNodeValue() is null for element nodes, so read the text content instead
System.out.println(nodes.item(i).getTextContent());
}
}
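The first choice, changing the expression so that only element nodes are selected, might look like the sketch below. The class ParamElements and the method paramValues are names invented for this example; /config/a/b/* selects only the element children of b, so the whitespace text nodes never reach the loop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParamElements {
    // The document from the question, including its indentation whitespace.
    static final String XML =
        "<config>\n\t<a>\n\t\t<b>\n"
      + "\t\t\t<param>p1</param>\n"
      + "\t\t\t<param>p2</param>\n"
      + "\t\t</b>\n\t</a>\n</config>";

    // Returns the text content of every element child of /config/a/b.
    static List<String> paramValues(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/config/a/b/*", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(paramValues(XML)); // prints [p1, p2]
    }
}
```

Using /config/a/b/param instead of the wildcard would restrict the result to param elements only, which may be preferable if b can contain other element types.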
You can use the node type to select only element nodes, or to remove text nodes.
so the xpath looks like:
/config/a/b/*/text().
And the output for :
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
would be as expected: p1 and p2
How about
/config/a/b/*/text()/..
?
import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;
public class TestClient_XPath {
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("yourfile.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xpath.compile("/a/b/c");
Object res = xPathExpression.evaluate(doc);
System.out.println(res.toString());
}
}
Xalan and Xerces appear to be embedded in rt.jar.
Don't include xerces and xalan libs.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4624775
I am not sure, but shouldn't /config/a/b just return b? /config/a/b/param should return the two param nodes...
Could the way you are looking at the problem be the problem? Of course you get back the resulting node AND all its children. So you just have to look at the first element and not at its children.
But I could be totally wrong, because I usually just use XPath to navigate DOM trees (HtmlUnit).