Java XML Document XPath with Disable Escaping - java

Currently, I need to get the element of XML without escaping.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<Header>H001</Header>
<Body>
<Item>ABC&amp;ABC&quot;</Item>
</Body>
</Message>
I need to get the value of "Item" element via XPath.
However, it is escaped automatically.
My Result = ABC&ABC"
Expected = ABC&amp;ABC&quot;
How can I get the expected value?

XPath will always return the values of nodes that result from XML parsing. The string value of the Item element in your XML, after parsing, is ABC&ABC", so that's what XPath gives you. If you want ABC&amp;ABC&quot; then you will have to reverse the action of the XML parser - this is known as serialization. Parsing "unescapes" entity and character references (it turns & into &). Serialization escapes special characters such as "&" (it turns & into &).

Put content surrounded by CDATA.
Note: Charater data (CDATA) will tell the parser to send the text as regular text (no markup) without parsing.
For example :
abc.xml
<?xml version="1.0" encoding="UTF-8"?>
<Messages>
<Message>
<Header>H001</Header>
<Body>
<Item><![CDATA[ABC&&ABC&quot;]]></Item>
</Body>
</Message>
</Messages>
Java code :
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Test {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("abc.xml");
Document doc = builder.parse(input);
doc.getDocumentElement().normalize();
NodeList list = doc.getElementsByTagName("Message");
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
NodeList children = node.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
node = children.item(j);
System.out.println(node.getTextContent().trim());
}
}
}
}
Output :
H001
ABC&&ABC&quot;

Related

How to get relative depth of XML element using XPATH

I am trying to find the relative depth of given XML element from specific element in the given XML file, I tried to use XPATH but I'm not very familiar with XML parsing and I'm not getting the desired result. I need as well to ignore the data elements while counting.
Below is the code that I have written and the sample XML file.
E.g. the depth of NM109_BillingProviderIdentifier from TS837_2000A_Loop element is 4.
The parent nodes are: TS837_2000A_Loop < NM1_SubLoop_2 < TS837_2010AA_Loop < NM1_BillingProviderName
as NM109_BillingProviderIdentifier is a child of NM1_BillingProviderName and thus the relative depth of NM1_BillingProviderName from TS837_2000A_Loop is 4 (including TS837_2000A_Loop).
package com.xmlexamples;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
public class XmlParser {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("D://sample.xml")));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
expression = "count(NM109_BillingProviderIdentifier/preceding-sibling::TS837_2000A_Loop)+1";
Double d = (Double) xpath.compile(expression).evaluate(doc, XPathConstants.NUMBER);
System.out.println("position from TS837_2000A_Loop " + d);
}
}
<?xml version='1.0' encoding='UTF-8'?>
<X12_00501_837_P xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<TS837_2000A_Loop>
<NM1_SubLoop_2>
<TS837_2010AA_Loop>
<NM1_BillingProviderName>
<NM103_BillingProviderLastorOrganizationalName>VNA of Cape Cod</NM103_BillingProviderLastorOrganizationalName>
<NM109_BillingProviderIdentifier>1487651915</NM109_BillingProviderIdentifier>
</NM1_BillingProviderName>
<N3_BillingProviderAddress>
<N301_BillingProviderAddressLine>8669 NORTHWEST 36TH ST </N301_BillingProviderAddressLine>
</N3_BillingProviderAddress>
</TS837_2010AA_Loop>
</NM1_SubLoop_2>
</TS837_2000A_Loop>
</X12_00501_837_P>
The pivotal method for getting the depth of any node is by counting its ancestors (which include the parent, the parent of the parent etc):
count(NM109_BillingProviderIdentifier/ancestor-or-self::*)
This will give you the count up to the root. To get the relative count, i.e. from anything other than the root, assuming the names do not overlap, you can do this:
count(NM109_BillingProviderIdentifier/ancestor-or-self::*)
- count(NM109_BillingProviderIdentifier/ancestor::TS837_2000A_Loop/ancestor::*)
Depending on whether the current, or the base element should be included in the count, use the ancestor-or-self or ancestor axis.
PS: you should probably thank Pietro Saccardi for so kindly making your post and your huge (4kB on one line..) sample XML readable.

Java XML pull data on specific node

I am trying to pull all data by searching for a specific node. My code below can only print out all the nodes but what if I only want to pull information on pantone 101 for example and print out all of the other nodes such as the colors within that specific pantone 101 node. Here is the code I've created to print out the XML data, how can I edit this to only print a specific node. Thanks!
<inventory>
<Product pantone="100" blue="7.4" red="35" green="24"> </Product>
<Product pantone="101" blue="5.4" red="3" rubine="35" purple="24"> </Product>
<Product pantone="102" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="103" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="104" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="105" orange="5.4" purple="35" white="24"> </Product>
<Product pantone="106" black="5.4" rubine="35" white="24" purple="35" orange="5.4"> </Product>
</inventory>
//
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
public class ReadXML {
public static void main(String args[]) throws Exception {
DocumentBuilderFactory buildFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder dBuilder = buildFactory.newDocumentBuilder();
Document document = dBuilder.parse(ReadXML.class.getResourceAsStream("data.xml"));
document.normalize();
//get main node
NodeList rootNodes = document.getElementsByTagName("inventory");
Node rootNode = rootNodes.item(0);
Element rootElement = (Element) rootNode;
//print all with specific tag
NodeList inventoryList = rootElement.getElementsByTagName("Product");
for(int i = 0; i < inventoryList.getLength(); i++){
Node pantone = inventoryList.item(i);
Element pantoneElement = (Element) pantone;
//remove blank elements
System.out.println("Pantone: " + pantoneElement.getAttribute("pantone")); // print attribute
System.out.println("Blue: " + pantoneElement.getAttribute("blue"));
System.out.println("Red: " + pantoneElement.getAttribute("red"));
System.out.println("Orange: " + pantoneElement.getAttribute("orange"));
System.out.println("White: " + pantoneElement.getAttribute("white"));
System.out.println("Purple: " + pantoneElement.getAttribute("purple"));
System.out.println("Green: " + pantoneElement.getAttribute("green"));
System.out.println("Black: " + pantoneElement.getAttribute("black"));
System.out.println("Rubine: " + pantoneElement.getAttribute("rubine"));
}
} catch (ParserConfigurationException | SAXException | IOException e) {
}
}
}
You can use XPath to get data from XML document using more advanced criteria. For example, the following is XPath expression that will get <Product> element having pantone attribute equals 101 :
/inventory/Product[#pantone=101]
alternatively, just don't mention /inventory in the XPath if you plan to call the XPath on <inventory> element as the context :
Product[#pantone=101]
I'm not familiar with Java, but this post should give you a good hint : How to read XML using XPath in Java

Extract a block of XML into another XML file in java

I have a XML file called word.xml containing
<A>
<Answer>How was you day</Answer>
<Question>Happy day </Question>
<Biased> good morning </Biased>
<abc>..............</abc>
.
. // few more tags here
.
</A>
Now i want to extract another XML file called word1.xml containing part of word1.xml
<A>
<Answer>How was you day</Answer>
<Question>Happy day </Question>
</A>
Java Code which I tried so far
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ReadXMLFile {
public static void main(String args[]) {
try {
File stocks = new File("word.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(stocks);
doc.getDocumentElement().normalize();
System.out.println("root of xml file" + doc.getDocumentElement().getNodeName());
NodeList nodes = doc.getElementsByTagName("A");
System.out.println("==========================");
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
System.out.println("i value---"+i);
System.out.println(nodes.getLength());
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
System.out.println(element.getTextContent());
//element.getElementsByTagName(name)
File statText = new File(i+".txt");
FileOutputStream is = new FileOutputStream(statText);
OutputStreamWriter osw = new OutputStreamWriter(is);
Writer w = new BufferedWriter(osw);
w.write("<Answer>");
w.write(element.getElementsByTagName("Answer").item(0).getTextContent());
w.write("</Answer>");
w.write("Question");
w.write(element.getElementsByTagName("Question").item(0).getTextContent());
w.write("</Question>");
w.close();
}
}
}
catch (Exception ex) {
ex.printStackTrace();
}
private static String getValue(String tag, Element element) {
NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
Node node = (Node) nodes.item(0);
return node.getNodeValue();
}
}
}
I just want to include tags in my results. This is the DIRTY way of doing. Can you please suggest me the best way.Need help. Thanks in advance.
If Java is not a mandatory constraint here, you can achieve this by using XSLT. It's pretty easy to follow. You can find some guidance here: Link
An example of my own practice:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//title">
<article>
<title>
<xsl:value-of select="./name/>
<xsl:text> : </xsl:text>
<xsl:value-of select = "./number/>
</title>
<references>
<xsl:value-of select = "reference"/>
</references>
</article>
</xsl:for-each>
</xsl:template>
Hope it helps!
Just like BeginnerJava explained XSL is the most appropriate technology here as you are transforming one XML tree to another XML tree and XSL is meant for that.
in XSL the code needed to achieve what you describe would be (I skipped some bits):
<xsl:template match="A">
<xsl:copy>
<xsl:apply-templates select="Answer|Question"/>
</xsl:copy>
</xsl:template>
You can invoke XSL transfomation from you Java code or from command line like this:
java net.sf.saxon.Transform [options] source-document stylesheet [ params…]
Parse the xml into DOM using a DocumentParser. remove the elements that you don't need from the resulting Document. write the modified Document to a new File using a Transformer. (note, the details for each of these steps can be found in any of the thousands of java xml tutorials online).

xpath not accepting this expression

hello everyone i have on question for xpath
/abcd/nsanity/component_details[#component="ucs"]/command_details[<*configScope inHierarchical="true" cookie="{COOKIE}" dn="org-root" */>]/collected_data
i want to retrieve the string for above the xpath statement but when i am giving this xpath to xpath expression for evaulate it is throwing an exception like
Caused by: javax.xml.transform.TransformerException: A location path was expected, but the following token was encountered: <configScope
The bold part in your XPath expression is not a valid predicate expression. I can only guess, what do you want to achieve. If you want only the <command_details/> elements, which have a <configScope/> child element with attributes set to inHierarchical="true", cookie="{COOKIE}" and dn="org-root" then the XPath expression should be:
/abcd/nsanity/component_details[#component='ucs']/command_details[configScope[#inHierarchical='true' and #cookie='{COOKIE}' and #dn='org-root']]/collected_data
Here is an example XML:
<abcd>
<nsanity>
<component_details component="ucs">
<command_details>
<configScope inHierarchical="true" cookie="{COOKIE}" dn="org-root" />
<collected_data>Yes</collected_data>
</command_details>
<command_details>
<configScope inHierarchical="true" cookie="{COOKIE}" dn="XXX"/>
<collected_data>No</collected_data>
</command_details>
</component_details>
</nsanity>
</abcd>
The following Java program reads the XML file test.xml and evaluates the XPath expression (and prints the text node of element <collected_data/>.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class Test {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document = dbf.newDocumentBuilder().parse("test.xml");
XPath xpath = XPathFactory.newInstance().newXPath() ;
NodeList nl = (NodeList) xpath.evaluate("/abcd/nsanity/component_details[#component='ucs']/command_details[configScope[#inHierarchical='true' and #cookie='{COOKIE}' and #dn='org-root']]/collected_data", document, XPathConstants.NODESET);
for(int i = 0; i < nl.getLength(); i++) {
Element el = (Element) nl.item(i);
System.out.println(el.getTextContent());
}
}
}

Why does my XPath expression in Java return too many children?

I have the following xml file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<config>
<a>
<b>
<param>p1</param>
<param>p2</param>
</b>
</a>
</config>
and the xpath code to get my node params:
Document doc = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/config/a/b");
Object o = expr.evaluate(doc, XPathConstants.NODESET);
NodeList list = (NodeList) o;
but it turns out that the nodes list (list) has 5 children, including "\t\n", instead of just two. Is there something wrong with my code? How can I just get my two nodes?
Thank you!
When you select /config/a/b/, you are selecting all children of b, which includes three text nodes and two elements. That is, given your XML above and only showing the fragment in question:
<b>
<param>p1</param>
<param>p2</param>
</b>
the first child is the text (whitespace) following <b> and preceding <param>p1 .... The second child is the first param element. The third child is the text (whitespace) between the two param elements. And so on. The whitespace isn't ignored in XML, although many forms of processing XML ignore it.
You have a couple choices:
Change your xpath expression so it will only select element nodes, as suggested by Ted Dziuba, or
Loop over the five nodes returned and only select the non-text nodes.
You could do something like this:
for (int i = 0; i < nodes.getLength(); i++) {
if (nodes.item(i).getNodeType() != Node.TEXT_NODE) {
System.out.println(nodes.item(i).getNodeValue());
}
}
You can use the node type to select only element nodes, or to remove text nodes.
so the xpath looks like:
/config/a/b/*/text().
And the output for :
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
would be as expected: p1 and p2
How about
/config/a/b/*/text()/..
?
import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;
public class TestClient_XPath {
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("yourfile.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xpath.compile("/a/b/c");
Object res = xPathExpression.evaluate(doc);
System.out.println(res.toString());
}
}
Xalan and Xerces appear to be embedded in rt.jar.
Don't include xerces and xalan libs.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4624775
I am not sure but shouldn't /config/a/b just return b? /config/a/b/param should return the two param nodes...
Could the view on the problem be the problem? Of course you get back the resulting node AND all its children. So you just have to look at the first element and not at its children.
But I can be totally wrong, because I am usually just use Xpath to navigate on DOM trees (HtmlUnit).

Categories

Resources