Hello everyone, I have one question about XPath.
/abcd/nsanity/component_details[@component="ucs"]/command_details[<*configScope inHierarchical="true" cookie="{COOKIE}" dn="org-root" */>]/collected_data
I want to retrieve the string selected by the XPath statement above, but when I pass this XPath to the XPath evaluator it throws an exception like:
Caused by: javax.xml.transform.TransformerException: A location path was expected, but the following token was encountered: <configScope
The bold part of your XPath expression (the <configScope ... /> fragment inside the predicate) is not a valid predicate expression. I can only guess what you want to achieve. If you want only the <command_details/> elements that have a <configScope/> child element with the attributes inHierarchical="true", cookie="{COOKIE}" and dn="org-root", then the XPath expression should be:
/abcd/nsanity/component_details[@component='ucs']/command_details[configScope[@inHierarchical='true' and @cookie='{COOKIE}' and @dn='org-root']]/collected_data
Here is an example XML:
<abcd>
<nsanity>
<component_details component="ucs">
<command_details>
<configScope inHierarchical="true" cookie="{COOKIE}" dn="org-root" />
<collected_data>Yes</collected_data>
</command_details>
<command_details>
<configScope inHierarchical="true" cookie="{COOKIE}" dn="XXX"/>
<collected_data>No</collected_data>
</command_details>
</component_details>
</nsanity>
</abcd>
The following Java program reads the XML file test.xml, evaluates the XPath expression, and prints the text content of each matching <collected_data/> element.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class Test {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document = dbf.newDocumentBuilder().parse("test.xml");
XPath xpath = XPathFactory.newInstance().newXPath() ;
NodeList nl = (NodeList) xpath.evaluate("/abcd/nsanity/component_details[@component='ucs']/command_details[configScope[@inHierarchical='true' and @cookie='{COOKIE}' and @dn='org-root']]/collected_data", document, XPathConstants.NODESET);
for(int i = 0; i < nl.getLength(); i++) {
Element el = (Element) nl.item(i);
System.out.println(el.getTextContent());
}
}
}
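For a quick check without a test.xml file on disk, the same expression can be evaluated against an inline string. This is only a sketch: the class name PredicateDemo and the helper collectedData are made up for illustration, and the XML is the example document from above, inlined.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PredicateDemo {
    // The example XML from above, inlined so no file is needed.
    static final String XML =
        "<abcd><nsanity><component_details component=\"ucs\">"
      + "<command_details>"
      + "<configScope inHierarchical=\"true\" cookie=\"{COOKIE}\" dn=\"org-root\"/>"
      + "<collected_data>Yes</collected_data>"
      + "</command_details>"
      + "<command_details>"
      + "<configScope inHierarchical=\"true\" cookie=\"{COOKIE}\" dn=\"XXX\"/>"
      + "<collected_data>No</collected_data>"
      + "</command_details>"
      + "</component_details></nsanity></abcd>";

    // Attributes are tested with @name inside a nested predicate on configScope.
    static final String EXPRESSION =
        "/abcd/nsanity/component_details[@component='ucs']"
      + "/command_details[configScope[@inHierarchical='true'"
      + " and @cookie='{COOKIE}' and @dn='org-root']]"
      + "/collected_data";

    // Hypothetical helper: returns the string value of the first matching node.
    static String collectedData(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(EXPRESSION, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectedData(XML)); // prints "Yes"
    }
}
```

Only the first <command_details/> element satisfies the predicate, so the second <collected_data/> ("No") is never selected.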
Currently, I need to get the text of an XML element in its escaped form, exactly as it is written in the file.
For example:
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<Header>H001</Header>
<Body>
<Item>ABC&amp;ABC&quot;</Item>
</Body>
</Message>
I need to get the value of the "Item" element via XPath.
However, it is unescaped automatically.
My Result = ABC&ABC"
Expected = ABC&amp;ABC&quot;
How can I get the expected value?
XPath always returns the values of the nodes that result from XML parsing. The string value of the Item element in your XML, after parsing, is ABC&ABC", so that is what XPath gives you. If you want ABC&amp;ABC&quot;, you will have to reverse the action of the XML parser; this is known as serialization. Parsing "unescapes" entity and character references (it turns &amp; into &). Serialization escapes special characters such as "&" (it turns & into &amp;).
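To illustrate the round trip, here is a minimal sketch that reads the value with XPath and then re-escapes it by hand. The escapeXml helper is hypothetical (it is not part of the JDK) and only covers the five predefined XML entities; a real serializer may choose different escaping, for example leaving quotes untouched in text content.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class EscapeDemo {
    // Hypothetical helper: re-escapes the five predefined XML entities.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first, or later entities get double-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Returns the parsed (unescaped) string value of /Message/Body/Item.
    static String itemValue(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/Message/Body/Item", new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Message><Header>H001</Header><Body>"
                   + "<Item>ABC&amp;ABC&quot;</Item></Body></Message>";
        String parsed = itemValue(xml);
        System.out.println(parsed);            // ABC&ABC"
        System.out.println(escapeXml(parsed)); // ABC&amp;ABC&quot;
    }
}
```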
Wrap the content in a CDATA section.
Note: a character data (CDATA) section tells the parser to pass the text through as plain text, without interpreting it as markup.
For example:
abc.xml
<?xml version="1.0" encoding="UTF-8"?>
<Messages>
<Message>
<Header>H001</Header>
<Body>
<Item><![CDATA[ABC&&ABC"]]></Item>
</Body>
</Message>
</Messages>
Java code :
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Test {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream input = Thread.currentThread().getContextClassLoader().getResourceAsStream("abc.xml");
Document doc = builder.parse(input);
doc.getDocumentElement().normalize();
NodeList list = doc.getElementsByTagName("Message");
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
NodeList children = node.getChildNodes();
for (int j = 0; j < children.getLength(); j++) {
node = children.item(j);
System.out.println(node.getTextContent().trim());
}
}
}
}
Output :
H001
ABC&&ABC"
I am trying to find the relative depth of a given XML element from a specific element in a given XML file. I tried to use XPath, but I'm not very familiar with XML parsing and I'm not getting the desired result. I also need to ignore the data elements while counting.
Below is the code that I have written and the sample XML file.
E.g. the depth of NM109_BillingProviderIdentifier from the TS837_2000A_Loop element is 4.
The parent nodes are: TS837_2000A_Loop < NM1_SubLoop_2 < TS837_2010AA_Loop < NM1_BillingProviderName
NM109_BillingProviderIdentifier is a child of NM1_BillingProviderName, and thus the relative depth of NM109_BillingProviderIdentifier from TS837_2000A_Loop is 4 (including TS837_2000A_Loop).
package com.xmlexamples;
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
public class XmlParser {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("D://sample.xml")));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String expression;
expression = "count(NM109_BillingProviderIdentifier/preceding-sibling::TS837_2000A_Loop)+1";
Double d = (Double) xpath.compile(expression).evaluate(doc, XPathConstants.NUMBER);
System.out.println("position from TS837_2000A_Loop " + d);
}
}
<?xml version='1.0' encoding='UTF-8'?>
<X12_00501_837_P xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<TS837_2000A_Loop>
<NM1_SubLoop_2>
<TS837_2010AA_Loop>
<NM1_BillingProviderName>
<NM103_BillingProviderLastorOrganizationalName>VNA of Cape Cod</NM103_BillingProviderLastorOrganizationalName>
<NM109_BillingProviderIdentifier>1487651915</NM109_BillingProviderIdentifier>
</NM1_BillingProviderName>
<N3_BillingProviderAddress>
<N301_BillingProviderAddressLine>8669 NORTHWEST 36TH ST </N301_BillingProviderAddressLine>
</N3_BillingProviderAddress>
</TS837_2010AA_Loop>
</NM1_SubLoop_2>
</TS837_2000A_Loop>
</X12_00501_837_P>
The key to getting the depth of any node is to count its ancestors (which include the parent, the parent of the parent, etc.):
count(NM109_BillingProviderIdentifier/ancestor-or-self::*)
This will give you the count up to the root. To get the relative count, i.e. from anything other than the root, assuming the names do not overlap, you can do this:
count(NM109_BillingProviderIdentifier/ancestor-or-self::*)
- count(NM109_BillingProviderIdentifier/ancestor::TS837_2000A_Loop/ancestor::*)
Depending on whether the current, or the base element should be included in the count, use the ancestor-or-self or ancestor axis.
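As a rough sketch of how this could look from Java, the snippet below evaluates the difference of the two counts against a trimmed copy of the sample XML. The class name RelativeDepth and the helper relativeDepth are made up for illustration. Two assumptions are baked in: the expressions are prefixed with // so they can be evaluated from the document node, and the ancestor axis (not ancestor-or-self) is used for the target so that the target itself is excluded and the base element is included.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class RelativeDepth {
    // Trimmed copy of the sample XML (address block omitted).
    static final String XML =
        "<X12_00501_837_P><TS837_2000A_Loop><NM1_SubLoop_2><TS837_2010AA_Loop>"
      + "<NM1_BillingProviderName>"
      + "<NM109_BillingProviderIdentifier>1487651915</NM109_BillingProviderIdentifier>"
      + "</NM1_BillingProviderName>"
      + "</TS837_2010AA_Loop></NM1_SubLoop_2></TS837_2000A_Loop></X12_00501_837_P>";

    // Hypothetical helper: ancestors of target, minus the ancestors of base.
    // Assumes each name occurs once in the document.
    static int relativeDepth(String xml, String target, String base)
            throws XPathExpressionException {
        String expr = "count(//" + target + "/ancestor::*)"
                    + " - count(//" + target + "/ancestor::" + base + "/ancestor::*)";
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double d = (Double) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return d.intValue();
    }

    public static void main(String[] args) throws Exception {
        // 5 ancestors in total, 1 of them above TS837_2000A_Loop: prints 4
        System.out.println(relativeDepth(
                XML, "NM109_BillingProviderIdentifier", "TS837_2000A_Loop"));
    }
}
```

Switching the first count to ancestor-or-self would also count the target element and give 5 for this document.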
PS: you should probably thank Pietro Saccardi for so kindly making your post and your huge (4kB on one line..) sample XML readable.
I need to find any occurrence of a div tag in a String value in Java.
The location of this div tag in the String is not known ahead of time.
Can anyone please help with a hint or link that could be helpful here?
Thanks.
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<div><![CDATA[<p>hellowrold</p>]]></div>";
String xpath = "/div/text()";
XPath xPath = XPathFactory.newInstance().newXPath();
String childHtml = xPath.evaluate(xpath, new InputSource(new StringReader(xml)));
System.out.println(childHtml);
xpath = "/p";
String result = xPath.evaluate(xpath, new InputSource(new StringReader(childHtml)));
System.out.println(result);
}
}
XPath is used to traverse XML, and you can do much more with it. In your case the problem is the CDATA section, which hides the elements you want to extract. The code above works around that in two steps: first it extracts the content of the CDATA section as a separate XML string, then it applies XPath to that string.
I want to extract data from an HTML page with the help of jsoup and XPath.
This is my Java code:
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.w3c.dom.NodeList;
public class RssFeedRead {
public static void main(String args[])
{
try
{
Document doc = Jsoup.connect("http://timesofindia.indiatimes.com/world/china/China-sees-red-in-Abes-WWII-shrine-visit/articleshow/27989418.cms").get();
String title = doc.title();
System.out.println(title);
String exp = "//*[@id='cmtMainBox']/div/div[@class='cmtBox']/div/div[@class='box']/div[@class='cmt']/div/span";
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
XPathExpression expr = xPath.compile(exp);
NodeList node = (NodeList) expr.evaluate(doc, XPathConstants.NODE);
for (int i = 0; i < node.getLength(); i++)
{
System.out.println(expr.evaluate(node.item(i), XPathConstants.STRING));
}
}
catch(Exception e)
{
System.out.println(e);
}
}
}
This error occurred:
java.lang.ClassCastException: org.jsoup.nodes.Document cannot be cast to org.w3c.dom.Node
Please help me solve this error.
I am new here; after a quick investigation, I think you should mind two points:
1) You should convert the jsoup Document to an org.w3c.dom.Document. You can refer to 17802445; to run that code you need to download DOMBuilder.
2) I don't know much about your page in CMS format; does the parser support it? I tested the code from 17802445 with other links and it works.
But with your link I get a java.lang.NullPointerException, which suggests the conversion failed.
Please check it.
Hope you can solve it!
This is my first answer.
Please highlight the line where the exception was thrown and don't omit the stack trace.
This is the problematic line:
NodeList node = (NodeList) expr.evaluate(doc, XPathConstants.NODE);
You are mixing two APIs for document parsing and handling: XPath (javax.xml.xpath) and jsoup. An XPath expression does not know about jsoup documents and can't handle them.
You need to decide which of the two APIs you want to use for your specific job.
The error is clear enough: a jsoup Document cannot be cast to a W3C Node.
The line in question is probably NodeList node = (NodeList) expr.evaluate(doc, XPathConstants.NODE);
You'll probably have to convert the jsoup Document to a W3C Node first (if such a conversion exists; I'm not familiar with this API).
They have everything you need in their javadoc
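Once the page is available as an org.w3c.dom.Document (however that conversion is done), the evaluation itself can be sketched with the standard API alone. The snippet below is only an illustration: it parses a small, well-formed inline string in place of the real page, the class name NodeSetDemo and the helper spanTexts are made up, and the XPath is a shortened stand-in for the one in the question. It requests XPathConstants.NODESET, which is the return type documented to produce a NodeList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeSetDemo {
    // Hypothetical helper: collects the text of every matching <span>.
    static List<String> spanTexts(String xml) throws Exception {
        // A W3C DOM document, which is what javax.xml.xpath expects as context.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET (not NODE) yields a NodeList of all matches.
        NodeList nodes = (NodeList) xpath.evaluate(
                "//div[@class='cmt']/div/span", doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<div id=\"cmtMainBox\">"
                   + "<div class=\"cmt\"><div><span>first</span></div></div>"
                   + "<div class=\"cmt\"><div><span>second</span></div></div>"
                   + "</div>";
        System.out.println(spanTexts(xml)); // [first, second]
    }
}
```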
I have an XML org.w3c.dom.Document object.
It looks roughly like this:
<A>
<B>
<C/>
<D/>
<E/>
<F/>
</B>
<G>
<H/>
<H/>
<J/>
</G>
</A>
How can I convert the Document object so that it strips off the root node and returns another Document object subset (selected by name) that looks like this:
<G>
<H/>
<H/>
<J/>
</G>
I am hoping for something like this:
...
Document doc = db.parse(file);
Document subdoc = doc.getDocumentSubsetByName("G"); //imaginary method name
NodeList nodeList = subdoc.getElementsByTagName("H");
But I am having trouble finding such a thing.
The answer turns out to be something like this:
...
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
NodeList a = doc.getElementsByTagName("A");
Element AsubNode = null;
if (a.item(0) != null) {
AsubNode = (Element) a.item(0);
nodeList = AsubNode.getElementsByTagName("G");
...
You can simply use getElementsByTagName("G") to get the G elements, then pick one of them and call getElementsByTagName("H") on that.
Of course, you could always use XPath to do the same thing:
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
final XPath xpath = XPathFactory.newInstance().newXPath();
final NodeList list = (NodeList) xpath.evaluate("/A/G/H",
doc.getDocumentElement(), XPathConstants.NODESET);
This begins to pay off when the path to your elements becomes more complex (requiring attribute predicates, etc.).
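If a separate Document object is really needed, as the question asks, one possible sketch is to create an empty Document and import a deep copy of the first G element into it with Document.importNode. The class name SubDocumentDemo and the helpers subDocument and extract are made up for illustration; nothing like getDocumentSubsetByName exists in the DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SubDocumentDemo {
    // Hypothetical helper: copies the first element named tagName into a new Document.
    static Document subDocument(Document source, String tagName, DocumentBuilder builder) {
        Node first = source.getElementsByTagName(tagName).item(0);
        if (first == null) {
            return null; // no such element
        }
        Document sub = builder.newDocument();
        // importNode(..., true) makes a deep copy owned by the new document.
        sub.appendChild(sub.importNode(first, true));
        return sub;
    }

    // Convenience wrapper that parses a string and extracts the subset.
    static Document extract(String xml, String tagName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return subDocument(doc, tagName, db);
    }

    public static void main(String[] args) throws Exception {
        Document sub = extract(
                "<A><B><C/><D/><E/><F/></B><G><H/><H/><J/></G></A>", "G");
        System.out.println(sub.getDocumentElement().getNodeName());    // G
        System.out.println(sub.getElementsByTagName("H").getLength()); // 2
    }
}
```

The original document is left unchanged, since importNode copies rather than moves the subtree.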