Issue with xPath results in java

I am having trouble understanding the behavior of the code below, and until I understand it I will have a hard time fixing it. I have reduced the issue to the simplest snippet I can that reproduces it:
String sourceXML = "<root>\n"
+ "<Rule test=\"1\"/>\n"
+ "<Rule test=\"2\"/>\n"
+ "</root>";
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(sourceXML));
Document doc = db.parse(is);
NodeList ruleList = doc.getElementsByTagName("Rule");
System.out.println("Number of Items found : " + ruleList.getLength());
for (int t = 0; t < ruleList.getLength(); t++) {
if (ruleList.item(t).getNodeType() == Node.ELEMENT_NODE) {
Element ruleElement = (Element) ruleList.item(t);
String xPathToUse = "//Rule/@test";
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList ruleNodeList = (NodeList) xpath.evaluate(xPathToUse, ruleElement, XPathConstants.NODESET);
System.out.println("Found " + ruleNodeList.getLength() + " matches to xpath.....");
}
}
Generates the following output:
Number of Items found : 2
Found 2 matches to xpath.....
Found 2 matches to xpath.....
My expectation is that each XPath evaluation would return only 1 match per iteration, as I am running the XPath on each element that I have extracted from the source XML. The output I would expect is:
Number of Items found : 2
Found 1 matches to xpath.....
Found 1 matches to xpath.....
However, when looping over the NodeList (which is correct: there are 2 Rule elements in the source), the XPath appears to be run against the whole source XML each time, even though I thought I had extracted each node and was running the XPath on that node only.
Can anybody help me understand what I am doing wrong here?
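
A likely explanation, offered as a sketch rather than a definitive diagnosis: in XPath 1.0 an expression that starts with // is absolute, so it is evaluated from the root of the document that owns the context node, no matter which element is passed to evaluate(). A relative expression such as @test is resolved against the context node itself. The class name and the two helpers below are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: counts the nodes an expression selects for a given context node.
    static int countMatches(String expression, Node context) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
        return matches.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><Rule test=\"1\"/><Rule test=\"2\"/></root>");
        NodeList rules = doc.getElementsByTagName("Rule");
        for (int i = 0; i < rules.getLength(); i++) {
            Node rule = rules.item(i);
            // Absolute path: searched from the document root, so every Rule is found.
            System.out.println("//Rule/@test -> " + countMatches("//Rule/@test", rule));
            // Relative path: only the test attribute of the current Rule.
            System.out.println("@test        -> " + countMatches("@test", rule));
        }
    }
}
```

With the relative form, each iteration should report a single match, which is the output the question expected.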

Related

Java DOM: getElementsByTagName is returning no elements?

I am trying to parse the XML document at http://web.mta.info/status/ServiceStatusSubway.xml and extract all the PtSituationElement elements with the following code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document subwayStatusDoc = builder.parse(new URL("http://web.mta.info/status/ServiceStatusSubway.xml").openStream());
NodeList situationList = subwayStatusDoc.getDocumentElement().getElementsByTagName("PtSituationElement");
System.out.println(situationList.item(0)); //prints null
What am I doing wrong here?

The PtSituationElement tags contain child tags, so you need to go into those. Just printing .item(0) relies on the toString() method, and apparently it does not do a great job of explaining your nodes.
So add this to see some of the data in the child nodes:
Node item = situationList.item(0);
NodeList childNodes = item.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
System.out.println(childNodes.item(j).getTextContent());
}
(I'm not sure what you want to do with the data in the xml structure, but this example shows how you can proceed with your work.)
Also, I noticed that the LongDescription tags contain HTML that is not well-formed XML (<br clear=left> should be &lt;br clear=left&gt; etc.). The parser could have a problem with that. It would be better if the HTML were escaped (see How to escape "&" in XML?).
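
To illustrate the escaping point, here is a minimal sketch that parses one snippet with raw HTML and one with escaped HTML. The sample strings and the helper rootTextOrNull are made up for this example and are not taken from the MTA feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EscapedHtmlDemo {

    // Hypothetical helper: text of the root element, or null if the input is not well-formed XML.
    static String rootTextOrNull(String xml) throws Exception {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException notWellFormed) {
            // The parser rejects the input, e.g. because of the unquoted attribute value.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Raw HTML inside the element: not well-formed XML, so parsing fails.
        System.out.println(rootTextOrNull("<LongDescription>Delays<br clear=left></LongDescription>"));
        // Escaped HTML: parses, and the markup comes back as plain text.
        System.out.println(rootTextOrNull("<LongDescription>Delays&lt;br clear=left&gt;</LongDescription>"));
    }
}
```

The default error handler may also print a "[Fatal Error]" line to standard error for the first input.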

Getting null values from XPath query

I have this xml file:
<?xml version="1.0" encoding="UTF-8"?>
<iet:aw-data xmlns:iet="http://care.aw.com/IET/2007/12" class="com.aw.care.bean.resource.MessageResource">
<iet:metadata filter=""/>
<iet:message-resource>
<iet:message>some message 1</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.11</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
<iet:message-resource>
<iet:message>some message 2</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.12</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
.
.
.
.
</iet:aw-data>
Using the code below, I go over the data and find what I need.
try {
FileInputStream fileIS = new FileInputStream(new File("resources\\bootstrap\\content\\MessageResources_iw_IL\\MessageResource_iw_IL.ctdata.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//*[local-name()='message-resource']//*[local-name()='code'][contains(text(), 'account')]";
NodeList nodeList = (NodeList) xPath.compile(query).evaluate(xmlDocument, XPathConstants.NODESET);
System.out.println("size= " + nodeList.getLength());
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getNodeValue());
}
}
catch (Exception e){
e.printStackTrace();
}
The issue is that I get only null values when printing in the for loop. Any idea why this happens?
The code needs to return a list of nodes whose code and message fields contain the given parameters (like an SQL query with two parameters joined by AND).

Check the documentation:
https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html
getNodeValue() applied to an element node returns null.
Use getTextContent().
Alternatively, if you find DOM too frustrating, switch to one of the better tree models like JDOM2 or XOM.
Also, if you used an XPath 2.0 engine like Saxon, it would (a) simplify your expression to
//*:message-resource//*:code[contains(text(), 'account')]
and (b) allow you to return a sequence of strings from the XPath expression, rather than a sequence of nodes, so you wouldn't have to mess around with nodelists.
Another point: I suspect that the predicate [contains(text(), 'account')] should really be [.='account']. I'm not sure of that, but using text() instead of ".", and using contains() instead of "=", are both common mistakes.
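
The difference between getNodeValue() and getTextContent() can be seen in a small sketch. The XML below is a cut-down version of the file in the question, and firstCodeElement is a hypothetical helper.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeValueDemo {

    // Hypothetical helper: returns the first element whose local name is "code".
    static Node firstCodeElement(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[local-name()='code']", doc, XPathConstants.NODESET);
        return hits.item(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<iet:aw-data xmlns:iet=\"http://care.aw.com/IET/2007/12\">"
                + "<iet:message-resource><iet:code>edi.claimfilingindicator.11</iet:code></iet:message-resource>"
                + "</iet:aw-data>";
        Node code = firstCodeElement(xml);
        System.out.println(code.getNodeValue());                 // null: element nodes have no node value
        System.out.println(code.getTextContent());               // the text inside the element
        System.out.println(code.getFirstChild().getNodeValue()); // the text node child carries the value
    }
}
```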

java xpath list concatenation

I am using java XPathFactory to get values from a simple xml file:
<Obama>
<coolnessId>0</coolnessId>
<cars>0</cars>
<cars>1</cars>
<cars>2</cars>
</Obama>
With the expression //Obama/coolnessId | //Obama/cars the result is:
0
0
1
2
From this result, I cannot distinguish between what is the coolnessId and what is the car id. I would need something like:
CoolnessId: 0
CarId: 0
CarId: 1
CarId: 2
With concat('c_id: ', //Obama/coolnessId,' car_id: ',//Obama/cars) I am close to the solution, but concat cannot be used for a list of values.
Unfortunately, I cannot use string-join, because it does not seem to be known to my XPath library. And I cannot manipulate the given XML.
What other tricks can I use to get a list of values with something like an alias?

If you select the elements rather than their text content you'll have some context:
public static void main(String[] args) throws Exception {
String xml =
"<Obama>" +
" <coolnessId>0</coolnessId>" +
" <cars>0</cars>" +
" <cars>1</cars>" +
" <cars>2</cars>" +
"</Obama>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//Obama/cars | //Obama/coolnessId");
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < result.getLength(); i++) {
Element item = (Element) result.item(i);
System.out.println(item.getTagName() + ": " + item.getTextContent());
}
}
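
Building on the same idea, the element names can be mapped to the labels requested in the question. This is only a sketch: the alias table and the helper labelledValues are assumptions made for illustration, and Map.of needs Java 9 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AliasDemo {

    // Assumed alias table: the labels come from the question, the mapping itself is illustrative.
    static final Map<String, String> ALIASES = Map.of("coolnessId", "CoolnessId", "cars", "CarId");

    // Hypothetical helper: selects the elements and prefixes each value with its alias.
    static List<String> labelledValues(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//Obama/coolnessId | //Obama/cars", doc, XPathConstants.NODESET);
        List<String> labelled = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            // Fall back to the raw element name when no alias is defined.
            String label = ALIASES.getOrDefault(node.getNodeName(), node.getNodeName());
            labelled.add(label + ": " + node.getTextContent());
        }
        return labelled;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Obama><coolnessId>0</coolnessId><cars>0</cars><cars>1</cars><cars>2</cars></Obama>";
        labelledValues(xml).forEach(System.out::println);
    }
}
```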

Assuming you ask for the result of the evaluation as a NODESET, your XPath expression actually returns a sequence of four element nodes, not a sequence of four strings as you suggest. If your input uses the DOM tree model, these will be returned in the form of a DOM NodeList. You can process the Node objects in this NodeList to get the names of the nodes as well as their string values.
If you switch to an XPath 3.1 engine such as Saxon, you can get the result directly as a single string using the XPath expression:
string-join((//Obama/coolnessId | //Obama/cars) ! (name() || ': ' || string()), '\n')
To invoke XPath expressions in Saxon you can use either the JAXP API (javax.xml.xpath) or Saxon's s9api interface: I would recommend s9api because it understands the richer type system of XPath 2.0 and beyond.

java dom parser only gets the first entity

This code returns only one question element, but I have another 9 question elements inside the XML file. What is wrong here? Do I need to loop? When I checked the loop, it runs only once. What is the problem? I cannot figure it out.
Here is my xml:
<Results>
<question>
<eno>3</eno>
<qno>1</qno>
<qtext>The Battle of Gettysburg was fought during which war?</qtext>
<correctAnswer>C</correctAnswer>
</question>
<question>
<eno>3</eno>
<qno>2</qno>
<qtext>Neil Armstrong and Buzz Aldrin walked how many
minutes on the moon in 1696?</qtext>
<correctAnswer>B</correctAnswer>
</question>
</Results>
my source code:
NodeList listOfQuestions = doc.getElementsByTagName("question");
for(int s=0; s<listOfQuestions.getLength(); s++)
{
System.out.println(listOfQuestions.getLength());
Node firstQuestionNode = listOfQuestions.item(0);
if(firstQuestionNode.getNodeType() == Node.ELEMENT_NODE){
Element firstQElement = (Element)firstQuestionNode;
NodeList enoList = firstQElement.getElementsByTagName("eno");
Element enoElement =(Element)enoList.item(s);
NodeList enosList = enoElement.getChildNodes();
String eno=((Node)enosList.item(s)).getNodeValue().trim();
System.out.println(eno);
NodeList qnoList = firstQElement.getElementsByTagName("qno");
Element qnoElement =(Element)qnoList.item(s);
NodeList qnosList = qnoElement.getChildNodes();
String qno= ((Node)qnosList.item(s)).getNodeValue().trim();
System.out.println(qno);
NodeList qtextList = firstQElement.getElementsByTagName("qtext");
Element qtextElement =(Element)qtextList.item(s);
NodeList qtextsList = qtextElement.getChildNodes();
String qtext= ((Node)qtextsList.item(s)).getNodeValue().trim();
System.out.println(qtext);
NodeList correctAnswerList = firstQElement.getElementsByTagName("correctAnswer");
Element correctAnswerElement =(Element)correctAnswerList.item(s);
NodeList correctAnswerElementList = correctAnswerElement.getChildNodes();
String correctAnswer= ((Node)correctAnswerElementList.item(s)).getNodeValue().trim();
System.out.println(correctAnswer);
int i=st.executeUpdate("insert into question(eno,qno,qtext,correctAnswer) values('"+eno+"','"+qno+"','"+qtext+"','"+correctAnswer+"')");
System.out.println("s is"+s);
}
}

You have hardcoded
Node firstQuestionNode = listOfQuestions.item(0);
^^^
I think you meant to use the variable s there... or maybe not, it's hard to tell what you're trying to do. Regardless, there are no other references to listOfQuestions and you never retrieve any node except the first one.
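
A minimal sketch of the corrected loop, assuming the intent is to read every question. The outer list is indexed with s, while each field is read with index 0 because it occurs once per question. The helpers childText and readQuestions are hypothetical, and the database insert is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QuestionLoopDemo {

    // Hypothetical helper: trimmed text of the first descendant with the given tag name, or "" if absent.
    static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // Hypothetical helper: one "eno|qno|qtext|correctAnswer" row per question element.
    static List<String> readQuestions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList questions = doc.getElementsByTagName("question");
        List<String> rows = new ArrayList<>();
        for (int s = 0; s < questions.getLength(); s++) {
            // item(s), not item(0): move on to the next question on every pass.
            Element question = (Element) questions.item(s);
            rows.add(String.join("|",
                    childText(question, "eno"),
                    childText(question, "qno"),
                    childText(question, "qtext"),
                    childText(question, "correctAnswer")));
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Results>"
                + "<question><eno>3</eno><qno>1</qno><qtext>First?</qtext><correctAnswer>C</correctAnswer></question>"
                + "<question><eno>3</eno><qno>2</qno><qtext>Second?</qtext><correctAnswer>B</correctAnswer></question>"
                + "</Results>";
        readQuestions(xml).forEach(System.out::println);
    }
}
```

For the insert statement itself, a PreparedStatement with bound parameters would avoid problems with quotes inside the question text.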

You should have a look at jsoup, a library built specifically for parsing HTML in Java, with many extra features. What you are currently trying to extract would take no more than 3-4 lines of code with it.
The example on their website for connecting to a URL and fetching DOM elements is just 2 lines:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

PARSING A COMPLEX XML USING DOM

I know that this sort of question has been asked here before, but I still couldn't find a solution to my problem. Can anyone help me with it?
Situation:
I am parsing the following xml response using DOM parser.
<feed>
<post_id>16</post_id>
<user_id>68</user_id>
<restaurant_id>5</restaurant_id>
<dish_id>7</dish_id>
<post_img_id>14</post_img_id>
<rezing_post_id></rezing_post_id>
<price>8.30</price>
<review>very bad</review>
<rating>4.0</rating>
<latitude>22.299999000000</latitude>
<longitude>73.199997000000</longitude>
<near> Gujarat</near>
<posted>1340869702</posted>
<display_name>username</display_name>
<username>vivek</username>
<first_name>vivek</first_name>
<last_name>mitra</last_name>
<dish_name>Hash brows</dish_name>
<restaurant_name>Waffle House</restaurant_name>
<post_img>https://img1.yumzing.com/1000/9cab8fc91</post_img>
<post_comment_count>0</post_comment_count>
<post_like_count>0</post_like_count>
<post_rezing_count>0</post_rezing_count>
<comments>
<comment/>
</comments>
</feed>
<feed>
<post_id>8</post_id>
<user_id>13</user_id>
<restaurant_id>5</restaurant_id>
<dish_id>6</dish_id>
<post_img_id>7</post_img_id>
<rezing_post_id></rezing_post_id>
<price>3.50</price>
<review>This is cheesy!</review>
<rating>4.0</rating>
<latitude>42.187000000000</latitude>
<longitude>-88.346497000000</longitude>
<near>Lake in the Hills IL</near>
<posted>1340333509</posted>
<display_name>username</display_name>
<username>Nullivex</username>
<first_name>Bryan</first_name>
<last_name>Tong</last_name>
<dish_name>Hash Brows with Cheese</dish_name>
<restaurant_name>Waffle House</restaurant_name>
<post_img>https://img1.yumzing.com/1000/78e5c184fd3ae752f8665636381a8f0006762dc0</post_img>
<post_comment_count>6</post_comment_count>
<post_like_count>1</post_like_count>
<post_rezing_count>1</post_rezing_count>
<comments>
<comment>
<user_id>16</user_id>
<email>existentialism27#gmail.com</email>
<email_new></email_new>
<email_verification_code></email_verification_code>
<password>9d99ef4f72f9d2df968a75e058c78245fa45e9e7</password>
<password_reset_code></password_reset_code>
<salt>31a988badccd35a1be7dacc073f60f52e49ff881</salt>
<username>existentialism27</username>
<first_name>Daniel</first_name>
<last_name>Amaya</last_name>
<display_name>username</display_name>
<birth_month>10</birth_month>
<birth_day>5</birth_day>
<birth_year>1985</birth_year>
<city>Colorado Springs</city>
<state>CO</state>
<country>US</country>
<timezone>US/Mountain</timezone>
<last_seen>1338365509</last_seen>
<is_confirmed>1</is_confirmed>
<is_active>1</is_active>
<post_comment_id>9</post_comment_id>
<post_id>8</post_id>
<comment>this is a test comment!</comment>
<posted>1340334121</posted>
</comment>
<comment>
<user_id>16</user_id>
<email>existentialism27#gmail.com</email>
<email_new></email_new>
<email_verification_code></email_verification_code>
<password>9d99ef4f72f9d2df968a75e058c78245fa45e9e7</password>
<password_reset_code></password_reset_code>
<salt>31a988badccd35a1be7dacc073f60f52e49ff881</salt>
<username>existentialism27</username>
<first_name>Daniel</first_name>
<last_name>Amaya</last_name>
<display_name>username</display_name>
<birth_month>10</birth_month>
<birth_day>5</birth_day>
<birth_year>1985</birth_year>
<city>Colorado Springs</city>
<state>CO</state>
<country>US</country>
<timezone>US/Mountain</timezone>
<last_seen>1338365509</last_seen>
<is_confirmed>1</is_confirmed>
<is_active>1</is_active>
<post_comment_id>10</post_comment_id>
<post_id>8</post_id>
<comment>this is a test comment!</comment>
<posted>1340334128</posted>
</comment>
</comments>
</feed>
In the above XML response I get multiple feed elements, which I can parse without any problem, but each feed can have zero or N comment elements. I am not able to parse the comments for an individual feed. Can anyone suggest how to proceed in the right direction?
I am also including a snippet of the code (not the entire code) that I use to parse the XML document, so it will be easier to pinpoint the problem.
DocumentBuilderFactory odbf = DocumentBuilderFactory.newInstance();
DocumentBuilder odb = odbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
Document odoc = odb.parse(is);
odoc.getDocumentElement().normalize ();
NodeList LOP = odoc.getElementsByTagName("feed");
System.out.println(LOP.getLength());
for (int s=0 ; s<LOP.getLength() ; s++){
Node FPN =LOP.item(s);
try{
if(FPN.getNodeType() == Node.ELEMENT_NODE)
{
Element token = (Element)FPN;
NodeList oNameList0 = token.getElementsByTagName("post_id");
Element ZeroNameElement = (Element)oNameList0.item(0);
NodeList textNList0 = ZeroNameElement.getChildNodes();
feed_post_id = Integer.parseInt(((Node)textNList0.item(0)).getNodeValue().trim());
System.out.println("#####The Parsed data#####");
System.out.println("post_id : " + ((Node)textNList0.item(0)).getNodeValue().trim());
System.out.println("#####The Parsed data#####");
}
}catch(Exception ex){}
}

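Once you have the feed NodeList, iterate over it and, for each feed node, use:
NodeList nodes = feedNode.getChildNodes();
// NodeList is not Iterable, so use an index loop instead of for-each
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
if (node.getNodeName().equals("comments")) {
// do something with the comments node
}
}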
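
The approach above can be sketched end to end as follows. Two assumptions are made here: the feed elements sit under a single root element (called feeds in this example), and only direct children are inspected, because each comment element in the question also contains a nested comment element and its own post_id. All class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FeedCommentsDemo {

    // Hypothetical helper: direct child elements of parent with the given tag name.
    static List<Element> childElements(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tagName)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Hypothetical helper: trimmed text of the first direct child with the given name, or "" if absent.
    static String childText(Element parent, String tagName) {
        List<Element> hits = childElements(parent, tagName);
        return hits.isEmpty() ? "" : hits.get(0).getTextContent().trim();
    }

    // Returns one "post_id -> comment text" line per non-empty comment of each feed.
    static List<String> commentsPerFeed(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        for (Element feed : childElements(doc.getDocumentElement(), "feed")) {
            String postId = childText(feed, "post_id");
            for (Element comments : childElements(feed, "comments")) {
                for (Element comment : childElements(comments, "comment")) {
                    if (!comment.hasChildNodes()) {
                        continue; // skip the empty <comment/> placeholder
                    }
                    lines.add(postId + " -> " + childText(comment, "comment"));
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feeds>"
                + "<feed><post_id>16</post_id><comments><comment/></comments></feed>"
                + "<feed><post_id>8</post_id><comments>"
                + "<comment><post_id>8</post_id><comment>this is a test comment!</comment></comment>"
                + "</comments></feed>"
                + "</feeds>";
        commentsPerFeed(xml).forEach(System.out::println);
    }
}
```

Using getElementsByTagName("comment") on a feed would also return the nested comment elements, which is why the sketch walks direct children instead.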
