java dom parser only gets the first entity

This code returns only one "question" element, but there are another 9 question elements inside the XML file. What is wrong here? Do I need a loop? When I checked, the loop runs only one time. What is the problem? I can't figure it out.
Here is my xml:
<Results>
<question>
<eno>3</eno>
<qno>1</qno>
<qtext>The Battle of Gettysburg was fought during which war?</qtext>
<correctAnswer>C</correctAnswer>
</question>
<question>
<eno>3</eno>
<qno>2</qno>
<qtext>Neil Armstrong and Buzz Aldrin walked how many
minutes on the moon in 1696?</qtext>
<correctAnswer>B</correctAnswer>
</question>
</Results>
my source code:
NodeList listOfQuestions = doc.getElementsByTagName("question");
for(int s=0; s<listOfQuestions.getLength(); s++)
{
System.out.println(listOfQuestions.getLength());
Node firstQuestionNode = listOfQuestions.item(0);
if(firstQuestionNode.getNodeType() == Node.ELEMENT_NODE){
Element firstQElement = (Element)firstQuestionNode;
NodeList enoList = firstQElement.getElementsByTagName("eno");
Element enoElement =(Element)enoList.item(s);
NodeList enosList = enoElement.getChildNodes();
String eno=((Node)enosList.item(s)).getNodeValue().trim();
System.out.println(eno);
NodeList qnoList = firstQElement.getElementsByTagName("qno");
Element qnoElement =(Element)qnoList.item(s);
NodeList qnosList = qnoElement.getChildNodes();
String qno= ((Node)qnosList.item(s)).getNodeValue().trim();
System.out.println(qno);
NodeList qtextList = firstQElement.getElementsByTagName("qtext");
Element qtextElement =(Element)qtextList.item(s);
NodeList qtextsList = qtextElement.getChildNodes();
String qtext= ((Node)qtextsList.item(s)).getNodeValue().trim();
System.out.println(qtext);
NodeList correctAnswerList = firstQElement.getElementsByTagName("correctAnswer");
Element correctAnswerElement =(Element)correctAnswerList.item(s);
NodeList correctAnswerElementList = correctAnswerElement.getChildNodes();
String correctAnswer= ((Node)correctAnswerElementList.item(s)).getNodeValue().trim();
System.out.println(correctAnswer);
int i=st.executeUpdate("insert into question(eno,qno,qtext,correctAnswer) values('"+eno+"','"+qno+"','"+qtext+"','"+correctAnswer+"')");
System.out.println("s is"+s);
}
}

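You have hardcoded the index 0 here:
Node firstQuestionNode = listOfQuestions.item(0);
I think you meant to use the loop variable s there... or maybe not, it's hard to tell what you're trying to do. Regardless, that is the only place where you take a node out of listOfQuestions, so you never retrieve any node except the first one.
Note also that enoList.item(s), enosList.item(s) and the similar calls inside the block should use index 0, because each question contains only one element of each kind; with s = 1 they return null.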
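A minimal sketch of the corrected loop, assuming the XML is available as a string. The class name QuestionReader and the helpers readQuestions and childText are hypothetical names introduced for illustration, and getTextContent() is used here in place of the getChildNodes()/getNodeValue() chain from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class QuestionReader {

    // Returns one {eno, qno, qtext, correctAnswer} array per <question> element.
    static List<String[]> readQuestions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList questions = doc.getElementsByTagName("question");
            List<String[]> rows = new ArrayList<>();
            for (int s = 0; s < questions.getLength(); s++) {
                // item(s), not item(0): take the s-th question on each pass
                Element question = (Element) questions.item(s);
                rows.add(new String[] {
                        childText(question, "eno"),
                        childText(question, "qno"),
                        childText(question, "qtext"),
                        childText(question, "correctAnswer")
                });
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Each question is assumed to hold exactly one element per tag, hence index 0.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<Results>"
                + "<question><eno>3</eno><qno>1</qno><qtext>First question</qtext>"
                + "<correctAnswer>C</correctAnswer></question>"
                + "<question><eno>3</eno><qno>2</qno><qtext>Second question</qtext>"
                + "<correctAnswer>B</correctAnswer></question>"
                + "</Results>";
        for (String[] row : readQuestions(xml)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The returned values could then be bound to a PreparedStatement instead of being concatenated into the insert statement, which avoids quoting problems when the question text contains an apostrophe.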

You should have a look at jsoup, an API built specifically for parsing HTML in Java, with many extra features. What you are currently trying to extract would take no more than 3-4 lines of code using its API.
The example on their website for connecting to a URL and fetching DOM elements is just 2 lines:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Related

Java DOM: getElementsByTagName is returning no elements?

I am trying to parse the XML document at http://web.mta.info/status/ServiceStatusSubway.xml and extract all the PtSituationElement elements with the following code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document subwayStatusDoc = builder.parse(new URL("http://web.mta.info/status/ServiceStatusSubway.xml").openStream());
NodeList situationList = subwayStatusDoc.getDocumentElement().getElementsByTagName("PtSituationElement");
System.out.println(situationList.item(0)); //prints null
What am I doing wrong here ?

The PtSituationElement tags contain child tags, so you need to go into those. Just printing .item(0) relies on the toString() method, and apparently it does not do a great job of explaining your nodes.
So add this to see some of the data in the child nodes:
Node item = situationList.item(0);
NodeList childNodes = item.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
System.out.println(childNodes.item(j).getTextContent());
}
(I'm not sure what you want to do with the data in the xml structure, but this example shows how you can proceed with your work.)
Also, I noted that the LongDescription tags contain HTML that is not well-formed XML (<br clear=left> should be <br clear="left"/> etc). The parser could have a problem with that. It would be better if the HTML was escaped (see How to escape "&" in XML?).
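A small sketch of why printing the element itself is not informative: for an Element node, getNodeValue() is null by definition, while getTextContent() returns the concatenated text of its descendants. The XML string and the names ElementTextDemo and firstByTag are made up for illustration; this is not the MTA feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ElementTextDemo {

    // Returns the first element with the given tag name, or null if none matched.
    static Element firstByTag(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tag);
            // item(0) returns null for an empty list, so check the length first
            return matches.getLength() == 0 ? null : (Element) matches.item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element b = firstByTag("<a><b><c>one</c><c>two</c></b></a>", "b");
        System.out.println(b.getNodeValue());   // null: elements have no node value
        System.out.println(b.getTextContent()); // text of all descendants: onetwo
    }
}
```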

Issue with xPath results in java

I am having trouble understanding the behavior of the code below, and until I understand it I am having a hard time trying to fix it. I have isolated the issue down to the simplest code snippet I can that displays it:
String sourceXML = "<root>\n"
+ "<Rule test=\"1\"/>\n"
+ "<Rule test=\"2\"/>\n"
+ "</root>";
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(sourceXML));
Document doc = db.parse(is);
NodeList ruleList = doc.getElementsByTagName("Rule");
System.out.println("Number of Items found : " + ruleList.getLength());
for (int t = 0; t < ruleList.getLength(); t++) {
if (ruleList.item(t).getNodeType() == Node.ELEMENT_NODE) {
Element ruleElement = (Element) ruleList.item(t);
String xPathToUse = "//Rule/@test";
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList ruleNodeList = (NodeList) xpath.evaluate(xPathToUse, ruleElement, XPathConstants.NODESET);
System.out.println("Found " + ruleNodeList.getLength() + " matches to xpath.....");
}
}
Generates the following output:
Number of Items found : 2
Found 2 matches to xpath.....
Found 2 matches to xpath.....
My expectation is that there would be only 1 XPath match in each iteration, as I am running the XPath on each element that I have extracted from the source XML. The output I would expect is:
Number of Items found : 2
Found 1 matches to xpath.....
Found 1 matches to xpath.....
However, it appears that when looping over the NodeList (which is correct: there are 2 Rule elements in the source), the XPath is being run on the whole source XML each time, even though I thought I had extracted each node and was running the XPath on that node only.
Can anybody help me understand what I am doing wrong here?
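A minimal sketch of the behaviour described in the question: an XPath expression that begins with // is evaluated from the root of the document that contains the context node, whereas a relative expression such as @test is evaluated against the context node itself. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathContextDemo {

    // Evaluates the expression with the FIRST <Rule> element as context node
    // and returns how many nodes matched.
    static int countFromFirstRule(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element firstRule = (Element) doc.getElementsByTagName("Rule").item(0);
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    expression, firstRule, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><Rule test=\"1\"/><Rule test=\"2\"/></root>";
        // Absolute path: searches the whole document, so both attributes match
        System.out.println(countFromFirstRule(xml, "//Rule/@test")); // 2
        // Relative path: only the attribute of the context element matches
        System.out.println(countFromFirstRule(xml, "@test"));        // 1
    }
}
```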

How to find a DOM Node using its attributes

I have parsed my HTML/JSP into a DOM at compile time using Java. Now I have the org.w3c.dom.Document object; let's say it is for the HTML below
.....
....
<input type="text" name="EnterName"/>
<select name="SelectOptions">
<option>First</option>
<option>Second</option>
</select>
......
.......
I know the attribute values of the elements. Here "EnterName" is the value of the "name" attribute of the "input" node.
Suppose I have the attribute values of all nodes available in the DOM (like "EnterName" and "SelectOptions" in the HTML above). How can I get the node that has a particular attribute with the given value? Thanks
EDIT:
I will never know the HTML contents in advance. My program should run on a given list of HTML/JSP files, and I have some element names with me. Here the element name refers to the label/name of the fields available in the HTML/JSP. So I need to traverse all the files and get the node that has the same label/name.

Try something like this:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse("yourDocumentName");
doc.getDocumentElement().normalize();
NodeList inputList = doc.getElementsByTagName("input");
for (int indx = 0; indx < inputList.getLength(); indx++) {
    Element eElement = (Element) inputList.item(indx);
    if (eElement.getAttribute("name").equals("EnterName")) {
        // getNodeValue() is always null for an Element, so print the tag name instead
        System.out.println("EnterName: " + eElement.getNodeName());
    }
}
NodeList selectList = doc.getElementsByTagName("select");
for (int indx = 0; indx < selectList.getLength(); indx++) {
    Element eElement = (Element) selectList.item(indx);
    if (eElement.getAttribute("name").equals("SelectOptions")) {
        System.out.println("SelectOptions: " + eElement.getNodeName());
    }
}
If you could add an "id" attribute to your elements, it would be much easier:
<input type="text" name="EnterName" id="name"/>
<select name="SelectOptions" id="options">
...
// getElementById only matches attributes declared as type ID (for example via a DTD or Element.setIdAttribute)
Element nameElement = doc.getElementById("name");
System.out.println("EnterName: " + nameElement.getNodeName());
Element selectElement = doc.getElementById("options");
System.out.println("SelectOptions: " + selectElement.getNodeName());

You can add custom attributes in the HTML, for example, to differentiate between HTML components:
<input type="text" name="EnterName" myattr1="yes"/>
<select name="SelectOptions" myattr2="yes">
<option>First</option>
<option>Second</option>
</select>
Based on the custom attributes, you can check and differentiate the HTML components.

You can do something like this:
Element input = (Element) document.getElementsByTagName("input").item(0);
Attr nameAttr = input.getAttributeNode("name");
String s = nameAttr.getValue();
Please refer to the API documentation for the other methods available to retrieve the value.

Normally you search for attributes by their name, e.g. "name", not by their value, e.g. "EnterName". So you would typically go
String valueForName = myElement.getAttribute("name");
For anything very complicated, I use XPath, which works for what you want. Here's a blog post that looks like just what you need (though it's not Java, it's close enough):
http://blogs.msdn.com/b/davidklinems/archive/2007/03/13/quick-tip-using-xpath-to-find-nodes-by-attribute-value.aspx
Here's a similar non-Java Stack Overflow link
Elaborating in Java, it's a bit tedious, but, roughly...
XPathFactory anXPathfactory = XPathFactory.newInstance();
XPath xpath = anXPathfactory.newXPath();
XPathExpression xpe = xpath.compile("your xpath goes here");
String finallyIGetSomething = (String) xpe.evaluate(node, XPathConstants.STRING);
I haven't tested this for your case, so caveat emptor.
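As a rough illustration of the XPath approach, the expression //*[@name='EnterName'] selects every element whose name attribute has that value, whatever its tag. The sketch below assumes the markup is well-formed XML; AttributeLookup and findByName are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AttributeLookup {

    // Returns the tag names of all elements whose "name" attribute equals value.
    // Assumption: value contains no single quote, because it is spliced
    // directly into the XPath expression.
    static List<String> findByName(String xml, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    "//*[@name='" + value + "']", doc, XPathConstants.NODESET);
            List<String> tagNames = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                tagNames.add(hits.item(i).getNodeName());
            }
            return tagNames;
        } catch (Exception e) {
            throw new IllegalArgumentException("Lookup failed for " + value, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<form>"
                + "<input type=\"text\" name=\"EnterName\"/>"
                + "<select name=\"SelectOptions\"><option>First</option></select>"
                + "</form>";
        System.out.println(findByName(xml, "EnterName"));     // [input]
        System.out.println(findByName(xml, "SelectOptions")); // [select]
    }
}
```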

PARSING A COMPLEX XML USING DOM

I know that this sort of question has been asked here before, but I still couldn't find a solution to my problem. So please, can anyone help me with it?
Situation:
I am parsing the following xml response using DOM parser.
<feed>
<post_id>16</post_id>
<user_id>68</user_id>
<restaurant_id>5</restaurant_id>
<dish_id>7</dish_id>
<post_img_id>14</post_img_id>
<rezing_post_id></rezing_post_id>
<price>8.30</price>
<review>very bad</review>
<rating>4.0</rating>
<latitude>22.299999000000</latitude>
<longitude>73.199997000000</longitude>
<near> Gujarat</near>
<posted>1340869702</posted>
<display_name>username</display_name>
<username>vivek</username>
<first_name>vivek</first_name>
<last_name>mitra</last_name>
<dish_name>Hash brows</dish_name>
<restaurant_name>Waffle House</restaurant_name>
<post_img>https://img1.yumzing.com/1000/9cab8fc91</post_img>
<post_comment_count>0</post_comment_count>
<post_like_count>0</post_like_count>
<post_rezing_count>0</post_rezing_count>
<comments>
<comment/>
</comments>
</feed>
<feed>
<post_id>8</post_id>
<user_id>13</user_id>
<restaurant_id>5</restaurant_id>
<dish_id>6</dish_id>
<post_img_id>7</post_img_id>
<rezing_post_id></rezing_post_id>
<price>3.50</price>
<review>This is cheesy!</review>
<rating>4.0</rating>
<latitude>42.187000000000</latitude>
<longitude>-88.346497000000</longitude>
<near>Lake in the Hills IL</near>
<posted>1340333509</posted>
<display_name>username</display_name>
<username>Nullivex</username>
<first_name>Bryan</first_name>
<last_name>Tong</last_name>
<dish_name>Hash Brows with Cheese</dish_name>
<restaurant_name>Waffle House</restaurant_name>
<post_img>https://img1.yumzing.com/1000/78e5c184fd3ae752f8665636381a8f0006762dc0</post_img>
<post_comment_count>6</post_comment_count>
<post_like_count>1</post_like_count>
<post_rezing_count>1</post_rezing_count>
<comments>
<comment>
<user_id>16</user_id>
<email>existentialism27#gmail.com</email>
<email_new></email_new>
<email_verification_code></email_verification_code>
<password>9d99ef4f72f9d2df968a75e058c78245fa45e9e7</password>
<password_reset_code></password_reset_code>
<salt>31a988badccd35a1be7dacc073f60f52e49ff881</salt>
<username>existentialism27</username>
<first_name>Daniel</first_name>
<last_name>Amaya</last_name>
<display_name>username</display_name>
<birth_month>10</birth_month>
<birth_day>5</birth_day>
<birth_year>1985</birth_year>
<city>Colorado Springs</city>
<state>CO</state>
<country>US</country>
<timezone>US/Mountain</timezone>
<last_seen>1338365509</last_seen>
<is_confirmed>1</is_confirmed>
<is_active>1</is_active>
<post_comment_id>9</post_comment_id>
<post_id>8</post_id>
<comment>this is a test comment!</comment>
<posted>1340334121</posted>
</comment>
<comment>
<user_id>16</user_id>
<email>existentialism27#gmail.com</email>
<email_new></email_new>
<email_verification_code></email_verification_code>
<password>9d99ef4f72f9d2df968a75e058c78245fa45e9e7</password>
<password_reset_code></password_reset_code>
<salt>31a988badccd35a1be7dacc073f60f52e49ff881</salt>
<username>existentialism27</username>
<first_name>Daniel</first_name>
<last_name>Amaya</last_name>
<display_name>username</display_name>
<birth_month>10</birth_month>
<birth_day>5</birth_day>
<birth_year>1985</birth_year>
<city>Colorado Springs</city>
<state>CO</state>
<country>US</country>
<timezone>US/Mountain</timezone>
<last_seen>1338365509</last_seen>
<is_confirmed>1</is_confirmed>
<is_active>1</is_active>
<post_comment_id>10</post_comment_id>
<post_id>8</post_id>
<comment>this is a test comment!</comment>
<posted>1340334128</posted>
</comment>
</comments>
</feed>
In the above XML response, I am getting multiple "feed" elements, which I am able to parse without any problem, but each "feed" can have zero or N "comment" elements. I am not able to parse the comments for an individual feed. Can anyone suggest how to proceed in the right direction?
I am also including a snippet of the code (NOT the entire code) that I am using to parse the XML document, so it will be easier to pinpoint the problem.
DocumentBuilderFactory odbf = DocumentBuilderFactory.newInstance();
DocumentBuilder odb = odbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
Document odoc = odb.parse(is);
odoc.getDocumentElement().normalize ();
NodeList LOP = odoc.getElementsByTagName("feed");
System.out.println(LOP.getLength());
for (int s=0 ; s<LOP.getLength() ; s++){
Node FPN =LOP.item(s);
try{
if(FPN.getNodeType() == Node.ELEMENT_NODE)
{
Element token = (Element)FPN;
NodeList oNameList0 = token.getElementsByTagName("post_id");
Element ZeroNameElement = (Element)oNameList0.item(0);
NodeList textNList0 = ZeroNameElement.getChildNodes();
feed_post_id = Integer.parseInt(((Node)textNList0.item(0)).getNodeValue().trim());
System.out.println("#####The Parsed data#####");
System.out.println("post_id : " + ((Node)textNList0.item(0)).getNodeValue().trim());
System.out.println("#####The Parsed data#####");
}
}catch(Exception ex){}
}

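Once you have the feed NodeList, iterate over it and, for each feed node, use:
NodeList nodes = feedNode.getChildNodes();
// NodeList is not Iterable, so use an index loop rather than for-each
for (int i = 0; i < nodes.getLength(); i++)
{
    Node node = nodes.item(i);
    if(node.getNodeName().equals("comments")){
        //do something with comments node
    }
}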
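A sketch of how the comments of each feed could be collected, assuming the feed elements are wrapped in a single root element (called feeds here, which is an assumption, as are the class and helper names). Because each comment record itself contains a nested comment element holding the text, the sketch walks direct children only instead of calling getElementsByTagName("comment"), which would return both levels.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FeedComments {

    // For every <feed>, returns the text of each comment record found under
    // its <comments> element (an empty list when the feed has no comments).
    static List<List<String>> commentsPerFeed(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList feeds = doc.getElementsByTagName("feed");
            List<List<String>> result = new ArrayList<>();
            for (int f = 0; f < feeds.getLength(); f++) {
                List<String> texts = new ArrayList<>();
                for (Element comments : childElements(feeds.item(f), "comments")) {
                    for (Element record : childElements(comments, "comment")) {
                        // the inner <comment> child carries the comment text
                        for (Element text : childElements(record, "comment")) {
                            texts.add(text.getTextContent().trim());
                        }
                    }
                }
                result.add(texts);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Direct child elements of parent with the given tag name.
    private static List<Element> childElements(Node parent, String name) {
        List<Element> out = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE && kid.getNodeName().equals(name)) {
                out.add((Element) kid);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<feeds>"
                + "<feed><post_id>16</post_id><comments><comment/></comments></feed>"
                + "<feed><post_id>8</post_id><comments>"
                + "<comment><user_id>16</user_id><comment>first</comment></comment>"
                + "<comment><user_id>16</user_id><comment>second</comment></comment>"
                + "</comments></feed>"
                + "</feeds>";
        System.out.println(commentsPerFeed(xml)); // [[], [first, second]]
    }
}
```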

How to parse this XML file

I am new to programming in Java and I have just learned how to parse an XML file, but I have no idea how to parse this one. Please help me with code that gets the day1 tags and their inner tags Order1 and Order2.
<RoutePlan>
<day1>
<Order1>
<customer> XYZ</customer>
<address> INDIA </address>
<data> 10-10-2011 </data>
<time> 9.30 A.M </time>
</Order1>
<Order2>
<customer> ABC </customer>
<address> US </address>
<data> 10-10-2011 </data>
<time> 10.30 A.M </time>
</Order2>
</day1>
I wrote the following code to retrieve the data, but I am only getting the data in Order1 and not in Order2.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(file);
document.getDocumentElement().normalize();
System.out.println("Root Element: "+document.getDocumentElement().getNodeName());
NodeList node = document.getElementsByTagName("day1");
for(int i=0;i<node.getLength();i++){
Node firstNode = node.item(i);
Element element = (Element) firstNode;
NodeList customer = element.getElementsByTagName("customer");
Element customerElement = (Element) customer.item(0);
NodeList firstName = customerElement.getChildNodes();
System.out.println("Name: "+((firstName.item(0).getNodeValue())));
NodeList address = element.getElementsByTagName("address");
Element customerAddress = (Element) address.item(0);
NodeList addName = customerAddress.getChildNodes();
System.out.println("Address: "+((addName.item(0).getNodeValue())));
NodeList date = element.getElementsByTagName("date");
Element customerdate = (Element) date.item(0);
NodeList dateN = customerdate.getChildNodes();
System.out.println("Address: "+((dateN.item(0).getNodeValue())));
NodeList time = element.getElementsByTagName("time");
Element customertime = (Element) time.item(0);
NodeList Ntime = customertime.getChildNodes();
System.out.println("Time: "+((Ntime.item(0).getNodeValue())));
}

I can give you not one, not two, but three ways to parse this XML (there are more, but these are the most common ones):
DOM -> two good resources to start : here and here
SAX -> quickstart from official website: here
StAX -> a good introduction: here
Judging by the size of your XML document, I'd probably go for DOM parsing, which is going to be the easiest to implement and to use (but if you have to deal with larger files, take a look at SAX for read-only processing and StAX for reading and writing).

The reason you are getting only "Order1" elements is:
1. You lock on the "day1" node.
2. You retrieve the "customer" elements by tag name, which returns 2 elements.
3. You retrieve the first element and print its value, and hence the second "customer" is ignored.
When working with DOM, be prepared to spin up multiple loops for retrieving data. Also, you are a bit misguided when it comes to representing your schema. You really don't need to name "elements" as "day1"/"order1" etc. In XML, that can be simply expressed by having multiple "day" or "order" elements which in turn automatically enforces ordering. An example XML would look like:
<route-plan>
<day>
<order>
<something/>
</order>
</day>
<day>
<order>
<something/>
</order>
</day>
</route-plan>
Now retrieving "day" elements is a simple matter of:
Look up "day" elements by tag name
For each "day" element
    Look up "order" elements by tag name
    For each "order" element
        Print out the value of "customer"/"address" etc.
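The steps above can be sketched in Java roughly as follows, assuming the restructured route-plan/day/order layout and that every order has exactly one customer and one address child. RoutePlanReader and describeOrders are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RoutePlanReader {

    // Returns one line per order, e.g. "day 1, order 2: ABC, US".
    static List<String> describeOrders(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> lines = new ArrayList<>();
            NodeList days = doc.getElementsByTagName("day");
            for (int d = 0; d < days.getLength(); d++) {
                Element day = (Element) days.item(d);
                // Search below this day only, not the whole document
                NodeList orders = day.getElementsByTagName("order");
                for (int o = 0; o < orders.getLength(); o++) {
                    Element order = (Element) orders.item(o);
                    lines.add("day " + (d + 1) + ", order " + (o + 1) + ": "
                            + text(order, "customer") + ", " + text(order, "address"));
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Trimmed text of the first descendant with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<route-plan><day>"
                + "<order><customer> XYZ</customer><address> INDIA </address></order>"
                + "<order><customer> ABC </customer><address> US </address></order>"
                + "</day></route-plan>";
        describeOrders(xml).forEach(System.out::println);
    }
}
```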
