How to parse this XML file - java

I am new to programming in Java and have just learned how to parse an XML file, but I cannot work out how to parse this one. Please help me with code that reads the day1 element and its child elements Order1 and Order2.
<RoutePlan>
<day1>
<Order1>
<customer> XYZ</customer>
<address> INDIA </address>
<data> 10-10-2011 </data>
<time> 9.30 A.M </time>
</Order1>
<Order2>
<customer> ABC </customer>
<address> US </address>
<data> 10-10-2011 </data>
<time> 10.30 A.M </time>
</Order2>
</day1>
</RoutePlan>
I wrote the following code to retrieve the data, but I only get the data in Order1, not in Order2.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(file);
document.getDocumentElement().normalize();
System.out.println("Root Element: "+document.getDocumentElement().getNodeName());
NodeList node = document.getElementsByTagName("day1");
for(int i=0;i<node.getLength();i++){
Node firstNode = node.item(i);
Element element = (Element) firstNode;
NodeList customer = element.getElementsByTagName("customer");
Element customerElement = (Element) customer.item(0);
NodeList firstName = customerElement.getChildNodes();
System.out.println("Name: "+((firstName.item(0).getNodeValue())));
NodeList address = element.getElementsByTagName("address");
Element customerAddress = (Element) address.item(0);
NodeList addName = customerAddress.getChildNodes();
System.out.println("Address: "+((addName.item(0).getNodeValue())));
NodeList date = element.getElementsByTagName("date");
Element customerdate = (Element) date.item(0);
NodeList dateN = customerdate.getChildNodes();
System.out.println("Date: "+((dateN.item(0).getNodeValue())));
NodeList time = element.getElementsByTagName("time");
Element customertime = (Element) time.item(0);
NodeList Ntime = customertime.getChildNodes();
System.out.println("Time: "+((Ntime.item(0).getNodeValue())));
}

I can give you not one, not two, but three ways to parse this XML (there are more, but these are the most common ones):
DOM -> two good resources to start : here and here
SAX -> quickstart from official website: here
StAX -> a good introduction: here
Judging by the size of your XML document, I'd probably go for DOM parsing, which is going to be the easiest to implement and use (but if you have to deal with larger files, take a look at SAX for read-only processing and StAX for reading and writing).
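For the document in the question, a DOM pass could look roughly like the sketch below. It walks the direct children of each day1 element, so Order1, Order2 and any further orders are all visited. The class name Day1Orders, the helper text() and the inline SAMPLE string are made up for illustration; the sample is a shortened copy of the XML from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Day1Orders {

    // Shortened copy of the XML from the question (illustrative only).
    static final String SAMPLE = "<RoutePlan><day1>"
        + "<Order1><customer> XYZ</customer><address> INDIA </address><time> 9.30 A.M </time></Order1>"
        + "<Order2><customer> ABC </customer><address> US </address><time> 10.30 A.M </time></Order2>"
        + "</day1></RoutePlan>";

    /** Returns one summary line per child element of every day1 element. */
    static List<String> readOrders(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList days = doc.getElementsByTagName("day1");
            for (int d = 0; d < days.getLength(); d++) {
                NodeList children = days.item(d).getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node child = children.item(i);
                    // Skip whitespace text nodes between the order elements.
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    Element order = (Element) child;
                    lines.add(order.getTagName() + ": " + text(order, "customer")
                        + ", " + text(order, "address") + ", " + text(order, "time"));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the route plan", e);
        }
        return lines;
    }

    // Trimmed text of the first descendant with the given tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        readOrders(SAMPLE).forEach(System.out::println);
    }
}
```

Running main on the sample prints one line for Order1 and one for Order2.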

The reason you are getting only the "Order1" data is:
You look up the "day1" node.
You retrieve the "customer" elements by tag name, which returns 2 elements.
You take only the first element (item(0)) and print its value, so the second "customer" is ignored.
When working with DOM, be prepared to write several nested loops to retrieve data. Also, your schema could be simpler: you don't need to name elements "day1", "Order1", etc. In XML you can simply have multiple "day" or "order" elements, and their document order is preserved automatically. An example XML would look like:
<route-plan>
<day>
<order>
<something/>
</order>
</day>
<day>
<order>
<something/>
</order>
</day>
</route-plan>
Now retrieving the data is a simple matter of:
Look up "day" elements by tag name
For each "day" element:
  Look up "order" elements by tag name
  For each "order" element:
    Print out the value of "customer", "address", etc.
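The steps above can be sketched in Java as follows. This is only an illustration: the class name RoutePlanReader, the helper text() and the SAMPLE string are assumptions, and the sample uses the restructured "day"/"order" layout rather than the original file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RoutePlanReader {

    // Hypothetical sample in the restructured layout suggested above.
    static final String SAMPLE = "<route-plan><day>"
        + "<order><customer>XYZ</customer><address>INDIA</address></order>"
        + "<order><customer>ABC</customer><address>US</address></order>"
        + "</day></route-plan>";

    /** Returns one "customer @ address" line per order, in document order. */
    static List<String> readOrders(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Outer loop: every "day" element.
            NodeList days = doc.getElementsByTagName("day");
            for (int d = 0; d < days.getLength(); d++) {
                Element day = (Element) days.item(d);
                // Inner loop: every "order" element inside this day.
                NodeList orders = day.getElementsByTagName("order");
                for (int o = 0; o < orders.getLength(); o++) {
                    Element order = (Element) orders.item(o);
                    lines.add(text(order, "customer") + " @ " + text(order, "address"));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the route plan", e);
        }
        return lines;
    }

    // Trimmed text of the first descendant with the given tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        readOrders(SAMPLE).forEach(System.out::println);
    }
}
```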

Related

Access data in xml as string

I am receiving XML in string format. Is there any library to search for elements in the string?
<Version value="0"/>
<IssueDate>2017-12-15</IssueDate>
<Locale>en_US</Locale>
<RecipientAddress>
<Category>Primary</Category>
<SubCategory>0</SubCategory>
<Name>Vitsi</Name>
<Attention>Stowell Group Llc.</Attention>
<AddressLine1>511 6th St</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-2903</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>2903</ZIP4>
</RecipientAddress>
<RecipientAddress>
<Category>Additional</Category>
<SubCategory>1</SubCategory>
<Name>Vitsi</Name>
<AddressLine1>Po Box 957</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-0104</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>0104</ZIP4>
</RecipientAddress>
<SenderName>TMO</SenderName>
<SenderId>IL</SenderId>
<SenderAddress>
<Name>T-mobile</Name>
<AddressLine1>Po Box 790047</AddressLine1>
<City>St. Louis</City>
<PresentationValue>ST. LOUIS MO 63179-0047</PresentationValue>
<State>MO</State>
<ZIPCode>63179</ZIPCode>
.
.
.
.
I want to access the RecipientAddress elements, which form a list. Is there any library to do that? Please note that what I receive is a string. It is an invoice and there will be many to process, so performance is important.
The following options are available:
Convert the XML string to Java objects using JAXB.
Use String.indexOf() to locate specific parts of the XML.
Use regular expressions to retrieve specific parts of the XML.
Use a SAX, DOM, or StAX parser to parse and extract from the XML.
Use XPath to fetch specific values from the XML.
You could use XPath. Java has built-in support for XML querying without any third-party library.
The code would look like this:
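// Needs java.io.StringReader, org.xml.sax.InputSource, org.w3c.dom.*, javax.xml.parsers.* and javax.xml.xpath.*
String xmlInputStr = "<YOUR_XML_STRING_INPUT>";
String xpathExpressionStr = "<XPATH_EXPRESSION_STRING>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// parse(String) treats its argument as a URI, so wrap the XML text in an InputSource
Document doc = builder.parse(new InputSource(new StringReader(xmlInputStr)));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(xpathExpressionStr);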
You can write your own expression string for the query. A typical example, which matches Category under every RecipientAddress wherever it sits in the document:
"//RecipientAddress/Category"
Evaluate the expression against your document to retrieve a list of nodes.
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
And iterate over the nodes:
for (int i = 0; i < nodes.getLength(); i++) {
Node nNode = nodes.item(i);
...
}
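Putting the pieces together, a self-contained sketch might look like the one below. The Invoice wrapper element, the class name RecipientAddressQuery and the shortened SAMPLE string are assumptions, since the fragment in the question does not show its root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RecipientAddressQuery {

    // Hypothetical <Invoice> root; the real root element is not shown in the question.
    static final String SAMPLE = "<Invoice>"
        + "<RecipientAddress><Category>Primary</Category><City>Lake Oswego</City></RecipientAddress>"
        + "<RecipientAddress><Category>Additional</Category><City>Lake Oswego</City></RecipientAddress>"
        + "</Invoice>";

    /** Returns the Category text of every RecipientAddress element in the XML string. */
    static List<String> categories(String xml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select every RecipientAddress element, wherever it sits in the document.
            NodeList addresses = (NodeList) xpath.evaluate(
                "//RecipientAddress", doc, XPathConstants.NODESET);
            for (int i = 0; i < addresses.getLength(); i++) {
                // A relative expression is evaluated against the current address node.
                result.add(xpath.evaluate("Category", addresses.item(i)));
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not query the invoice", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(categories(SAMPLE));
    }
}
```

Since many invoices are processed, it is probably worth compiling frequently used expressions once with xpath.compile(...) and reusing the resulting XPathExpression objects instead of passing the expression string on every call.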
There are many ready-made APIs for converting XML to Java objects.
Please take a look at Xerces from Apache.
If you want to extract only a specific value, put the whole document into a string and use indexOf("string").

java dom parser only gets the first entity

This code returns the elements of only one "question" tag, but I have another 9 question elements inside the XML file. What is wrong here? Do I need to loop? When I checked the loop, it ran only once. What is the problem? I cannot figure it out.
Here is my xml:
<Results>
<question>
<eno>3</eno>
<qno>1</qno>
<qtext>The Battle of Gettysburg was fought during which war?</qtext>
<correctAnswer>C</correctAnswer>
</question>
<question>
<eno>3</eno>
<qno>2</qno>
<qtext>Neil Armstrong and Buzz Aldrin walked how many
minutes on the moon in 1696?</qtext>
<correctAnswer>B</correctAnswer>
</question>
</Results>
my source code:
NodeList listOfQuestions = doc.getElementsByTagName("question");
for(int s=0; s<listOfQuestions.getLength(); s++)
{
System.out.println(listOfQuestions.getLength());
Node firstQuestionNode = listOfQuestions.item(0);
if(firstQuestionNode.getNodeType() == Node.ELEMENT_NODE){
Element firstQElement = (Element)firstQuestionNode;
NodeList enoList = firstQElement.getElementsByTagName("eno");
Element enoElement =(Element)enoList.item(s);
NodeList enosList = enoElement.getChildNodes();
String eno=((Node)enosList.item(s)).getNodeValue().trim();
System.out.println(eno);
NodeList qnoList = firstQElement.getElementsByTagName("qno");
Element qnoElement =(Element)qnoList.item(s);
NodeList qnosList = qnoElement.getChildNodes();
String qno= ((Node)qnosList.item(s)).getNodeValue().trim();
System.out.println(qno);
NodeList qtextList = firstQElement.getElementsByTagName("qtext");
Element qtextElement =(Element)qtextList.item(s);
NodeList qtextsList = qtextElement.getChildNodes();
String qtext= ((Node)qtextsList.item(s)).getNodeValue().trim();
System.out.println(qtext);
NodeList correctAnswerList = firstQElement.getElementsByTagName("correctAnswer");
Element correctAnswerElement =(Element)correctAnswerList.item(s);
NodeList correctAnswerElementList = correctAnswerElement.getChildNodes();
String correctAnswer= ((Node)correctAnswerElementList.item(s)).getNodeValue().trim();
System.out.println(correctAnswer);
int i=st.executeUpdate("insert into question(eno,qno,qtext,correctAnswer) values('"+eno+"','"+qno+"','"+qtext+"','"+correctAnswer+"')");
System.out.println("s is"+s);
}
}
You have hardcoded
Node firstQuestionNode = listOfQuestions.item(0);
^^^
I think you meant to use the variable s there... or maybe not, it's hard to tell what you're trying to do. Regardless, there are no other references to listOfQuestions and you never retrieve any node except the first one.
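A corrected loop might look like the sketch below: the loop variable s selects the question, while index 0 selects the single eno, qno or correctAnswer element inside that question. The class name QuestionReader, the helper childText() and the shortened SAMPLE string are illustrative assumptions, and the database insert from the question is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QuestionReader {

    // Shortened version of the XML from the question (illustrative only).
    static final String SAMPLE = "<Results>"
        + "<question><eno>3</eno><qno>1</qno><correctAnswer>C</correctAnswer></question>"
        + "<question><eno>3</eno><qno>2</qno><correctAnswer>B</correctAnswer></question>"
        + "</Results>";

    /** Returns one "eno/qno/correctAnswer" row per question element. */
    static List<String> readQuestions(String xml) {
        List<String> rows = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList questions = doc.getElementsByTagName("question");
            for (int s = 0; s < questions.getLength(); s++) {
                // Use the loop variable here, not a hardcoded 0.
                Element question = (Element) questions.item(s);
                rows.add(childText(question, "eno") + "/" + childText(question, "qno")
                    + "/" + childText(question, "correctAnswer"));
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the questions", e);
        }
        return rows;
    }

    // Each question holds exactly one element per tag, so index 0 is the right one here.
    static String childText(Element question, String tag) {
        NodeList matches = question.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        readQuestions(SAMPLE).forEach(System.out::println);
    }
}
```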
You should have a look at jsoup, an API built specifically for parsing HTML DOM code in Java, with many extra features. What you are currently trying to extract would take no more than 3-4 lines of code using its API.
The example on their website for connecting to a URL and fetching DOM elements is just 2 lines:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Extracting the node values in XML with XPath in Java

I have an XML document:
<response>
<result>
<phone>1233</phone>
<sys_id>asweyu4</sys_id>
<link>rft45fgd</link>
<!-- Many more in result -->
</result>
<!-- Many more result nodes -->
</response>
The XML structure is not known in advance. I get the XPath expressions from the user,
e.g. as strings like:
//response/result/sys_id , //response/result/phone
How can I get these node values for whole XML document by evaluating XPath?
I referred to this, but my XPath is as shown above, i.e. it does not use * or text().
The xpath evaluator works perfectly fine with my input format, so is there any way I can achieve the same in java?
Thank you!
It's difficult without seeing your code... I'd just evaluate as a NodeList and then call getTextContent() on each node in the result list...
String input = "<response><result><phone>1233</phone><sys_id>asweyu4</sys_id><link>rft45fgd</link></result><result><phone>1233</phone><sys_id>another-sysid</sys_id><link>another-link</link></result></response>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new ByteArrayInputStream(input.getBytes("UTF-8")));
XPath path = XPathFactory.newInstance().newXPath();
NodeList node = (NodeList) path.compile("//response/result/sys_id").evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < node.getLength(); i++) {
System.out.println(node.item(i).getTextContent());
}
Output
asweyu4
another-sysid

How to find a DOM Node using its attributes

I have parsed my HTML/JSP into a DOM at compile time using Java. Now I have the org.w3c.dom.Document object; let's say it is for the HTML below:
.....
....
<input type="text" name="EnterName"/>
<select name="SelectOptions">
<option>First</option>
<option>Second</option>
</select>
......
.......
I know the attribute values of the elements. Here "EnterName" is the value of the "name" attribute of the "input" node.
Suppose I have the attribute values of all nodes in the DOM (like "EnterName" and "SelectOptions" in the HTML above). How can I get the node that has a particular attribute with the given value? Thanks
EDIT :
I will never know what the HTML contents are. My program should run on
a given list of HTML/JSP files, and I have some element names with me.
Here the element name refers to the label/name of the fields available
in the HTML/JSP. So I need to traverse all the files and get the
node that has the same label/name.
Try something like this:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse("yourDocumentName");
doc.getDocumentElement().normalize();
// Find the <input> whose "name" attribute is "EnterName"
NodeList nList = doc.getElementsByTagName("input");
for (int indx = 0; indx < nList.getLength(); indx++) {
    Element eElement = (Element) nList.item(indx);
    if (eElement.getAttribute("name").equals("EnterName")) {
        // getNodeValue() is always null for an Element; getTextContent() returns the text inside it
        System.out.println("EnterName: " + eElement.getTextContent());
    }
}
// Find the <select> whose "name" attribute is "SelectOptions"
NodeList nList1 = doc.getElementsByTagName("select");
for (int indx = 0; indx < nList1.getLength(); indx++) {
    Element eElement = (Element) nList1.item(indx);
    if (eElement.getAttribute("name").equals("SelectOptions")) {
        System.out.println("SelectOptions: " + eElement.getTextContent());
    }
}
If you can add an "id" to your elements, it is much easier:
<input type="text" name="EnterName" id="name"/>
<select name="SelectOptions" id="options">
...
// Note: getElementById only finds attributes declared as type ID (via a DTD or schema, or Element.setIdAttribute)
Element nameElement = doc.getElementById("name");
System.out.println("EnterName: " + nameElement.getTextContent());
Element selectElement = doc.getElementById("options");
System.out.println("SelectOptions: " + selectElement.getTextContent());
You can add custom attributes in the HTML, for example, to differentiate between HTML components:
<input type="text" name="EnterName" myattr1="yes"/>
<select name="SelectOptions" myattr2="yes">
<option>First</option>
<option>Second</option>
</select>
On the basis of these custom attributes you can check for and differentiate between the HTML components.
You can do something like this:
Element input = (Element) document.getElementsByTagName("input").item(0);
Attr nameAttr = input.getAttributeNode("name");
String s = nameAttr.getValue();
Here Attr is org.w3c.dom.Attr, and getValue() returns "EnterName" for the HTML above. Please refer to the API documentation for the other accessor methods.
Normally you search for attributes by their name, e.g. "name", not by their value, e.g. "EnterName". So you would typically go
String valueForName = myElement.getAttribute("name");
For anything very complicated, I use XPath. Which works for what you want. Here's a blog that looks like just what you want (though it's not Java, it's close enough):
http://blogs.msdn.com/b/davidklinems/archive/2007/03/13/quick-tip-using-xpath-to-find-nodes-by-attribute-value.aspx
Here's a similar non-Java Stack Overflow link
Elaborating in Java, it's a bit tedious, but roughly:
XPathFactory anXPathfactory = XPathFactory.newInstance();
XPath xpath = anXPathfactory.newXPath();
XPathExpression xpe = xpath.compile("your xpath goes here");
String finallyIGetSomething = (String) xpe.evaluate(node, XPathConstants.STRING);
Haven't tested this for your case so caveat emptor
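For the concrete case in the question, an XPath lookup by attribute value could be sketched as follows. The class name AttributeLookup, the variable name $value, the form wrapper element and the SAMPLE string are assumptions for illustration. The attribute value is passed in through an XPathVariableResolver instead of being concatenated into the expression, which avoids quoting problems.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeLookup {

    // Well-formed version of the HTML fragment from the question, wrapped in a hypothetical <form>.
    static final String SAMPLE = "<form>"
        + "<input type=\"text\" name=\"EnterName\"/>"
        + "<select name=\"SelectOptions\"><option>First</option><option>Second</option></select>"
        + "</form>";

    /** Returns the tag name of the first element whose "name" attribute equals nameValue, or null. */
    static String findTagByName(String xml, String nameValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve the XPath variable $value to the attribute value we are looking for.
            xpath.setXPathVariableResolver(
                variable -> "value".equals(variable.getLocalPart()) ? nameValue : null);
            // Any element, at any depth, whose "name" attribute equals $value.
            Node match = (Node) xpath.evaluate("//*[@name=$value]", doc, XPathConstants.NODE);
            return match == null ? null : match.getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath lookup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findTagByName(SAMPLE, "EnterName"));
        System.out.println(findTagByName(SAMPLE, "SelectOptions"));
    }
}
```

Requesting XPathConstants.NODESET instead of NODE would return every matching element rather than only the first.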

PARSING A COMPLEX XML USING DOM

I know that this sort of question has been asked here before, but I still couldn't find a solution to my problem. Can anyone help me with it?
Situation:
I am parsing the following xml response using DOM parser.
<feed>
<post_id>16</post_id>
<user_id>68</user_id>
<restaurant_id>5</restaurant_id>
<dish_id>7</dish_id>
<post_img_id>14</post_img_id>
<rezing_post_id></rezing_post_id>
<price>8.30</price>
<review>very bad</review>
<rating>4.0</rating>
<latitude>22.299999000000</latitude>
<longitude>73.199997000000</longitude>
<near> Gujarat</near>
<posted>1340869702</posted>
<display_name>username</display_name>
<username>vivek</username>
<first_name>vivek</first_name>
<last_name>mitra</last_name>
<dish_name>Hash brows</dish_name>
<restaurant_name>Waffle House</restaurant_name>
<post_img>https://img1.yumzing.com/1000/9cab8fc91</post_img>
<post_comment_count>0</post_comment_count>
<post_like_count>0</post_like_count>
<post_rezing_count>0</post_rezing_count>
<comments>
<comment/>
</comments>
</feed>
<feed>
<post_id>8</post_id>
<user_id>13</user_id>
<restaurant_id>5</restaurant_id>
<dish_id>6</dish_id>
<post_img_id>7</post_img_id>
<rezing_post_id></rezing_post_id>
<price>3.50</price>
<review>This is cheesy!</review>
<rating>4.0</rating>
<latitude>42.187000000000</latitude>
<longitude>-88.346497000000</longitude>
<near>Lake in the Hills IL</near>
<posted>1340333509</posted>
<display_name>username</display_name>
<username>Nullivex</username>
<first_name>Bryan</first_name>
<last_name>Tong</last_name>
<dish_name>Hash Brows with Cheese</dish_name>
<restaurant_name>Waffle House</restaurant_name>
<post_img>https://img1.yumzing.com/1000/78e5c184fd3ae752f8665636381a8f0006762dc0</post_img>
<post_comment_count>6</post_comment_count>
<post_like_count>1</post_like_count>
<post_rezing_count>1</post_rezing_count>
<comments>
<comment>
<user_id>16</user_id>
<email>existentialism27#gmail.com</email>
<email_new></email_new>
<email_verification_code></email_verification_code>
<password>9d99ef4f72f9d2df968a75e058c78245fa45e9e7</password>
<password_reset_code></password_reset_code>
<salt>31a988badccd35a1be7dacc073f60f52e49ff881</salt>
<username>existentialism27</username>
<first_name>Daniel</first_name>
<last_name>Amaya</last_name>
<display_name>username</display_name>
<birth_month>10</birth_month>
<birth_day>5</birth_day>
<birth_year>1985</birth_year>
<city>Colorado Springs</city>
<state>CO</state>
<country>US</country>
<timezone>US/Mountain</timezone>
<last_seen>1338365509</last_seen>
<is_confirmed>1</is_confirmed>
<is_active>1</is_active>
<post_comment_id>9</post_comment_id>
<post_id>8</post_id>
<comment>this is a test comment!</comment>
<posted>1340334121</posted>
</comment>
<comment>
<user_id>16</user_id>
<email>existentialism27#gmail.com</email>
<email_new></email_new>
<email_verification_code></email_verification_code>
<password>9d99ef4f72f9d2df968a75e058c78245fa45e9e7</password>
<password_reset_code></password_reset_code>
<salt>31a988badccd35a1be7dacc073f60f52e49ff881</salt>
<username>existentialism27</username>
<first_name>Daniel</first_name>
<last_name>Amaya</last_name>
<display_name>username</display_name>
<birth_month>10</birth_month>
<birth_day>5</birth_day>
<birth_year>1985</birth_year>
<city>Colorado Springs</city>
<state>CO</state>
<country>US</country>
<timezone>US/Mountain</timezone>
<last_seen>1338365509</last_seen>
<is_confirmed>1</is_confirmed>
<is_active>1</is_active>
<post_comment_id>10</post_comment_id>
<post_id>8</post_id>
<comment>this is a test comment!</comment>
<posted>1340334128</posted>
</comment>
</comments>
</feed>
In the above XML response I get multiple "feed" elements, which I can parse without any problem, but each "feed" can have zero or more "comment" elements. I am not able to parse the comments for an individual feed. Can anyone suggest how to proceed in the right direction?
I am also including a snippet of the code (not the entire code) that I use to parse the XML document, so it will be easier to pinpoint the problem.
DocumentBuilderFactory odbf = DocumentBuilderFactory.newInstance();
DocumentBuilder odb = odbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
Document odoc = odb.parse(is);
odoc.getDocumentElement().normalize ();
NodeList LOP = odoc.getElementsByTagName("feed");
System.out.println(LOP.getLength());
for (int s=0 ; s<LOP.getLength() ; s++){
Node FPN =LOP.item(s);
try{
if(FPN.getNodeType() == Node.ELEMENT_NODE)
{
Element token = (Element)FPN;
NodeList oNameList0 = token.getElementsByTagName("post_id");
Element ZeroNameElement = (Element)oNameList0.item(0);
NodeList textNList0 = ZeroNameElement.getChildNodes();
feed_post_id = Integer.parseInt(((Node)textNList0.item(0)).getNodeValue().trim());
System.out.println("#####The Parsed data#####");
System.out.println("post_id : " + ((Node)textNList0.item(0)).getNodeValue().trim());
System.out.println("#####The Parsed data#####");
}
}catch(Exception ex){}
}
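Once you have the feed NodeList, loop over it and, for each feed node, use:
NodeList nodes = feedNode.getChildNodes();
// NodeList is not Iterable, so use an indexed loop
for (int i = 0; i < nodes.getLength(); i++)
{
Node node = nodes.item(i);
if (node.getNodeName().equals("comments")) {
// do something with the comments node
}
}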
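A fuller sketch of this idea is shown below. It only looks at direct children, because in this feed both "comment" and "post_id" also occur nested inside each comment, so getElementsByTagName would pick up the nested ones as well. The class name FeedComments, the helper children(), the feeds wrapper element and the shortened SAMPLE string are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedComments {

    // Shortened sample; the <feeds> wrapper is hypothetical (a document needs a single root).
    static final String SAMPLE = "<feeds>"
        + "<feed><post_id>16</post_id><comments><comment/></comments></feed>"
        + "<feed><post_id>8</post_id><comments>"
        + "<comment><post_id>8</post_id><comment>first</comment></comment>"
        + "<comment><post_id>8</post_id><comment>second</comment></comment>"
        + "</comments></feed>"
        + "</feeds>";

    /** Maps each feed's post_id to the texts of its comments, in document order. */
    static Map<Integer, List<String>> commentsByPost(String xml) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList feeds = doc.getElementsByTagName("feed");
            for (int f = 0; f < feeds.getLength(); f++) {
                Element feed = (Element) feeds.item(f);
                // The feed's own post_id is a direct child; comments carry one too.
                int postId = Integer.parseInt(
                    children(feed, "post_id").get(0).getTextContent().trim());
                List<String> texts = new ArrayList<>();
                for (Element comments : children(feed, "comments")) {
                    for (Element comment : children(comments, "comment")) {
                        // The comment text is a nested <comment>; an empty <comment/> has none.
                        for (Element body : children(comment, "comment")) {
                            texts.add(body.getTextContent().trim());
                        }
                    }
                }
                result.put(postId, texts);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the feeds", e);
        }
        return result;
    }

    // Direct child elements of parent with the given tag name (not deeper descendants).
    static List<Element> children(Element parent, String tag) {
        List<Element> found = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(tag)) {
                found.add((Element) node);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(commentsByPost(SAMPLE));
    }
}
```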
