How to traverse this XML to get DATA? - java

I am trying to get information about an item in XML that looks like this:
<item>
<title>The Colbert Report - Confused by Rick Parry With an "A" for America</title>
<guid isPermaLink="false">http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</guid>
<link>http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert-report-confused-by-rick-parry-with-an-a-for-america</link>
<description><a href="http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0"><img src="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" align="right" hspace="10" vspace="10" width="145" height="80" border="0" /></a><p>The fat cat media elites in Des Moines think they can sit in their ivory corn silos and play puppet master with national politics.</p><p><a href="http://www.hulu.com/users/add_to_playlist?from=feed&video_id=267788">Add this to your queue</a><br/>Added: Fri Aug 12 09:59:14 UTC 2011<br/>Air date: Thu Aug 11 00:00:00 UTC 2011<br/>Duration: 05:39<br/>Rating: 4.7 / 5.0<br/></p><img src="http://feeds.feedburner.com/~r/HuluPopularVideosThisWeek/~4/6aeJ5cWMBzw" height="1" width="1"/></description>
<pubDate>Fri, 12 Aug 2011 09:59:14 -0000</pubDate>
<media:thumbnail height="80" width="145" url="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" />
<media:credit>Comedy Central</media:credit>
<dcterms:valid>start=2011-08-12T00:15:00Z; end=2011-09-09T23:45:00Z; scheme=W3C-DTF</dcterms:valid>
<feedburner:origLink>http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</feedburner:origLink></item>
<item>
I need the title, link, media:thumbnail url and description.
I have used the method found in: http://www.rgagnon.com/javadetails/java-0573.html
Things work fine for title and link, but not on the image url and description.
Can someone help me with this?

You can use XPath to retrieve particular data from an XML document.
For example in order to retrieve the content of the url attribute:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String url = xpath.evaluate("/item/media:thumbnail/@url", new InputSource("data.xml"));
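One caveat: the media: prefix in an XPath expression only resolves if the document is parsed namespace-aware and the XPath object is given a NamespaceContext. The following is a minimal sketch of that setup, not a definitive implementation. The class name ThumbnailXPath and the sample XML are made up for illustration, and the namespace URI is assumed to be the usual Media RSS one; check the xmlns:media declaration on the feed's root element.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ThumbnailXPath {
    // Assumption: the feed binds "media" to the Media RSS namespace.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    /** Returns the url attribute of /item/media:thumbnail, or "" if absent. */
    static String thumbnailUrl(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so prefixed names carry a namespace URI
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Tell XPath which URI the "media" prefix in the expression stands for.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "media".equals(prefix) ? MEDIA_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return null; }
        });
        return xpath.evaluate("/item/media:thumbnail/@url", doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, shortened item used only for this demonstration.
        String xml = "<item xmlns:media=\"" + MEDIA_NS + "\">"
                + "<title>Example</title>"
                + "<media:thumbnail height=\"80\" width=\"145\" url=\"http://example.com/t.jpg\"/>"
                + "</item>";
        System.out.println(thumbnailUrl(xml));
    }
}
```

In the real feed the item sits inside rss/channel, so the path would likely need to be //item/media:thumbnail/@url or similar.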

try {
DocumentBuilderFactory dbf =
DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new FileReader(new File("item.xml")));
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("item");
// iterate over the items
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName("title");
Element line = (Element) title.item(0);
System.out.println("title: " + line.getTextContent());
NodeList link = element.getElementsByTagName("link");
line = (Element) link.item(0);
System.out.println("link: " + line.getTextContent());
NodeList mt = element.getElementsByTagName("media:thumbnail");
line = (Element) mt.item(0);
System.out.println("media:thumbnail: " + line.getTextContent());
Attr url = line.getAttributeNode("url");
System.out.println("media:thumbnail -> url: " + url.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
For url, you first get element media:thumbnail, and then since url is an attribute of media:thumbnail, you simply call the function getAttributeNode("url") from the media:thumbnail element.

For a pure DOM solution, you could use the following code to fetch the wanted values:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("document.xml");
Element item = doc.getDocumentElement(); // assuming that item is a root element
NodeList itemChilds = item.getChildNodes();
for (int i = 0; i != itemChilds.getLength(); ++i)
{
Node itemChildNode = itemChilds.item(i);
if (!(itemChildNode instanceof Element))
continue;
Element itemChild = (Element) itemChildNode;
String itemChildName = itemChild.getNodeName();
if (itemChildName.equals("title")) // possible switch in Java 7
System.out.println("title: " + itemChild.getTextContent());
else if (itemChildName.equals("link"))
System.out.println("link: " + itemChild.getTextContent());
else if (itemChildName.equals("description"))
System.out.println("description: " + itemChild.getTextContent());
else if (itemChildName.equals("media:thumbnail"))
System.out.println("image url: " + itemChild.getAttribute("url"));
}
Result:
title: The Colbert Report - Confused by Rick Parry With an "A" for America
link: http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert..
description: <a href="http://www.hulu.com/watch/267788/the-colbert-report-confuse..
image url: http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg

The problem here is that the description tag contains an escaped XML (or perhaps HTML) string rather than nested XML elements.
Probably the easiest thing to do is to get the text contained in this tag and open another XML parser to parse it as a separate XML document. However, this may not work if the text is actually an HTML fragment and not valid XML.
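A rough sketch of that two-step approach follows, under some assumptions: the class name DescriptionReparse and the sample item are invented for illustration, and the fragment is wrapped in a hypothetical root element because it may contain several top-level nodes. If the unescaped text is not well-formed XML (for example, a bare & in a URL), the second parse throws a SAXException and an HTML parser would be needed instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DescriptionReparse {
    static Document parse(DocumentBuilder db, String xml) throws Exception {
        return db.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the text of the first <p> inside the escaped markup of <description>. */
    static String firstParagraph(String itemXml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document outer = parse(db, itemXml);
        Element description = (Element) outer.getElementsByTagName("description").item(0);

        // The parser has already unescaped &lt; and &gt;, so this is raw markup as a string.
        String markup = description.getTextContent();

        // Wrap in a made-up <root> so a multi-node fragment becomes one document.
        Document inner = parse(db, "<root>" + markup + "</root>");
        return inner.getElementsByTagName("p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, shortened item with escaped markup in <description>.
        String itemXml = "<item><description>"
                + "&lt;a href=\"x\"&gt;&lt;img src=\"t.jpg\"/&gt;&lt;/a&gt;"
                + "&lt;p&gt;Hello&lt;/p&gt;"
                + "</description></item>";
        System.out.println(firstParagraph(itemXml));
    }
}
```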

Related

Getting a node from an XML document

I use the worldweatheronline API. The service gives xml in the following form:
<hourly>
<tempC>-3</tempC>
<weatherDesc>rain</weatherDesc>
<precipMM>0.0</precipMM>
</hourly>
<hourly>
<tempC>5</tempC>
<weatherDesc>no</weatherDesc>
<precipMM>0.1</precipMM>
</hourly>
Can I somehow get all the <hourly> nodes in which <tempC> > 0 and <weatherDesc> = rain?
How can I exclude from the response the <hourly> nodes that do not interest me?
This is quite feasible using XPath.
You can filter a document based on element values, attribute values and other criteria.
Here is a working example that gets the elements according to the first point in the question:
try (InputStream is = Files.newInputStream(Paths.get("C:/temp/test.xml"))) {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document xmlDocument = builder.parse(is);
XPath xPath = XPathFactory.newInstance().newXPath();
// get hourly elements that have tempC child element with value > 0 and weatherDesc child element with value = "rain"
String expression = "//hourly[tempC>0 and weatherDesc=\"rain\"]";
NodeList hours = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < hours.getLength(); i++) {
System.out.println(hours.item(i) + " " + hours.item(i).getTextContent());
}
} catch (Exception e) {
e.printStackTrace();
}
I think you should create an XSD from the XML and generate JAXB classes. Using those JAXB classes, you can easily unmarshal the XML and apply your logic.

XML Parsing: Cannot Find Nodes by Tagname

I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<subscriber>
<data name="quota">
<![CDATA[
<?xml version="1.0" encoding="UTF-8"?><usage><version>1</version><field name="Cid"/><field name="Time"/><field name="totalVolume">4</field><field name="inputVolume"/><field name="outputVolume"/><field name="serviceSpecific"/><field name="nextResetTime"/><field name="Type"/><field name="GrantedTotalVolume"/><field name="GrantedInputVolume"/><field name="GrantedOutputVolume"/><field name="GrantedTime"/><field name="GrantedServiceSpecific"/><field name="QuotaState"/><field name="RefInstanceId"/><field name="Name">TEST_QUOTA</field></usage>
]]>
</data>
</subscriber>
In order to find all the field nodes, I wrote:
dbuilder = dbc.newDocumentBuilder();
Document doc = dbuilder.parse(new InputSource(new StringReader(xmlString)));
NodeList nl = doc.getElementsByTagName("field");
log.debug("node list length: " + nl.getLength());
for(int i = 0 ; i < nl.getLength(); i++){
Element e = (Element)nl.item(i);
log.debug("node: " + e);
String name = e.getAttribute("name");
}
However, the length of the NodeList is 0, so it cannot find any node with name field. I wonder if it's because of the meta data outside the field nodes, and if so, how can I access the field nodes?
First you need to extract the data element from the initial document
Document doc = dbuilder.parse(new InputSource(new StringReader(xmlString)));
Element subscriber = (Element) doc.getElementsByTagName("subscriber").item(0);
Element data = (Element) subscriber.getElementsByTagName("data").item(0);
After that you need to use its TextContent to parse the document you actually want.
Document doc2 = dbuilder.parse(new InputSource(new StringReader(data.getTextContent().trim())));
Element usage = (Element) doc2.getElementsByTagName("usage").item(0);
NodeList nl = usage.getElementsByTagName("field");

Android/Java XML Parsing with nodes of same name

I need some advice on how to parse XML with Java where there are multiple nodes that have the same tag. For example, if I have an XML file that looks like this:
<?xml version="1.0"?>
<TrackResponse>
<TrackInfo ID="EJ958083578US">
<TrackSummary>Your item was delivered at 8:10 am on June 1 in Wilmington DE 19801.</TrackSummary>
<TrackDetail>May 30 11:07 am NOTICE LEFT WILMINGTON DE 19801.</TrackDetail>
<TrackDetail>May 30 10:08 am ARRIVAL AT UNIT WILMINGTON DE 19850.</TrackDetail>
<TrackDetail>May 29 9:55 am ACCEPT OR PICKUP EDGEWATER NJ 07020.</TrackDetail>
</TrackInfo>
</TrackResponse>
I am able to get the "TrackSummary" but I do not know how to handle the "TrackDetail", since there is more than 1. There could be more than the 3 on that sample XML so I need a way to handle that.
So far I have this code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlResponse));
Document dom = builder.parse(is);
//Get the ROOT: "TrackResponse"
Element docEle = dom.getDocumentElement();
//Get the CHILD: "TrackInfo"
NodeList nl = docEle.getElementsByTagName("TrackInfo");
String summary = "";
//Make sure we found the child node okay
if (nl != null && nl.getLength() > 0)
{
//In the event that there is more then one node, loop
for (int i = 0 ; i < nl.getLength(); i++)
{
summary = getTextValue(docEle,"TrackSummary");
Log.d("SUMMARY", summary);
}
return summary;
}
How would I handle the whole 'multiple TrackDetail nodes' ordeal? I'm new to XML parsing so I am a bit unfamiliar on how to tackle things like this.
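You can try something like this:
// needs java.util.ArrayList and java.util.List
public List<String> getValues(Element element, String tagName) {
NodeList nodes = element.getElementsByTagName(tagName);
List<String> values = new ArrayList<>();
for (int i = 0; i < nodes.getLength(); i++) {
// collect the text of every matching element, not just the first one
values.add(nodes.item(i).getTextContent());
}
return values;
}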
If you are free to change your implementation, I would suggest using the implementation given here.
You can collect the TrackDetail values in a string array and, when you reach XmlPullParser.END_TAG, check for the end of the TrackInfo tag and then stop.
You can refer to the code below for that.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(f);
Element root = doc.getDocumentElement();
NodeList nodeList = doc.getElementsByTagName("TrackInfo");
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i); // this is node under track info
// do your stuff
}
For more information, you can go through the link below.
How to parse same name tag in xml using dom parser java?
It may help.
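To make the "do your stuff" step concrete, here is a minimal sketch that loops over each TrackInfo element and collects the text of all its TrackDetail children into a list, however many there are. The class and method names (TrackDetails, details) and the shortened sample XML are illustrative assumptions, not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TrackDetails {
    /** Returns the text of every TrackDetail under every TrackInfo, in document order. */
    static List<String> details(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();

        NodeList infos = doc.getElementsByTagName("TrackInfo");
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            // Search relative to this TrackInfo, so the count can be 0, 3, or more.
            NodeList detailNodes = info.getElementsByTagName("TrackDetail");
            for (int j = 0; j < detailNodes.getLength(); j++) {
                result.add(detailNodes.item(j).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, made-up version of the response from the question.
        String xml = "<TrackResponse><TrackInfo ID=\"EJ958083578US\">"
                + "<TrackSummary>Delivered.</TrackSummary>"
                + "<TrackDetail>NOTICE LEFT</TrackDetail>"
                + "<TrackDetail>ARRIVAL AT UNIT</TrackDetail>"
                + "</TrackInfo></TrackResponse>";
        for (String detail : details(xml)) {
            System.out.println(detail);
        }
    }
}
```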

How to get the text from the group of XML nodes using Java?

Following is the XML file -
<Country>
<Group>
<C>Tokyo</C>
<C>Beijing</C>
<C>Bangkok</C>
</Group>
<Group>
<C>New Delhi</C>
<C>Mumbai</C>
</Group>
<Group>
<C>Colombo</C>
</Group>
</Country>
I want to save the names of the cities to a text file using Java and XPath.
Below is the Java code, which does not produce the required output.
.....
.....
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("Continent.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
// XPath Query for showing all nodes value
XPathExpression expr = xpath.compile("//Country/Group");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
BufferedWriter out = new BufferedWriter(new FileWriter("Cities.txt"));
Node node;
for (int i = 0; i < nodes.getLength(); i++)
{
node = nodes.item(i);
String city = xpath.evaluate("C",node);
out.write(" " + city + "\r\n");
}
out.close();
.....
.....
Can somebody help me to get the required output?
You are getting only the first city because that's what you asked for. Your first XPATH expression returns all the Group nodes. You iterate over these and evaluate the XPATH C relative to each Group, returning a single city.
Just change the first XPATH to //Country/Group/C and eliminate the second XPATH altogether -- just print the text value of each node returned by the first XPATH.
I.e.:
XPathExpression expr = xpath.compile("//Country/Group/C");
...
for (int i = 0; i < nodes.getLength(); i++)
{
node = nodes.item(i);
out.write(" " + node.getTextContent() + "\n");
}

How to getElementById using DOM?

I have part of an HTML page, given below, and want to extract the content of the div tag whose id is hiddenDivHL using a DOM parser:
Part of the HTML page:
<div id='hiddenDivHL' style='display:none'>http://74.127.61.106/udayavaniIpad/details.php?home=0&catid=882&newsid=123069[InnerSep]http://www.udayavani.com/udayavani_cms/gall_content/2012/1/2012_1$thumbimg117_Jan_2012_000221787.jpg[InnerSep]ಯುವಜನತೆಯಿಂದ ಭವ್ಯಭಾರತ[OuterSep]
So far I have used the code below, but I am unable to use getElementById. How can I do that?
DOM Parser:
try {
URL url = new URL("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList = doc.getElementsByTagName("item");
/** Assign textview array length by arraylist size */
name = new TextView[nodeList.getLength()];
website = new TextView[nodeList.getLength()];
category = new TextView[nodeList.getLength()];
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
name[i] = new TextView(this);
Element fstElmnt = (Element) node;
NodeList nameList = fstElmnt.getElementsByTagName("hiddenDivHL");
Element nameElement = (Element) nameList.item(0);
nameList = nameElement.getChildNodes();
name[i].setText("Name = "
+ ((Node) nameList.item(0)).getNodeValue());
layout.addView(name[i]);
}
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
/** Set the layout view to display */
setContentView(layout);
}
XPath is IMHO the most common and easiest way to navigate the DOM in Java.
try{
URL url = new URL("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/item/div[@id='hiddenDivHL']";
Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
I'm not sure if the XPath expression is right, but the link is here: http://developer.android.com/reference/javax/xml/xpath/package-summary.html
There are 2 differences between getElementById and getElementsByName:
getElementById requires a single unique id in your document, whereas getElementsByName can fetch several occurrences of the same name.
getElementById is a method (or function) of the document object. You can only access it by using document.getElementById(..).
Your code seems to violate both of these requirements: you loop over a list of nodes and expect a hiddenDivHL id in each one, so the id is not unique. Second, your starting point is not the document but each node in that list.
If you know you have a single instance with that id try document.getElementById.
I didn't really get the question.
a) Do you mean getting more than one element with document.getElementById('hiddenDivHL')?
If so, my answer is that in an HTML document an id must be reserved for one element only.
b) Or do you just want to fetch that one element?
What exactly does not work? What are you trying to achieve? I fear I don't really get the point.
You have to call fstElmnt.getElementsByTagName("div"); to get all the div elements and then check whether their id attribute equals hiddenDivHL.
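A minimal sketch of that scan is below. It is also a workaround for the fact that, in the Java DOM, Document.getElementById returns null unless the attribute has been declared as type ID (via a DTD or schema, or with setIdAttribute), which is likely why it appears not to work on a plainly parsed page. The names DivById and findDivById and the sample markup are assumptions for illustration, and the sketch assumes the page is well-formed XML; real-world HTML often is not and will make the parser throw.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DivById {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first <div> whose id attribute equals the given id, or null. */
    static Element findDivById(Document doc, String id) {
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            // getAttribute returns "" when the attribute is missing, so this is null-safe.
            if (id.equals(div.getAttribute("id"))) {
                return div;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical well-formed page fragment.
        String page = "<html><body>"
                + "<div id='other'>ignored</div>"
                + "<div id='hiddenDivHL' style='display:none'>payload[InnerSep]more</div>"
                + "</body></html>";
        Element div = findDivById(parse(page), "hiddenDivHL");
        System.out.println(div == null ? "not found" : div.getTextContent());
    }
}
```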
The easiest way I can think of is to use the jsoup library: it parses the DOM for you and lets you select elements using CSS-style (or jQuery-style) selectors.
In this example you would do something like this:
Document doc = Jsoup.connect("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593").get();
String divContents = doc.select("#hiddenDivHL").first().text();
Why are you unable to use getElementById()? It is in JavaSE 7 and JavaSE6/5/1.4.2, since 'DOM Level 2'.
To get the contents of an element in JavaScript:
var el = document.getElementById('hiddenDivHL');
var contents = el.innerHTML;
alert("Found " + contents.length + " characters of content.");
See your example on jsfiddle.
I think the confusion is due to the fact that your question is tagged JavaScript, but the code you posted is Java. They are different languages, and JavaScript people will only be confused by that parser. I haven't used Java in years so I can't really help you there.
