How to use getElementById with a DOM parser? - java

I have the part of an HTML page given below, and I want to use a DOM parser to extract the content of the div tag whose id is hiddenDivHL:
Part of the HTML page:
<div id='hiddenDivHL' style='display:none'>http://74.127.61.106/udayavaniIpad/details.php?home=0&catid=882&newsid=123069[InnerSep]http://www.udayavani.com/udayavani_cms/gall_content/2012/1/2012_1$thumbimg117_Jan_2012_000221787.jpg[InnerSep]ಯುವಜನತೆಯಿಂದ ಭವ್ಯಭಾರತ[OuterSep]
So far I have used the code below, but I am unable to use getElementById. How do I do that?
DOM Parser:
try {
URL url = new URL("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList = doc.getElementsByTagName("item");
/** Size the TextView arrays by the node list length */
name = new TextView[nodeList.getLength()];
website = new TextView[nodeList.getLength()];
category = new TextView[nodeList.getLength()];
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
name[i] = new TextView(this);
Element fstElmnt = (Element) node;
NodeList nameList = fstElmnt.getElementsByTagName("hiddenDivHL");
Element nameElement = (Element) nameList.item(0);
nameList = nameElement.getChildNodes();
name[i].setText("Name = "
+ ((Node) nameList.item(0)).getNodeValue());
layout.addView(name[i]);
}
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
/** Set the layout view to display */
setContentView(layout);
}

XPath is IMHO the most common and easiest way to navigate the DOM in Java.
try{
URL url = new URL("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "//div[@id='hiddenDivHL']";
Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
I'm not sure the XPath expression is exactly right for your document, but the javax.xml.xpath documentation is here: http://developer.android.com/reference/javax/xml/xpath/package-summary.html

There are two differences between getElementById and getElementsByTagName:
getElementById requires an id that is unique within your document, whereas getElementsByTagName can fetch several occurrences of the same tag name.
getElementById is a method of the Document object. You can only call it as document.getElementById(..).
Your code seems to violate both of these requirements: you loop through a list of nodes and expect a hiddenDivHL id in each of them, so the id is not unique. Second, your starting point is not the document but each node in that list.
If you know there is a single element with that id, try document.getElementById.
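One point worth adding for the Java DOM: Document.getElementById only finds attributes whose type is ID. When a page is parsed without a DTD or schema, an attribute that is merely named id is not of type ID, so the call returns null. The sketch below assumes the markup is well-formed XML/XHTML and uses a made-up sample page and a hypothetical class name, GetByIdSketch. It shows the null result and one workaround: marking each id attribute with Element.setIdAttribute before the lookup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GetByIdSketch {

    // Parses well-formed markup into a DOM Document (real-world HTML will usually fail here).
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Plain lookup on a freshly parsed document: without a DTD or schema the "id"
    // attribute is not of type ID, so this returns null.
    static Element unmarkedLookup(String markup, String id) {
        return parse(markup).getElementById(id);
    }

    // Marks every "id" attribute as an ID attribute, then looks the element up.
    // Returns the element's text content, or null if no element has that id.
    static String textById(String markup, String id) {
        Document doc = parse(markup);
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                e.setIdAttribute("id", true);
            }
        }
        Element found = doc.getElementById(id);
        return found == null ? null : found.getTextContent();
    }

    public static void main(String[] args) {
        String page = "<html><body><div id='hiddenDivHL' style='display:none'>"
                + "http://example.com/details.php[InnerSep]thumb.jpg[OuterSep]</div></body></html>";
        System.out.println(unmarkedLookup(page, "hiddenDivHL")); // null
        System.out.println(textById(page, "hiddenDivHL"));
    }
}
```

Marking attributes by hand is only reasonable for small documents. For real-world HTML that is not well-formed, an HTML parser such as jsoup is the more practical route.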

I didn't really understand the question.
a) Do you mean getting several elements with document.getElementById('hiddenDivHL')?
Then my answer is that, in an HTML document, an id must be used by one element only.
b) Or do you just want to get that one element?
What exactly does not work? What are you trying to achieve? I'm afraid I don't quite get the point.

You have to call fstElmnt.getElementsByTagName("div"); to get all div elements, and then check whether their id attribute equals hiddenDivHL.
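A minimal sketch of that approach, assuming well-formed markup and a made-up sample page (the class and method names are hypothetical): scan the div elements and compare each id attribute with the wanted value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DivScan {

    // Returns the text of the first <div> whose id attribute equals wantedId, or null.
    // Assumes well-formed markup; real-world HTML usually needs an HTML parser instead.
    static String divText(String markup, String wantedId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList divs = doc.getElementsByTagName("div");
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                // getAttribute returns "" (not null) when the attribute is missing
                if (wantedId.equals(div.getAttribute("id"))) {
                    return div.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div id='other'>x</div>"
                + "<div id='hiddenDivHL' style='display:none'>a[InnerSep]b[OuterSep]</div>"
                + "</body></html>";
        System.out.println(divText(page, "hiddenDivHL"));
    }
}
```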

The easiest way I can think of is to use the jsoup library: it parses the DOM for you and lets you select elements with a CSS-style (or jQuery-style) selector.
In this example you would do something like this:
Document doc = Jsoup.connect("http://74.127.61.106/udayavaniIpad/details_android.php?home=1&catid=882&newsid=27593").get();
String divContents = doc.select("#hiddenDivHL").first().text();

Why are you unable to use getElementById()? It has been part of org.w3c.dom.Document since DOM Level 2 and is available in Java SE 7, 6, 5 and 1.4.2.

To get the contents of an element in JavaScript:
var el = document.getElementById('hiddenDivHL');
var contents = el.innerHTML;
alert("Found " + contents.length + " characters of content.");
I think the confusion comes from the fact that your question is tagged JavaScript, but the code you posted is Java. They are different languages, and JavaScript developers will only be confused by that parser. I haven't used Java in years, so I can't really help you there.

Related

Getting child node Java

I have the following code
try {
String xml = "<ADDITIONALIDENT><FEATURE MID=\"TEST\"><NAME>ONE NAME</NAME><VALUE>ONE VALUE</VALUE></FEATURE><FEATURE MID=\"TEST\"><NAME>TWO NAME</NAME><VALUE>TWO VALUE</VALUE></FEATURE><FEATURE MID=\"TEST\"><NAME>THREE NAME</NAME><VALUE>THREE VALUE</VALUE></FEATURE><FEATURE MID=\"TEST\"><NAME>FOUR NAME</NAME><VALUE>FOUR VALUE</VALUE></FEATURE><FEATURE MID=\"TEST\"><NAME>FIVE NAME</NAME><VALUE>FIVE VALUE</VALUE></FEATURE></ADDITIONALIDENT>";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document document = dBuilder.parse(new InputSource(new StringReader(xml)));
NodeList featureList = document.getElementsByTagName("FEATURE");
for (int i = 0; i < featureList.getLength(); i++) {
Element featureElement = (Element) featureList.item(i);
NodeList nameList = featureElement.getElementsByTagName("NAME");
NodeList valueList = featureElement.getElementsByTagName("VALUE");
System.out.println("THIS IS NAME: " + nameList.item(0).getTextContent());
System.out.println("THIS IS VALUE: " + valueList.item(0).getTextContent());
}
} catch (Exception e) {
e.printStackTrace();
}
It works fine and it finds the correct values, but I don't think I am doing it the right way. I feel like I shouldn't be using lists within the actual featureList Element.
Is there a way to get the values without making two lists?
<ADDITIONALIDENT>
<FEATURE MID="TEST">
<NAME>ONE NAME</NAME>
<VALUE>ONE VALUE</VALUE>
</FEATURE>
<FEATURE MID="TEST">
<NAME>TWO NAME</NAME>
<VALUE>TWO VALUE</VALUE>
</FEATURE>
</ADDITIONALIDENT>
Try the following solution:
try {
String xml = "YOUR_XML_CONTENT";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document document = dBuilder.parse(new InputSource(new StringReader(xml)));
NodeList featureList = document.getElementsByTagName("FEATURE");
for (int i = 0; i < featureList.getLength(); i++) {
Element featureElement = (Element) featureList.item(i);
System.out.println("THIS IS NAME: " +
featureElement.getElementsByTagName("NAME").item(0).getTextContent());
System.out.println("THIS IS VALUE: " +
featureElement.getElementsByTagName("VALUE").item(0).getTextContent());
}
} catch (Exception e) {
e.printStackTrace();
}
Output:
THIS IS NAME: ONE NAME
THIS IS VALUE: ONE VALUE
THIS IS NAME: TWO NAME
THIS IS VALUE: TWO VALUE
THIS IS NAME: THREE NAME
THIS IS VALUE: THREE VALUE
THIS IS NAME: FOUR NAME
THIS IS VALUE: FOUR VALUE
THIS IS NAME: FIVE NAME
THIS IS VALUE: FIVE VALUE
Good question. There are several ways to query an XML document using Java:
a) Parse the XML document into a Document and use getElementsByTagName to extract the nodes that you need. This is your current approach. It is OK for simple documents but it does not scale well because the Document class knows nothing about the structure of the document. The getElementsByTagName() method has to assume that any tag that it finds might occur more than once.
But you can fix that...
b) Generate Java classes for your specific document structure. This requires an XML schema that describes the structure of your XML. You can then use JAXB to generate Java classes that process your specific XML format. In your example, the generated code would know (from the schema) that there is exactly one NAME and one VALUE within each FEATURE tag. The getter methods for NAME and VALUE would return a single value, so your code would not need lists for single-occurrence elements.
See https://docs.oracle.com/javase/tutorial/jaxb/intro/index.html for more details.
c) Use the XPath support that is built into Java to extract exactly the nodes that you need. XPath is designed for processing XML documents and is very powerful and flexible.
See How to read XML using XPath in Java for more details.
Option a) is hardly ever used for processing non-trivial XML documents. Both b) and c) are very common.
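As a rough illustration of option c), the sketch below (hypothetical class FeatureXPath, assuming the ADDITIONALIDENT layout from the question) selects the FEATURE elements once and then evaluates the relative expressions NAME and VALUE against each of them. XPath.evaluate(String, Object) returns the string value directly, so no per-feature NodeLists are created.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeatureXPath {

    // Returns "NAME=VALUE" for every FEATURE element under the ADDITIONALIDENT root.
    static List<String> namesAndValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList features = (NodeList) xpath.evaluate(
                    "/ADDITIONALIDENT/FEATURE", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < features.getLength(); i++) {
                Node feature = features.item(i);
                // Relative expressions are evaluated against this FEATURE only and
                // yield the string value of its first NAME / VALUE child.
                String name = xpath.evaluate("NAME", feature);
                String value = xpath.evaluate("VALUE", feature);
                result.add(name + "=" + value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ADDITIONALIDENT>"
                + "<FEATURE MID=\"TEST\"><NAME>ONE NAME</NAME><VALUE>ONE VALUE</VALUE></FEATURE>"
                + "<FEATURE MID=\"TEST\"><NAME>TWO NAME</NAME><VALUE>TWO VALUE</VALUE></FEATURE>"
                + "</ADDITIONALIDENT>";
        System.out.println(namesAndValues(xml)); // [ONE NAME=ONE VALUE, TWO NAME=TWO VALUE]
    }
}
```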

Getting a node from an XML document

I use the worldweatheronline API. The service gives xml in the following form:
<hourly>
<tempC>-3</tempC>
<weatherDesc>rain</weatherDesc>
<precipMM>0.0</precipMM>
</hourly>
<hourly>
<tempC>5</tempC>
<weatherDesc>no</weatherDesc>
<precipMM>0.1</precipMM>
</hourly>
Can I somehow get all the <hourly> nodes in which <tempC> > 0 and <weatherDesc> = rain?
How can I exclude the <hourly> nodes that are not interesting to me from the response?
This is quite feasible using XPath.
You can filter a document based on element values, attribute values and other criteria.
Here is a working example that gets the elements according to the first point in the question:
try (InputStream is = Files.newInputStream(Paths.get("C:/temp/test.xml"))) {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document xmlDocument = builder.parse(is);
XPath xPath = XPathFactory.newInstance().newXPath();
// get hourly elements that have tempC child element with value > 0 and weatherDesc child element with value = "rain"
String expression = "//hourly[tempC>0 and weatherDesc=\"rain\"]";
NodeList hours = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < hours.getLength(); i++) {
System.out.println(hours.item(i) + " " + hours.item(i).getTextContent());
}
} catch (Exception e) {
e.printStackTrace();
}
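For the second point (excluding the uninteresting hourly nodes), one possible sketch is to select the hourly elements that do not match, using not(...), and detach them from their parents. It assumes the hourly elements are wrapped in a single root element (a made-up data element here, since the fragment in the question has two top-level elements); the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FilterHourly {

    // Removes every <hourly> that does NOT satisfy tempC > 0 and weatherDesc = "rain",
    // then returns the tempC values of the <hourly> elements that remain.
    static List<String> keepRainyAboveZero(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList unwanted = (NodeList) xpath.evaluate(
                    "//hourly[not(tempC > 0 and weatherDesc = 'rain')]", doc, XPathConstants.NODESET);

            // Copy the matches first, then detach them, so the DOM is not modified
            // while the XPath result is still being read.
            List<Node> toRemove = new ArrayList<>();
            for (int i = 0; i < unwanted.getLength(); i++) {
                toRemove.add(unwanted.item(i));
            }
            for (Node n : toRemove) {
                n.getParentNode().removeChild(n);
            }

            List<String> remaining = new ArrayList<>();
            NodeList hours = doc.getElementsByTagName("hourly");
            for (int i = 0; i < hours.getLength(); i++) {
                Element hour = (Element) hours.item(i);
                remaining.add(hour.getElementsByTagName("tempC").item(0).getTextContent());
            }
            return remaining;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data>"
                + "<hourly><tempC>-3</tempC><weatherDesc>rain</weatherDesc><precipMM>0.0</precipMM></hourly>"
                + "<hourly><tempC>5</tempC><weatherDesc>no</weatherDesc><precipMM>0.1</precipMM></hourly>"
                + "<hourly><tempC>7</tempC><weatherDesc>rain</weatherDesc><precipMM>1.2</precipMM></hourly>"
                + "</data>";
        System.out.println(keepRainyAboveZero(xml)); // [7]
    }
}
```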
I think you should create an XSD from the XML and generate JAXB classes. Using those JAXB classes, you can easily unmarshal the XML and apply your logic.

Building DOM document from xml string gives me a null document

I'm trying to use the DOM library to parse a string in xml format. For some reason my document contains nulls and I run into issues trying to parse it. The string variable 'response' is not null and I am able to see the string when in debug mode.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(response));
Document doc = builder.parse(is);
NodeList nodes = doc.getElementsByTagName("BatchFile");
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList batchItem = element.getChildNodes();
String uri = batchItem.item(0).getNodeValue();
String id = batchItem.item(1).getNodeValue();
String fqName = batchItem.item(2).getNodeValue();
}
Hovering over the line Document doc = builder.parse(is); after it has run shows [#document: null].
Edit: I no longer get an empty document, but the string values (at the end of the code) are still null. How would I get the values from something like this:
<GetBatchFilesResult>
<BatchFile>
<Uri>uri</Uri>
<ID>id</ID>
<FQName>file.zip</FQName>
</BatchFile>
</GetBatchFilesResult>
Use getTextContent() instead: getNodeValue() returns null for element nodes. Also, you'd better use getElementsByTagName, because whitespace between elements is parsed into child (text) nodes too, so item(0), item(1) and item(2) are not necessarily Uri, ID and FQName.
Element element = (Element) nodes.item(i);
String uri = element.getElementsByTagName("Uri").item(0).getTextContent();
String id = element.getElementsByTagName("ID").item(0).getTextContent();
String fqName = element.getElementsByTagName("FQName").item(0).getTextContent();
Check the Node API documentation to see which node types return null for getNodeValue.
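To illustrate both points, the sketch below (hypothetical class name, using the indented BatchFile XML from the question) shows that the first child of BatchFile is a whitespace Text node, that getNodeValue() is null for an element, and how to collect only the element children with getTextContent().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BatchFileChildren {

    static Element firstBatchFile(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("BatchFile").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Type of the very first child of <BatchFile>; for indented XML this is a
    // whitespace Text node, which is why batchItem.item(0) is not <Uri>.
    static short firstChildType(String xml) {
        return firstBatchFile(xml).getFirstChild().getNodeType();
    }

    // getNodeValue() on the <Uri> element itself: always null for element nodes.
    static String uriNodeValue(String xml) {
        return firstBatchFile(xml).getElementsByTagName("Uri").item(0).getNodeValue();
    }

    // Collects "name=text" for element children only, skipping whitespace Text nodes.
    static List<String> fields(String xml) {
        NodeList children = firstBatchFile(xml).getChildNodes();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add(child.getNodeName() + "=" + child.getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<GetBatchFilesResult>\n  <BatchFile>\n    <Uri>uri</Uri>\n"
                + "    <ID>id</ID>\n    <FQName>file.zip</FQName>\n  </BatchFile>\n</GetBatchFilesResult>";
        System.out.println(firstChildType(xml) == Node.TEXT_NODE); // true
        System.out.println(uriNodeValue(xml)); // null
        System.out.println(fields(xml)); // [Uri=uri, ID=id, FQName=file.zip]
    }
}
```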
I found the solution. It seems silly that you have to do it this way to get a value from a node.
Element element = (Element) nodes.item(i);
NodeList batchItem = element.getChildNodes();
Element uri = (Element) batchItem.item(0);
Element id = (Element) batchItem.item(1);
Element fqName = (Element) batchItem.item(2);
NodeList test = uri.getChildNodes();
NodeList test1 = id.getChildNodes();
NodeList test2 = fqName.getChildNodes();
String strURI= test.item(0).getNodeValue();
String strID= test1.item(0).getNodeValue();
String strFQName= test2.item(0).getNodeValue();

How to traverse this XML to get DATA?

I am trying to get information about the item in XML that is presented like this:
<item>
<title>The Colbert Report - Confused by Rick Parry With an "A" for America</title>
<guid isPermaLink="false">http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</guid>
<link>http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert-report-confused-by-rick-parry-with-an-a-for-america</link>
<description><a href="http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0"><img src="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" align="right" hspace="10" vspace="10" width="145" height="80" border="0" /></a><p>The fat cat media elites in Des Moines think they can sit in their ivory corn silos and play puppet master with national politics.</p><p><a href="http://www.hulu.com/users/add_to_playlist?from=feed&video_id=267788">Add this to your queue</a><br/>Added: Fri Aug 12 09:59:14 UTC 2011<br/>Air date: Thu Aug 11 00:00:00 UTC 2011<br/>Duration: 05:39<br/>Rating: 4.7 / 5.0<br/></p><img src="http://feeds.feedburner.com/~r/HuluPopularVideosThisWeek/~4/6aeJ5cWMBzw" height="1" width="1"/></description>
<pubDate>Fri, 12 Aug 2011 09:59:14 -0000</pubDate>
<media:thumbnail height="80" width="145" url="http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg" />
<media:credit>Comedy Central</media:credit>
<dcterms:valid>start=2011-08-12T00:15:00Z; end=2011-09-09T23:45:00Z; scheme=W3C-DTF</dcterms:valid>
<feedburner:origLink>http://www.hulu.com/watch/267788/the-colbert-report-confused-by-rick-parry-with-an-a-for-america#http%3A%2F%2Fwww.hulu.com%2Ffeed%2Fpopular%2Fvideos%2Fthis_week%3Frd%3D0</feedburner:origLink></item>
<item>
I need the title, link, media:thumbnail url and description.
I have used the method found in: http://www.rgagnon.com/javadetails/java-0573.html
Things work fine for the title and link, but not for the image URL and description.
Can someone help me with this?
You can use XPath to retrieve particular data from an XML document.
For example, to retrieve the content of the url attribute:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String url = xpath.evaluate("/item/media:thumbnail/@url", new InputSource("data.xml"));
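One detail that snippet glosses over: the media: prefix has to be bound to a namespace URI through XPath.setNamespaceContext, otherwise evaluating the expression fails. The sketch below assumes the Media RSS namespace http://search.yahoo.com/mrss/ (the feed's xmlns:media declaration is not shown in the question, so check it against the real feed), a namespace-aware parser, and a hypothetical class name.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MediaThumbnailXPath {

    // Assumed namespace URI for the "media:" prefix (Media RSS); verify it in the feed.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Returns the url attribute of /item/media:thumbnail, or "" if there is none.
    static String thumbnailUrl(String itemXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed XPath steps need a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(itemXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "media".equals(prefix) ? MEDIA_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return MEDIA_NS.equals(namespaceURI) ? "media" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return MEDIA_NS.equals(namespaceURI)
                            ? Collections.singletonList("media").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            // "@url" selects the attribute; evaluate() returns its string value.
            return xpath.evaluate("/item/media:thumbnail/@url", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String item = "<item xmlns:media=\"" + MEDIA_NS + "\"><title>t</title>"
                + "<media:thumbnail height=\"80\" width=\"145\" url=\"http://example.com/thumb.jpg\"/></item>";
        System.out.println(thumbnailUrl(item));
    }
}
```

If binding a prefix is inconvenient, an expression such as //*[local-name()='thumbnail']/@url matches the element regardless of its namespace, at the cost of being less precise.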
try {
DocumentBuilderFactory dbf =
DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new FileReader(new File("item.xml")));
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("item");
// iterate over the item elements
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName("title");
Element line = (Element) title.item(0);
System.out.println("title: " + line.getTextContent());
NodeList link = element.getElementsByTagName("link");
line = (Element) link.item(0);
System.out.println("link: " + line.getTextContent());
NodeList mt = element.getElementsByTagName("media:thumbnail");
line = (Element) mt.item(0);
System.out.println("media:thumbnail: " + line.getTextContent());
Attr url = line.getAttributeNode("url");
System.out.println("media:thumbnail -> url: " + url.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
For the url, you first get the media:thumbnail element; since url is an attribute of media:thumbnail, you then simply call getAttributeNode("url") on that element.
For a pure DOM solution, you could use the following code to fetch the wanted values:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("document.xml");
Element item = doc.getDocumentElement(); // assuming that item is a root element
NodeList itemChilds = item.getChildNodes();
for (int i = 0; i != itemChilds.getLength(); ++i)
{
Node itemChildNode = itemChilds.item(i);
if (!(itemChildNode instanceof Element))
continue;
Element itemChild = (Element) itemChildNode;
String itemChildName = itemChild.getNodeName();
if (itemChildName.equals("title")) // possible switch in Java 7
System.out.println("title: " + itemChild.getTextContent());
else if (itemChildName.equals("link"))
System.out.println("link: " + itemChild.getTextContent());
else if (itemChildName.equals("description"))
System.out.println("description: " + itemChild.getTextContent());
else if (itemChildName.equals("media:thumbnail"))
System.out.println("image url: " + itemChild.getAttribute("url"));
}
Result:
title: The Colbert Report - Confused by Rick Parry With an "A" for America
link: http://rss.hulu.com/~r/HuluPopularVideosThisWeek/~3/6aeJ5cWMBzw/the-colbert..
description: <a href="http://www.hulu.com/watch/267788/the-colbert-report-confuse..
image url: http://thumbnails.hulu.com/507/40025507/40025507_145x80_generated.jpg
The problem here is that the description tag contains an escaped XML (or perhaps HTML) string rather than plain XML.
Probably the easiest thing to do is to get the text contained in this tag and run another XML parser over it as a separate XML document. However, this may not work if it's actually an HTML fragment rather than valid XML.
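A minimal sketch of that idea, assuming the description holds escaped markup that is well-formed once it is wrapped in a synthetic root element (the class name and the sample item are made up): read the text, wrap it, and parse it again. If the unescaped text contains bare & characters, as URLs in HTML often do, this second parse fails, which is the HTML caveat mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescriptionParser {

    // Returns the text of the first <p> inside the (escaped) <description>, or null.
    static String firstParagraph(String itemXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document item = builder.parse(new InputSource(new StringReader(itemXml)));

            // The outer parser un-escapes &lt;p&gt; into "<p>", so this is markup as text.
            String markup = item.getElementsByTagName("description").item(0).getTextContent();

            // Wrap it because the fragment may have several top-level elements.
            Document fragment = builder.parse(
                    new InputSource(new StringReader("<root>" + markup + "</root>")));
            NodeList paragraphs = fragment.getElementsByTagName("p");
            return paragraphs.getLength() == 0 ? null : paragraphs.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Description is not well-formed once unescaped", e);
        }
    }

    public static void main(String[] args) {
        String item = "<item><description>&lt;p&gt;The fat cat&lt;/p&gt;"
                + "&lt;p&gt;Duration: 05:39&lt;br/&gt;&lt;/p&gt;</description></item>";
        System.out.println(firstParagraph(item));
    }
}
```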

How to update XML using XPath and Java

I have an XML document, and an XPath expression for that doc. I have to update the doc by using XPath at runtime.
How can I do this using Java?
The below is my xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<PersonList>
<Person>
<Name>Sonu Kapoor</Name>
<Age>24</Age>
<Gender>M</Gender>
<PostalCode>54879</PostalCode>
</Person>
<Person>
<Name>Jasmin</Name>
<Age>28</Age>
<Gender>F</Gender>
<PostalCode>78745</PostalCode>
</Person>
<Person>
<Name>Josef</Name>
<Age>232</Age>
<Gender>F</Gender>
<PostalCode>53454</PostalCode>
</Person>
</PersonList>
I have to change the values of Name and Age under //PersonList/Person[2].
Use setNodeValue. First, get a NodeList, for example:
myNodeList = (NodeList) xpath.compile("//MyXPath/text()")
.evaluate(myXmlDoc, XPathConstants.NODESET);
Then set the value of e.g. the first node:
myNodeList.item(0).setNodeValue("Hi mom!");
As mentioned in two other answers here, as well as in your previous question: technically, XPath is not a way to "update" an XML document, but only to locate nodes within an XML document. But I presume the above is what you want.
EDIT: Responding to your comment... Are you asking how to write your DOM to an XML file after you've finished editing the DOM? If so, here are two examples of how to do it:
http://www.java2s.com/Code/Java/XML/WriteDOMout.htm
http://download.oracle.com/javaee/1.4/tutorial/doc/JAXPXSLT4.html
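Putting the setNodeValue approach and the write-out step together, a sketch under these assumptions might look like this: a hypothetical class UpdateTextNode, the result written to a String rather than a file, and the illustrative values Anonymous and 30.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class UpdateTextNode {

    // Sets the value of the Text node selected by textNodeXPath and returns the
    // serialized document, or null if the expression selected nothing.
    static String setText(String xml, String textNodeXPath, String newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node text = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(textNodeXPath, doc, XPathConstants.NODE);
            if (text == null) {
                return null;
            }
            text.setNodeValue(newValue); // XPath only located the node; DOM does the update

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<PersonList>"
                + "<Person><Name>Sonu Kapoor</Name><Age>24</Age></Person>"
                + "<Person><Name>Jasmin</Name><Age>28</Age></Person>"
                + "</PersonList>";
        String step1 = setText(xml, "//PersonList/Person[2]/Name/text()", "Anonymous");
        String step2 = setText(step1, "//PersonList/Person[2]/Age/text()", "30");
        System.out.println(step2);
    }
}
```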
You can delete the file and create a new one.
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(
new InputSource("data.xml"));
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate("//employee/name[text()='old']", doc,
XPathConstants.NODESET);
for (int idx = 0; idx < nodes.getLength(); idx++) {
nodes.item(idx).setTextContent("new value");
}
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File("data_new.xml")));
XPath is used to select parts of an XML document. It has no provision for updating. But since it returns DOM objects (a Node or a NodeList, depending on the XPathConstants return type you request), you can then use DOM methods to alter the document.
XPath can be used to select nodes in a document, not to modify them.
You apply the XPath expression to your document and get an Element (in your case). Once you have this Element, you can use the Element methods to change the values (name and age in your case).
Starting from a NodeList, it should work like this:
NodeList nodes = getNodeListFromXPathExpression(); // you know how
if (nodes.getLength() == 0)
return; // empty node list, the XPath expression didn't select anything
Node first = nodes.item(0); // take the first from the list, your element
// this is a shortcut for your example:
// first is the actual selected element (a node)
// .getFirstChild() returns its first child node, the text node (="Jasmin", ="28")
// .setNodeValue() replaces the value of that text node with a new string
first.getFirstChild().setNodeValue("New Name or new age");
Consider using XQuery Update instead of XPath. This allows you to write
replace value of node //PersonList/Person[2]/Name with "Anonymous"
This is much easier than using the Java DOM API.
I've created a small project for using XPath to create/update XML:
https://github.com/shenghai/xmodifier
The code to change your XML looks like this:
Document document = readDocument("personList.xml");
XModifier modifier = new XModifier(document);
modifier.addModify("//PersonList/Person[2]/Name", "newName");
modifier.modify();
This is a handy function that lets you modify any tag value in any XML document using its XPath. You pass three arguments, xml, xpathExpression and newValue, and it returns the XML document as a String with the modified value.
If you want to pass the XML as a file, you need to change the function accordingly, but the logic will be the same.
public String updateXML(String xml, String xpathExpression, String newValue)
{
try {
//Creating document builder
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new org.xml.sax.InputSource(new StringReader(xml)));
//Evaluating xpath expression using Element
XPath xpath = XPathFactory.newInstance().newXPath();
Element element = (Element)xpath.evaluate(xpathExpression, document, XPathConstants.NODE);
//Setting value in the text
element.setTextContent(newValue);
//Transformation of document to xml
StringWriter stringWriter = new StringWriter();
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(document), new StreamResult(stringWriter));
xml = stringWriter.toString();
}
catch (Exception e)
{
e.printStackTrace();
}
return xml;
}
Here is the code to change the content with vtd-xml... vtd-xml is unique in that it is the only API that offers incremental update capability.
import com.ximpleware.*;
import java.io.*;
public class changeName {
public static void main(String s[]) throws VTDException,java.io.UnsupportedEncodingException,java.io.IOException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
XMLModifier xm = new XMLModifier(vn);
ap.selectXPath("//PersonList/Person[2]");
int i=0;
while((i=ap.evalXPath())!=-1){
if (vn.toElement(VTDNav.FIRST_CHILD,"Name")){
int k=vn.getText();
if (i!=-1)
xm.updateToken(k, "Jonathan");
vn.toElement(VTDNav.PARENT);
}
if (vn.toElement(VTDNav.FIRST_CHILD,"Age")){
int k=vn.getText();
if (i!=-1)
xm.updateToken(k, "42");
vn.toElement(VTDNav.PARENT);
}
}
xm.output("new.xml");
}
}
