How to find a DOM Node using its attributes - java

I have parsed my HTML/JSP into DOM at compile time using JAVA. Now I have the w3c.dom.Document object with me, let's say for the below HTML
.....
....
<input type="text" name="EnterName"/>
<select name="SelectOptions">
<option>First</option>
<option>Second</option>
</select>
......
.......
I know the attributes values of the elements. Here "EnterName" is the "name" attributes value of the node "input".
Suppose I have attributes values of all nodes available in DOM (like "EnterName", "SelectOptions" of above HTML), how do I can get a node in which a particular attribute is available with the given value. Thanks
EDIT :
I will never know whats the HTML contents. My program should run on
given list of HTML/JSP files and I have with me some element names.
Here the element name refers to the label/name of the fields available
in the HTML/JSP. So I need to traverse through all the files get the
node where it has the same label/name and get the node.

Try something like this:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse("yourDocumentName");
doc.getDocumentElement().normalize();
NodeList nlList = doc.getElementsByTagName("input");
for (int indx= 0; indx < nList.getLength(); indx++) {
Element eElement = (Element) nList.item(indx);
if(eElement.getAttribute("name").equals("EnterName")){
System.out.println("EnterName: " + eElement.getNodeValue());
}
}
NodeList nlList1 = doc.getElementsByTagName("select");
for (int indx= 0; indx < nList1.getLength(); indx++) {
Element eElement = (Element) nList1.item(indx);
if(eElement.getAttribute("name").equals("SelectOptions")){
System.out.println("SelectOptions: " + eElement.getNodeValue());
}
}
If you could add the "id" to your elements then its much easier:
<input type="text" name="EnterName" id="name"/>
<select name="SelectOptions" id="options">
...
Element nameElement = doc.getElementbyId("name");
System.out.println("EnterName: " + nameElement.getNodeValue());
Element selectElement = doc.getElementbyId("name");
System.out.println("SelectOptions: " + selectElement.getNodeValue());

You can add custom attributes in html for example to differentiate betweeen html components
<input type="text" name="EnterName" myattr1="yes"/>
<select name="SelectOptions" myattr2="yes">
<option>First</option>
<option>Second</option>
</select>
on basis of custom attributes you can check and differentiate on the HTML components...

You can say something like this:
Element input = .... documene.getElementByTagName("input");
Attribute eneterName = root.getAttributeNode("EnterName");
String s = enterName.getXXXValue();
Please refer the API to get the correct method to retreive the value.

Normally you search for attributes by their name, e.g. "name", not by their value, e.g. "EnterName". So you would typically go
String valueForName = myElement.getAttribute("name");
For anything very complicated, I use XPath. Which works for what you want. Here's a blog that looks like just what you want (though it's not Java, it's close enough):
http://blogs.msdn.com/b/davidklinems/archive/2007/03/13/quick-tip-using-xpath-to-find-nodes-by-attribute-value.aspx
Here's a similar non-Java Stack Overflow link
Elaborating in Java, its a bit tedious, but, roughly...
XPathFactory anXPathfactory = XPathFactory.newInstance();
XPath xpath = anXPathfactory.newXPath();
XPathExpression xpe = xpath.compile("your xpath goes here");
String finallyIGetSomething = (String) xpe.evaluate(node, XPathConstants.STRING);
Haven't tested this for your case so caveat emptor

Related

Java DOM: getElementsByTagName is returning no elements?

I am trying to parse the XML document at http://web.mta.info/status/ServiceStatusSubway.xml and extract all the PtSituationElement elements with the following code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document subwayStatusDoc = builder.parse(new URL("http://web.mta.info/status/ServiceStatusSubway.xml").openStream());
NodeList situationList = subwayStatusDoc.getDocumentElement().getElementsByTagName("PtSituationElement");
System.out.println(situationList.item(0)); //prints null
What am I doing wrong here ?
The PtSituationElement tags contain child tags, so you need to go into those. Just printing .item(0) relies on the toString() method, and apparently it does not do a great job of explaining your nodes.
So add this to see some of the data in the child nodes:
Node item = situationList.item(0);
NodeList childNodes = item.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
System.out.println(childNodes.item(j).getTextContent());
}
(I'm not sure what you want to do with the data in the xml structure, but this example shows how you can proceed with your work.)
Also, I noted that the LongDescription tags contain HTML that is not correct XML (<br clear=left> should be <br clear=left> etc). The parser could have a problem with that. It would be better if the HTML was escaped (see How to escape "&" in XML?).

Getting null values from XPath query

I have this xml file:
<?xml version="1.0" encoding="UTF-8"?>
<iet:aw-data xmlns:iet="http://care.aw.com/IET/2007/12" class="com.aw.care.bean.resource.MessageResource">
<iet:metadata filter=""/>
<iet:message-resource>
<iet:message>some message 1</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.11</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
<iet:message-resource>
<iet:message>some message 2</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.12</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
.
.
.
.
</iet:aw-data>
Using this code below i'm getting over the data and finding what I need.
try {
FileInputStream fileIS = new FileInputStream(new File("resources\\bootstrap\\content\\MessageResources_iw_IL\\MessageResource_iw_IL.ctdata.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//*[local-name()='message-resource']//*[local-name()='code'][contains(text(), 'account')]";
NodeList nodeList = (NodeList) xPath.compile(query).evaluate(xmlDocument, XPathConstants.NODESET);
System.out.println("size= " + nodeList.getLength());
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getNodeValue());
}
}
catch (Exception e){
e.printStackTrace();
}
The issue is that i'm getting only null values while printing in the for loop, any idea why it's happened?
The code needs to return a list of nodes which have a code and message fields that contains a given parameters (same as like SQL query with two parameters with operator of AND between them)
Check the documentation:
https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html
getNodeValue() applied to an element node returns null.
Use getTextContent().
Alternatively, if you find DOM too frustrating, switch to one of the better tree models like JDOM2 or XOM.
Also, if you used an XPath 2.0 engine like Saxon, it would (a) simplify your expression to
//*:message-resource//*:code][contains(text(), 'account')]
and (b) allow you to return a sequence of strings from the XPath expression, rather than a sequence of nodes, so you wouldn't have to mess around with nodelists.
Another point: I suspect that the predicate [contains(text(), 'account')] should really be [.='account']. I'm not sure of that, but using text() instead of ".", and using contains() instead of "=", are both common mistakes.

java dom parser only gets the first entity

This code returs only one "question " tag's element but I have another 9 question element inside the xml file.What is the wrong thing in here?Do I need to loop.Because when I checked the loop, it loops only one time.What is the problem?I am figure out.
Here is my xml:
<Results>
<question>
<eno>3</eno>
<qno>1</qno>
<qtext>The Battle of Gettysburg was fought during which war?</qtext>
<correctAnswer>C</correctAnswer>
</question>
<question>
<eno>3</eno>
<qno>2</qno>
<qtext>Neil Armstrong and Buzz Aldrin walked how many
minutes on the moon in 1696?</qtext>
<correctAnswer>B</correctAnswer>
</question>
</Results>
my source code:
NodeList listOfQuestions = doc.getElementsByTagName("question");
for(int s=0; s<listOfQuestions.getLength(); s++)
{
System.out.println(listOfQuestions.getLength());
Node firstQuestionNode = listOfQuestions.item(0);
if(firstQuestionNode.getNodeType() == Node.ELEMENT_NODE){
Element firstQElement = (Element)firstQuestionNode;
NodeList enoList = firstQElement.getElementsByTagName("eno");
Element enoElement =(Element)enoList.item(s);
NodeList enosList = enoElement.getChildNodes();
String eno=((Node)enosList.item(s)).getNodeValue().trim();
System.out.println(eno);
NodeList qnoList = firstQElement.getElementsByTagName("qno");
Element qnoElement =(Element)qnoList.item(s);
NodeList qnosList = qnoElement.getChildNodes();
String qno= ((Node)qnosList.item(s)).getNodeValue().trim();
System.out.println(qno);
NodeList qtextList = firstQElement.getElementsByTagName("qtext");
Element qtextElement =(Element)qtextList.item(s);
NodeList qtextsList = qtextElement.getChildNodes();
String qtext= ((Node)qtextsList.item(s)).getNodeValue().trim();
System.out.println(qtext);
NodeList correctAnswerList = firstQElement.getElementsByTagName("correctAnswer");
Element correctAnswerElement =(Element)correctAnswerList.item(s);
NodeList correctAnswerElementList = correctAnswerElement.getChildNodes();
String correctAnswer= ((Node)correctAnswerElementList.item(s)).getNodeValue().trim();
System.out.println(correctAnswer);
int i=st.executeUpdate("insert into question(eno,qno,qtext,correctAnswer) values('"+eno+"','"+qno+"','"+qtext+"','"+correctAnswer+"')");
System.out.println("s is"+s);
}
}
You have hardcoded
Node firstQuestionNode = listOfQuestions.item(0);
^^^
I think you meant to use the variable s there... or maybe not, it's hard to tell what you're trying to do. Regardless, there are no other references to listOfQuestions and you never retrieve any node except the first one.
You should have a look at jsoup, it is an API specifically built for parsing HTML DOM code in java and has tons of extra features. Also, what you are currently trying to extract would not be more than just 3-4 LOC using the API components.
Look at their example on their website for connection to an URL and fetching DOM-Elements is just 2 LOC:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

How to parse this XML file

I am new to programming in java and i have just learned how to parse an xml file. But i am not getting any idea on how to parse this xml file. Please help me with a code on how to get the tags day1 and their inner tags order1,order2
<RoutePlan>
<day1>
<Order1>
<customer> XYZ</customer>
<address> INDIA </address>
<data> 10-10-2011 </data>
<time> 9.30 A.M </time>
</Order1>
<Order2>
<customer> ABC </customer>
<address> US </address>
<data> 10-10-2011 </data>
<time> 10.30 A.M </time>
</Order2>
</day1>
I wrote the following code to retrieve. But i am only getting the data in order1 but not in order2
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(file);
document.getDocumentElement().normalize();
System.out.println("Root Element: "+document.getDocumentElement().getNodeName());
NodeList node = document.getElementsByTagName("day1");
for(int i=0;i<node.getLength();i++){
Node firstNode = node.item(i);
Element element = (Element) firstNode;
NodeList customer = element.getElementsByTagName("customer");
Element customerElement = (Element) customer.item(0);
NodeList firstName = customerElement.getChildNodes();
System.out.println("Name: "+((firstName.item(0).getNodeValue())));
NodeList address = element.getElementsByTagName("address");
Element customerAddress = (Element) address.item(0);
NodeList addName = customerAddress.getChildNodes();
System.out.println("Address: "+((addName.item(0).getNodeValue())));
NodeList date = element.getElementsByTagName("date");
Element customerdate = (Element) date.item(0);
NodeList dateN = customerdate.getChildNodes();
System.out.println("Address: "+((dateN.item(0).getNodeValue())));
NodeList time = element.getElementsByTagName("time");
Element customertime = (Element) time.item(0);
NodeList Ntime = customertime.getChildNodes();
System.out.println("Time: "+((Ntime.item(0).getNodeValue())));
}
I can give you not one, not two, but three directions to parse this XML (there are more but let's say they are the most commons ones):
DOM -> two good resources to start : here and here
SAX -> quickstart from official website: here
StAX -> a good introduction: here
Judging by the size of your XML document, I'd probably go for a DOM parsing, which gonna be the easiest to implement and to use (but if you have to deal with larger files, take a look at SAX for reading-only manipulations and StAX for reading and writing ones).
The reason you are getting only "Order1" elements is because:
You lock on the "day1" node.
You retrieve the "customer" elements by tag name which returns 2 elements.
You retrieve the first element and print its value and hence the second "customer" is ignored.
When working with DOM, be prepared to spin up multiple loops for retrieving data. Also, you are a bit misguided when it comes to representing your schema. You really don't need to name "elements" as "day1"/"order1" etc. In XML, that can be simply expressed by having multiple "day" or "order" elements which in turn automatically enforces ordering. An example XML would look like:
<route-plan>
<day>
<order>
<something>
</order>
</day>
<day>
<order>
<something>
</order>
</day>
</route-plan>
Now retrieving "day" elements is a simple matter of:
Look up "day" elements by tag name
For each "day" element
Look up "order" element by tag name
For each "order" element
Print out the value of "customer"/"address" etc.

Why does my XPath expression in Java return too many children?

I have the following xml file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<config>
<a>
<b>
<param>p1</param>
<param>p2</param>
</b>
</a>
</config>
and the xpath code to get my node params:
Document doc = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/config/a/b");
Object o = expr.evaluate(doc, XPathConstants.NODESET);
NodeList list = (NodeList) o;
but it turns out that the nodes list (list) has 5 children, including "\t\n", instead of just two. Is there something wrong with my code? How can I just get my two nodes?
Thank you!
When you select /config/a/b/, you are selecting all children of b, which includes three text nodes and two elements. That is, given your XML above and only showing the fragment in question:
<b>
<param>p1</param>
<param>p2</param>
</b>
the first child is the text (whitespace) following <b> and preceding <param>p1 .... The second child is the first param element. The third child is the text (whitespace) between the two param elements. And so on. The whitespace isn't ignored in XML, although many forms of processing XML ignore it.
You have a couple choices:
Change your xpath expression so it will only select element nodes, as suggested by Ted Dziuba, or
Loop over the five nodes returned and only select the non-text nodes.
You could do something like this:
for (int i = 0; i < nodes.getLength(); i++) {
if (nodes.item(i).getNodeType() != Node.TEXT_NODE) {
System.out.println(nodes.item(i).getNodeValue());
}
}
You can use the node type to select only element nodes, or to remove text nodes.
so the xpath looks like:
/config/a/b/*/text().
And the output for :
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
would be as expected: p1 and p2
How about
/config/a/b/*/text()/..
?
import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;
public class TestClient_XPath {
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("yourfile.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xpath.compile("/a/b/c");
Object res = xPathExpression.evaluate(doc);
System.out.println(res.toString());
}
}
Xalan and Xerces appear to be embedded in rt.jar.
Don't include xerces and xalan libs.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4624775
I am not sure but shouldn't /config/a/b just return b? /config/a/b/param should return the two param nodes...
Could the view on the problem be the problem? Of course you get back the resulting node AND all its children. So you just have to look at the first element and not at its children.
But I can be totally wrong, because I am usually just use Xpath to navigate on DOM trees (HtmlUnit).

Categories

Resources