XPath to select all nodes in document with specified name in java


Sample XML
<Games>
<Indoor>
<TT></TT>
<Chess></chess>
<cricket>asd</cricket>
<ComputerGame>
<cricket>asd</cricket>
</ComputerGame>
</Indore>
<Outdoor>
<Football></Football>
<cricket>asd</cricket>
</outdoor>
</Games>
I want to select all the nodes named cricket.
For this I am using:
NodeList nodeList = (NodeList) xpath.compile("//cricket").evaluate(xmlDocument, XPathConstants.NODESET);
But this code doesn't select any cricket node. Please suggest a fix.

Based on your "corrected" example XML...
<Games>
<Indoor>
<TT></TT>
<Chess>
</Chess>
<cricket>asd</cricket>
<ComputerGame>
<cricket>asd</cricket>
</ComputerGame>
</Indoor>
<Outdoors>
<Football></Football>
<cricket>asd</cricket>
</Outdoors>
</Games>
Using the following code...
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class TestXPath101 {
public static void main(String[] args) {
try {
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("Test.xml"));
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression exp = xPath.compile("//cricket");
NodeList nl = (NodeList)exp.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + nl.getLength() + " results");
} catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException ex) {
ex.printStackTrace();
}
}
}
I was able to get it to output...
Found 3 results
I suspect that your XML is ill-formed and that you are ignoring the exceptions being thrown because of it.
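To make that suspicion concrete, here is a minimal sketch that parses both the XML as posted in the question (with its mismatched tags such as <Chess></chess> and <Indoor>...</Indore>) and the corrected version, and reports a parse failure instead of swallowing it. The class name CricketCount, the method countCricket, and the -1 sentinel are assumptions made for illustration; they are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class CricketCount {
    // The question's XML, with its mismatched tags left in place.
    static final String ILL_FORMED =
        "<Games><Indoor><TT></TT><Chess></chess><cricket>asd</cricket>"
        + "<ComputerGame><cricket>asd</cricket></ComputerGame></Indore>"
        + "<Outdoor><Football></Football><cricket>asd</cricket></outdoor></Games>";

    // The corrected XML used in the answer.
    static final String WELL_FORMED =
        "<Games><Indoor><TT></TT><Chess></Chess><cricket>asd</cricket>"
        + "<ComputerGame><cricket>asd</cricket></ComputerGame></Indoor>"
        + "<Outdoors><Football></Football><cricket>asd</cricket></Outdoors></Games>";

    /** Returns the number of cricket elements, or -1 if the XML cannot be parsed. */
    static int countCricket(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc;
            try {
                doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            } catch (SAXException ex) {
                // Report the parse failure instead of ignoring it.
                System.err.println("XML is not well formed: " + ex.getMessage());
                return -1;
            }
            NodeList nl = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//cricket", doc, XPathConstants.NODESET);
            return nl.getLength();
        } catch (ParserConfigurationException | IOException | XPathExpressionException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("As posted: " + countCricket(ILL_FORMED));
        System.out.println("Corrected: " + countCricket(WELL_FORMED));
    }
}
```

With the XML as posted, the parser rejects the document before any XPath is evaluated, so the sketch returns -1; with the corrected XML it finds the same 3 nodes as above.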

Related

XPath concat same tagname

I'm trying to produce a string from the given XML by joining the values of the elements that share the same tag name.
This is what I've done so far:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class TestXPath {
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException, XPathExpressionException {
String xml =
"<ROOT>" +
" <coolnessId>9</coolnessId>" +
" <cars id=\"3\">0</cars>" +
" <cars id=\"2\">1</cars>" +
" <cars id=\"1\">2</cars>" +
"</ROOT>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
XPath xpath = XPathFactory.newInstance().newXPath();
///XPathExpression expr = xpath.compile("concat(//ROOT/cars,'-',//ROOT/coolnessId)");//concat(//ROOT/cars)
XPathExpression expr = xpath.compile("concat(//ROOT/cars,'-')");//concat(//ROOT/cars)
// XPathExpression expr = xpath.compile( "concat(//*[contains(name(), 'cars')],'')");
System.out.println(expr.evaluate(doc, XPathConstants.STRING));
}
}
This code produces:
0-
But this is what it should produce:
2-1-0
As you can see, the values should be ordered by the "id" attribute of each "cars" tag.
I've rearranged the expression many times but can't get this result.
Please keep in mind that I'm on a very old environment, a Java 1.4 runtime.

I think it's going to be simplest to retrieve the nodes using XPath, and then concatenate the string values in Java code.
Any other solution involves upgrading your technology: XSLT, XPath 2.0+, etc, and that isn't going to be easy on a JDK 1.4 platform.
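A minimal sketch of that approach follows. In XPath 1.0, converting a node-set to a string uses only the first node in document order, which is why concat(//ROOT/cars,'-') yields "0-". The sketch instead selects all cars nodes, sorts them by their id attribute, and joins the values in Java. The class and method names (CarsJoiner, joinCarsById) are assumptions for illustration, and the code uses Java 8 syntax; on an older runtime the same loop would need an explicit Comparator class and no generics or lambdas.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class CarsJoiner {
    // Same data as in the question.
    static final String XML =
        "<ROOT>"
        + " <coolnessId>9</coolnessId>"
        + " <cars id=\"3\">0</cars>"
        + " <cars id=\"2\">1</cars>"
        + " <cars id=\"1\">2</cars>"
        + "</ROOT>";

    /** Joins the text of all /ROOT/cars elements with '-', ordered by ascending id. */
    static String joinCarsById(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Select every cars node, not just the first one.
            NodeList cars = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/ROOT/cars", doc, XPathConstants.NODESET);

            List<Element> sorted = new ArrayList<>();
            for (int i = 0; i < cars.getLength(); i++) {
                sorted.add((Element) cars.item(i));
            }
            // Order by the numeric value of the id attribute.
            sorted.sort(Comparator.comparingInt(e -> Integer.parseInt(e.getAttribute("id"))));

            StringBuilder joined = new StringBuilder();
            for (Element car : sorted) {
                if (joined.length() > 0) {
                    joined.append('-');
                }
                joined.append(car.getTextContent());
            }
            return joined.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinCarsById(XML)); // prints 2-1-0
    }
}
```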

Xpath 2.0 functions not working in Java using Saxon

I'm writing some simple code to scrape data from a web page using Selenium and XPath 2.0 functions.
Since Selenium supports only XPath 1.0 functions, I am trying to use Saxon.
I have downloaded the Saxon9he.jar files and extracted them into "C:\Program Files\Java\jre1.8.0_111\lib\ext".
I have created a file "jaxp.properties" with the following lines:
javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl
javax.xml.xpath.XPathFactory = net.sf.saxon.xpath.XPathFactoryImpl
I have also added the jar files to my Eclipse project's libraries.
But I am not able to fetch values with the XPath 2.0 functions.
In my code, if I use
XPathFactory factory = XPathFactory.newInstance();
instead of
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
I am able to use the XPath 1.0 functions, but I need XPath 2.0 functions. Please guide me.
My code is:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.xpath.XPathFactoryImpl;
public class XpathCheckClass {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathFactoryConfigurationException, XPathExpressionException{
WebDriver dr = new FirefoxDriver();
dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
try {
Thread.sleep(3000);
} catch (Exception e) {
}
String source = dr.getPageSource();
Document doc = null;
try {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
doc = db.parse( new InputSource( new StringReader(source)));
} catch (Exception e) {
e.printStackTrace();
}
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
// XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getTextContent());
}
dr.close();
}
}

Recent releases of Saxon no longer advertise themselves as JAXP XPath services, so you need to instantiate the XPath factory explicitly:
XPathFactory xf = new net.sf.saxon.xpath.XPathFactoryImpl();

Issues with xpath that contained namespaces in Java (Dom parser)

I'm trying to parse an RDFS XML file in order to find all the classes it declares.
The XPath "/rdf:RDF/rdfs:Class"
works in my XML editor.
When I use the same XPath in my Java program (which uses a DOM parser), I get 0 classes.
The following example runs, but it outputs 0 classes.
My main class:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import org.xml.sax.SAXException;
public class Main {
public static void main(String args[]) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException{
FindClasses FSB = new FindClasses();
FSB.FindAllClasses("C:\\Workspace\\file.xml"); //rdfs file
}
}
The class FindClasses is as follows:
import java.io.IOException;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class FindClasses {
public void FindAllClasses(String fileName) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
Object result = classes_expr.evaluate(doc, XPathConstants.NODESET);
NodeList classes = (NodeList) result;
System.out.println("I found : " + classes.getLength() + " classes " );
}
}
The rdfs file is:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xml:lang="en" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="Class1">
</rdfs:Class>
<rdfs:Class rdf:about="Class2">
</rdfs:Class>
</rdf:RDF>
I don't understand why the XPath returns 0 nodes in this example.
It's odd, because I have implemented other DOM parsers as well and they worked fine.
Can somebody help me?
Thanks

I visited the following link and solved my problem:
Issues with xpath in java
The problem was that the XPath uses two namespace prefixes (rdf and rdfs), as in "/rdf:RDF/rdfs:Class", and the XPath object had no way to resolve them.
An XPath without prefixes, e.g. /RDF/Class, run against a document without namespaces would not have had this issue.
So after the line:
XPath xpath = XPathFactory.newInstance().newXPath();
and before the line:
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
I added the following:
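// Requires: import java.util.Iterator; import javax.xml.XMLConstants; import javax.xml.namespace.NamespaceContext;
xpath.setNamespaceContext(new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "rdf": return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
case "rdfs": return "http://www.w3.org/2000/01/rdf-schema#";
default: return XMLConstants.NULL_NS_URI; // unbound prefix
}
}
@Override
public String getPrefix(String namespaceURI) {
// Reverse lookup: compare against the namespace URIs, not the prefixes.
if ("http://www.w3.org/1999/02/22-rdf-syntax-ns#".equals(namespaceURI)) return "rdf";
if ("http://www.w3.org/2000/01/rdf-schema#".equals(namespaceURI)) return "rdfs";
return null;
}
@Override
public Iterator getPrefixes(String namespaceURI) {
// Not used by this example.
return null;
}
});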
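For reference, the whole fix can be sketched as one self-contained program. It parses a shortened copy of the RDFS file from the question out of a string, registers the two prefixes, and counts the rdfs:Class elements. The names RdfsClassCounter, RdfContext, and countClasses are assumptions made for this illustration and do not appear in the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical class, for illustration only.
class RdfsClassCounter {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#";

    // Shortened copy of the RDFS file from the question.
    static final String SAMPLE =
        "<rdf:RDF xml:lang=\"en\" xmlns:rdf=\"" + RDF_NS + "\" xmlns:rdfs=\"" + RDFS_NS + "\">"
        + "<rdfs:Class rdf:about=\"Class1\"></rdfs:Class>"
        + "<rdfs:Class rdf:about=\"Class2\"></rdfs:Class>"
        + "</rdf:RDF>";

    /** Maps the rdf and rdfs prefixes used in the XPath to their namespace URIs. */
    static final class RdfContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if ("rdf".equals(prefix)) return RDF_NS;
            if ("rdfs".equals(prefix)) return RDFS_NS;
            return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            if (RDF_NS.equals(namespaceURI)) return "rdf";
            if (RDFS_NS.equals(namespaceURI)) return "rdfs";
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            if (prefix == null) {
                return Collections.<String>emptyIterator();
            }
            return Collections.singletonList(prefix).iterator();
        }
    }

    /** Counts /rdf:RDF/rdfs:Class elements in the given XML text. */
    static int countClasses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so elements carry namespace URIs
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new RdfContext());
            NodeList classes = (NodeList) xpath.evaluate(
                "/rdf:RDF/rdfs:Class", doc, XPathConstants.NODESET);
            return classes.getLength();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println("I found : " + countClasses(SAMPLE) + " classes");
    }
}
```

The prefixes in the XPath only have to match the NamespaceContext, not the prefixes used in the file; what must match the document is the namespace URI.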

XML filtering using getElementsByTagName

I'm trying to parse an XML file with the program below, and I'm wondering why getFirstChild() prints a blank value.
The NodeList contains all the Employee nodes; I process each node and try to get its first and last child.
XML file:
<?xml version="1.0"?>
<Employees>
<Employee emplid="1111" type="admin">
<firstname>John</firstname>
<lastname>Watson</lastname>
<age>30</age>
<email>johnwatson@sh.com</email>
</Employee>
<Employee emplid="2222" type="admin">
<firstname>Sherlock</firstname>
<lastname>Homes</lastname>
<age>32</age>
<email>sherlock@sh.com</email>
</Employee>
</Employees>
java program:
package XML;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
public class XMLTest {
/**
* @param args
*/
public static void main(String[] args) {
DocumentBuilderFactory builderfactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = builderfactory.newDocumentBuilder();
Document xmldocument = builder.parse(new FileInputStream(new File("c:/employees.xml")));
NodeList node = xmldocument.getElementsByTagName("Employee");
System.out.println("node length="+node.getLength());
for (int temp = 0; temp < node.getLength(); temp++){
System.out.println("First Child = " +node.item(temp).getFirstChild().getNodeValue());
System.out.println("Last Child = " +node.item(temp).getLastChild().getNodeValue());
}
} catch (ParserConfigurationException | SAXException | IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

It's most likely due to the whitespace (spaces, tabs, line breaks etc.) that comes through as text nodes in the list as well as the elements.
When working with java's XML DOM I tend to write a helper like this as it's pretty tedious.
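The helper itself is not reproduced here, so the following is only a hypothetical sketch of what such a helper might look like: it walks a node's children and keeps the elements, skipping the whitespace text nodes. The names EmployeeChildren, childElements, and parse are assumptions for illustration. Note also that getNodeValue() returns null for element nodes, so the sketch reads the text with getTextContent() instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class EmployeeChildren {
    // One Employee from the question, including the line breaks and indentation
    // that the parser reports as whitespace-only text nodes.
    static final String XML =
        "<Employees>\n"
        + "  <Employee emplid=\"1111\" type=\"admin\">\n"
        + "    <firstname>John</firstname>\n"
        + "    <lastname>Watson</lastname>\n"
        + "    <age>30</age>\n"
        + "    <email>johnwatson@sh.com</email>\n"
        + "  </Employee>\n"
        + "</Employees>";

    /** Returns only the element children of parent, skipping text and comment nodes. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Parses XML text into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        NodeList employees = parse(XML).getElementsByTagName("Employee");
        for (int i = 0; i < employees.getLength(); i++) {
            List<Element> children = childElements(employees.item(i));
            if (children.isEmpty()) {
                continue;
            }
            Element first = children.get(0);
            Element last = children.get(children.size() - 1);
            System.out.println("First Child = " + first.getTextContent());
            System.out.println("Last Child = " + last.getTextContent());
        }
    }
}
```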

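The DocumentBuilderFactory controls how whitespace is handled. Try:
builderfactory.setIgnoringElementContentWhitespace(true);
Note that, per the Javadoc, this setting relies on the content model and so requires the parser to be in validating mode (for example, with a DTD); without one it may have no effect.
Hope it helps!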

Unable to retrieve web data in java using tidy and Xpath

What I am trying to do is scrape the inner HTML of a single element from an XHTML file.
I have narrowed my search down to the element node, but I fail to retrieve its text.
Please note: the element node has no child nodes; I get a NullPointerException when I try to access one.
Here is the HTML snippet:
<div id="dvTitle" class="titlebtmbrdr01" style="line-height: 22px;">BAJAJ AUTO LTD. </div>
Please also note that this file uses the namespace
http://www.w3.org/1999/xhtml
You can see the div element from which I want BAJAJ AUTO LTD.
Here is the code that I am using:
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import jxl.read.biff.BiffException;
import jxl.write.WriteException;
import jxl.write.biff.RowsExceededException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import com.sun.org.apache.xml.internal.serialize.Serializer;
public class BSEQuotesExtractor implements valueExtractor {
@Override
public Vector<String> getName(Document d) throws XPathExpressionException, RowsExceededException, BiffException, WriteException, IOException {
// TODO Auto-generated method stub
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new MynamespaceContext());
Object result = xpath.evaluate("//*[@id='dvTitle']", d, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
System.out.println(nodes.item(0).getNodeName());
System.out.println(nodes.item(0).getAttributes().item(1).getNodeName());
System.out.println(nodes.item(0).getAttributes().item(1).getNodeValue());
System.out.println(nodes.item(0).getTextContent());
return null;
}
public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException, RowsExceededException, BiffException, WriteException{
BSEQuotesExtractor q = new BSEQuotesExtractor();
DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
Document d = parser.getDocument();
q.getName(d);
}
}
And this is the output I get
1
div
dvTitle
null
Now why do I get that null? I should get BAJAJ AUTO LTD.

When I open the page your code references, that div actually is empty for me:
<div class="titlebtmbrdr01" id="dvTitle" style="line-height: 22px;"></div>
So perhaps you should save the page content to some file to examine if it is the same for you. If it is, but your browser displays things differently, then figure out what combination of cookies and other headers makes a difference there.
