I am using the XQuery processor Saxon.
Now we write our XQuery in a ".xqy" file where we refer to the XML file on which we will perform our XQuery.
Please see the example below:
for $x in doc("books.xml")/books/book
where $x/price>30
return $x/title
Now I want to use dynamically generated XML not stored in some path. Say, for example, I want to refer below XML that is available as a string.
How to do that?
String book=
<books>
<book category="JAVA">
<title lang="en">Learn Java in 24 Hours</title>
<author>Robert</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="DOTNET">
<title lang="en">Learn .Net in 24 hours</title>
<author>Peter</author>
<year>2011</year>
<price>40.50</price>
</book>
<book category="XML">
<title lang="en">Learn XQuery in 24 hours</title>
<author>Robert</author>
<author>Peter</author>
<year>2013</year>
<price>50.00</price>
</book>
<book category="XML">
<title lang="en">Learn XPath in 24 hours</title>
<author>Jay Ban</author>
<year>2010</year>
<price>16.50</price>
</book>
</books>
Java code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;
import com.saxonica.xqj.SaxonXQDataSource;
public class XQueryTester {
public static void main(String[] args){
try {
execute();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XQException e) {
e.printStackTrace();
}
}
private static void execute() throws FileNotFoundException, XQException{
InputStream inputStream = new FileInputStream(new File("books.xqy"));
XQDataSource ds = new SaxonXQDataSource();
XQConnection conn = ds.getConnection();
XQPreparedExpression exp = conn.prepareExpression(inputStream);
XQResultSequence result = exp.executeQuery();
while (result.next()) {
System.out.println(result.getItemAsString(null));
}
}
}
If you look for a way to bind the input (the context item) of the query using Java, I recommend using Saxon's S9API (the most intuitive API for XSLT, XPath and XQuery processing in Java).
Here is how to instantiate Saxon, compile the query, parse the input and evaluate the query with the input document bound as its context item:
// the Saxon processor object
Processor saxon = new Processor(false);
// compile the query
XQueryCompiler compiler = saxon.newXQueryCompiler();
XQueryExecutable exec = compiler.compile(new File("yours.xqy"));
// parse the string as a document node
DocumentBuilder builder = saxon.newDocumentBuilder();
String input = "<xml>...</xml>";
Source src = new StreamSource(new StringReader(input));
XdmNode doc = builder.build(src);
// instantiate the query, bind the input and evaluate
XQueryEvaluator query = exec.load();
query.setContextItem(doc);
XdmValue result = query.evaluate();
Note that if the Java code is generating the XML document, I strongly advice you to use S9API to build the tree directly in memory, instead of generating the XML document as a string, then parse it. If possible.
here is how I implemented as suggested by above user-
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.XdmValue;
import com.saxonica.xqj.SaxonXQDataSource;
public class Xml { public static void main(String[] args){
try {
process();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void process() throws SaxonApiException, IOException{
// the Saxon processor object
Processor saxon = new Processor(false);
// compile the query
XQueryCompiler compiler = saxon.newXQueryCompiler();
XQueryExecutable exec;
exec = compiler.compile(new File("E:\\abc.xqy"));
// parse the string as a document node
DocumentBuilder builder = saxon.newDocumentBuilder();
String input = "<data><employee id=\"1\"><name>A</name>"
+ "<title>Manager</title></employee>+<employee id=\"2\"><name>B</name>"
+ "<title>Manager</title></employee>+</data>";
Source src = new StreamSource(new StringReader(input));
XdmNode doc = builder.build(src);
// instantiate the query, bind the input and evaluate
XQueryEvaluator query = exec.load();
query.setContextItem(doc);
XdmValue result = query.evaluate();
System.out.println(result.itemAt(0));
}
Related
I want to get nodes from an api which contains xml elements.
https://www.w3schools.com/xml/cd_catalog.xml Here is the link for the api.
So my Java code is like this:
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class Test1 {
public static final String GET_API_URL = "https://www.w3schools.com/xml/cd_catalog.xml";
public static void main(String[] args) throws IOException, InterruptedException {
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder().GET().header("accept", "application/xml").uri(URI.create(GET_API_URL)).build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
try {
Reader reader = new StringReader(response.body());
InputSource inputSource = new InputSource(reader);
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/CATALOG/CD[COUNTRY='USA' and YEAR>=1995]");
NodeList list = (NodeList)expr.evaluate(inputSource, XPathConstants.NODESET);
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
System.out.println(node.getTextContent());
}
}
catch (XPathExpressionException e) {
e.printStackTrace();
}
}
}
The output on the console should be like this:
<CATALOG>
<CD>
<TITLE>1999 Grammy Nominees</TITLE>
<ARTIST>Many</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Grammy</COMPANY>
<PRICE>10.20</PRICE>
<YEAR>1999</YEAR>
</CD>
<CD>
<TITLE>Big Willie style</TITLE>
<ARTIST>Will Smith</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1997</YEAR>
</CD>
</CATALOG>
With the xpath expression i get only the cd's values that are released from 1995 and later, but without the xml tags.
My console output is like this:
1999 Grammy Nominees
Many
USA
Grammy
10.20
1999
Big Willie style
Will Smith
USA
Columbia
9.90
1997
So any solutions on how to get the exact same output but with the xml tags? And if someone answers can you explain to me how your steps or methods are working in the code, sorry for asking but a beginner here and i have a lot of ground to cover :-)
Thanks in advance.
The Transformer class
can help us achieve what you want
Here is a method to transform Node to String
private static String nodeToString(Node node) throws TransformerException {
StringWriter res = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(new DOMSource(node), new StreamResult(res));
return res.toString();
}
And you can call it like this
System.out.println(nodeToString(node));
I'm writing a simple code to scrape data from the web page using selenium and xpath2.0 function.
Since Selenium supports only xpath1.0 functions, I am trying to use Saxon.jar
I have downloaded and extracted the Saxon9he.jar files into the path "C:\Program Files\Java\jre1.8.0_111\lib\ext"
I have created a file "jaxp.properties" with the following lines:
javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl
javax.xml.xpath.XPathFactory","net.sf.saxon.xpath.XPathFactoryImpl
Also included my jar files in the eclipse library.
But, I am not able to fetch the values with the Xpath2.0 functions.
In my code, if I use
XPathFactory factory = XPathFactory.newInstance();
instead of
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
I am able to use the xpath1.0 functions. But I need Xpath2.0 function. please guide me in this.
My code is:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.xpath.XPathFactoryImpl;
public class XpathCheckClass {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathFactoryConfigurationException, XPathExpressionException{
WebDriver dr = new FirefoxDriver();
dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
try {
Thread.sleep(3000);
} catch (Exception e) {
}
String source = dr.getPageSource();
Document doc = null;
try {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
doc = db.parse( new InputSource( new StringReader(source)));
} catch (Exception e) {
e.printStackTrace();
}
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
// XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getTextContent());
}
dr.close();
}
}
Recent releases of Saxon no longer advertise themselves as JAXP XPath services, so you need to instantiate the XPath factory explicitly:
XPathFactory xf = new net.sf.saxon.XPathFactoryImpl();
I have some XML that roughly looks like this:
<project type="mankind">
<suggestion>Build the enterprise</suggestion>
<suggestion>Learn Esperanto</suggestion>
<problem>Solve world hunger</suggestion>
<discussion>Do Vulcans exist</discussion>
</project>
I want to use XPath to find out the names of the second level elements (there can be elements I won't know upfront) using Java. This is the code I tried:
public NodeList xpath2NodeList(Document doc, String xPathString) throws XPathExpressionException {
XPath xpath = XPathFactory.newInstance().newXPath();
MagicNamespaceContext nsc = new MagicNamespaceContext();
xpath.setNamespaceContext(nsc);
Object exprResult = xpath.evaluate(xPathString, doc, XPathConstants.NODESET);
return (NodeList) exprResult;
}
My XPath is /project/*/name(). I get the error:
javax.xml.transform.TransformerException: Unknown nodetype: name
A query like /project/suggestion works as expected. What am I missing? I'd like to get a list with the tag names.
Java6 (don't ask).
I think your implementation only supports XPath 1.0. If that were true, only the following would work:
"name(/project/*)"
The reason for this is that in the XPath 1.0 model, you cannot use functions (like name()) as a step in a path expression. Your code throws an exception and in this case, the processor mistakes your function name() for an unknown node test (like comment()). But there is nothing wrong with using a path expression as the argument of the name() function.
Unfortunately, if an XPath 1.0 function that can only handle a single node as an argument is given a sequence of nodes, only the first one is used. Therefore, it is likely that you will only get the first element name as a result.
XPath 1.0's capability to manipulate is very limited and often the easiest way to get around such problems is to reach for the higher-level language that uses XPath as the query language (in your case Java). Or put another way: Write an XPath expression to retrieve all relevant nodes and iterate over the result, returning the element names, in Java.
With XPath 2.0, your inital query would be fine. Also see this related question.
Below code may answer your original question.
package com.example.xpath;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class XPathReader {
static XPath xPath = XPathFactory.newInstance().newXPath();
public static void main(String[] args) {
try {
FileInputStream file = new FileInputStream(new File("c:/mankind.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPathExpression expr = xPath.compile("//project/*");
NodeList list= (NodeList) expr.evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
System.out.println(node.getNodeName() + "=" + node.getTextContent());
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
}
Assuming the input is corrected (Solve world hunger), the code should print:
suggestion=Build the enterprise
suggestion=Learn Esperanto
problem=Solve world hunger
discussion=Do Vulcans exist
I'm trying to parse an rdfs xml file in order to find all the Classes in an rdfs file.
The xpath: "/rdf:RDF/rdfs:Class"
is working in my XML editor.
When i insert the xpath in my Java program (i have implemented a dom parser), i get 0 Classes.
The following example runs but it outputs 0 classes!
I do:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import org.xml.sax.SAXException;
public class Main {
public static void main(String args[]) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException{
FindClasses FSB = new FindClasses();
FSB.FindAllClasses("C:\\Workspace\\file.xml"); //rdfs file
}
}
The class FindClasses is as follows:
import java.io.IOException;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class FindClasses {
public void FindAllClasses(String fileName) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
Object result = classes_expr.evaluate(doc, XPathConstants.NODESET);
NodeList classes = (NodeList) result;
System.out.println("I found : " + classes.getLength() + " classes " );
}
}
The rdfs file is:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xml:lang="en" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="Class1">
</rdfs:Class>
<rdfs:Class rdf:about="Class2">
</rdfs:Class>
</rdf:RDF>
I don't really understand why the xpath returns 0 nodes in that example.
It's weird, cause i have implemented other dom parsers as well and they were working fine.
Can somebody help me?
Thanks
I visited the following link and i solved my problem:
Issues with xpath in java
The problem was that the xpath contained two namespaces (rdf,rdfs) like "/rdf:RDF/rdfs:Class".
If the xpath didn't contain any namespace e.g. /RDF/Class , there was not going to be an issue.
So after the line:
xpath = XPathFactory.newInstance().newXPath();
and before the line:
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
I added the following:
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "rdf": return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
case "rdfs" : return "http://www.w3.org/2000/01/rdf-schema#";
}
return prefix;
}
public String getPrefix(String namespace) {
if (namespace.equals("rdf")) return "rdf";
else if (namespace.equals("rdfs")) return "rdfs";
else return null;
}
#Override
public Iterator getPrefixes(String arg0) {
// TODO Auto-generated method stub
return null;
}
});
I'm trying to parse a xml file using the below program but wondering why the getFirstChild() is blank while printing...
The nodelist contains all the employee nodes and I am processing each node and trying to get the firstchild and lastchild..
xml file:
<?xml version="1.0"?>
<Employees>
<Employee emplid="1111" type="admin">
<firstname>John</firstname>
<lastname>Watson</lastname>
<age>30</age>
<email>johnwatson#sh.com</email>
</Employee>
<Employee emplid="2222" type="admin">
<firstname>Sherlock</firstname>
<lastname>Homes</lastname>
<age>32</age>
<email>sherlock#sh.com</email>
</Employee>
</Employees>
java program:
package XML;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
public class XMLTest {
/**
* #param args
*/
public static void main(String[] args) {
DocumentBuilderFactory builderfactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = builderfactory.newDocumentBuilder();
Document xmldocument = builder.parse(new FileInputStream(new File("c:/employees.xml")));
NodeList node = xmldocument.getElementsByTagName("Employee");
System.out.println("node length="+node.getLength());
for (int temp = 0; temp < node.getLength(); temp++){
System.out.println("First Child = " +node.item(temp).getFirstChild().getNodeValue());
System.out.println("Last Child = " +node.item(temp).getLastChild().getNodeValue());
}
} catch (ParserConfigurationException | SAXException | IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
It's most likely due to the whitespace (spaces, tabs, line breaks etc.) that comes through as text nodes in the list as well as the elements.
When working with java's XML DOM I tend to write a helper like this as it's pretty tedious.
The DocumentBuilderFactory controls the handling of whitespaces try:
builderFactory.setIgnoringElementContentWhitespace(true);
Hope it helps!