I'm writing a simple program to scrape data from a web page using Selenium and XPath 2.0 functions.
Since Selenium supports only XPath 1.0 functions, I am trying to use Saxon.
I have downloaded the Saxon9he.jar files and extracted them into "C:\Program Files\Java\jre1.8.0_111\lib\ext".
I have created a file "jaxp.properties" with the following lines:
javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl
javax.xml.xpath.XPathFactory","net.sf.saxon.xpath.XPathFactoryImpl
I have also added the jar files to the project's libraries in Eclipse.
However, I am not able to fetch values with XPath 2.0 functions.
In my code, if I use
XPathFactory factory = XPathFactory.newInstance();
instead of
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
I am able to use XPath 1.0 functions, but I need XPath 2.0 functions. Please guide me on this.
My code is:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.xpath.XPathFactoryImpl;
public class XpathCheckClass {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathFactoryConfigurationException, XPathExpressionException{
WebDriver dr = new FirefoxDriver();
dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
try {
Thread.sleep(3000);
} catch (Exception e) {
}
String source = dr.getPageSource();
Document doc = null;
try {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
doc = db.parse( new InputSource( new StringReader(source)));
} catch (Exception e) {
e.printStackTrace();
}
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
// XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getTextContent());
}
dr.close();
}
}
Recent releases of Saxon no longer advertise themselves as JAXP XPath services, so you need to instantiate the XPath factory explicitly:
XPathFactory xf = new net.sf.saxon.xpath.XPathFactoryImpl();
I'm trying to produce output based on the given XML, joining the values of elements that share the same tag name.
This is what I've done so far:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class TestXPath {
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException, XPathExpressionException {
String xml =
"<ROOT>" +
" <coolnessId>9</coolnessId>" +
" <cars id=\"3\">0</cars>" +
" <cars id=\"2\">1</cars>" +
" <cars id=\"1\">2</cars>" +
"</ROOT>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
XPath xpath = XPathFactory.newInstance().newXPath();
///XPathExpression expr = xpath.compile("concat(//ROOT/cars,'-',//ROOT/coolnessId)");//concat(//ROOT/cars)
XPathExpression expr = xpath.compile("concat(//ROOT/cars,'-')");//concat(//ROOT/cars)
// XPathExpression expr = xpath.compile( "concat(//*[contains(name(), 'cars')],'')");
System.out.println(expr.evaluate(doc, XPathConstants.STRING));
}
}
This code produces:
0-
Now this is what should be:
2-1-0
As you can see, the values follow the "id" attribute of each "cars" tag.
I've rearranged the expression many times but can't get this result.
Please keep in mind that I'm on a very old environment, a Java 1.4 runtime.
I think it's going to be simplest to retrieve the nodes using XPath, and then concatenate the string values in Java code.
Any other solution involves upgrading your technology: XSLT, XPath 2.0+, etc, and that isn't going to be easy on a JDK 1.4 platform.
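The first suggestion could be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names (JoinCars, joinCarsById) are made up, "follow the id attribute" is read as ascending numeric id, and the code targets a JDK where javax.xml.xpath and generics are available, as in the question's own code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JoinCars {

    // Hypothetical helper: selects the <cars> nodes with plain XPath 1.0,
    // then does the ordering and joining in Java instead of in XPath.
    static String joinCarsById(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("/ROOT/cars", doc, XPathConstants.NODESET);

        List<Element> cars = new ArrayList<Element>();
        for (int i = 0; i < nodes.getLength(); i++) {
            cars.add((Element) nodes.item(i));
        }

        // Assumption: the desired order is ascending numeric value of the id attribute.
        Collections.sort(cars, new Comparator<Element>() {
            public int compare(Element a, Element b) {
                return Integer.compare(
                        Integer.parseInt(a.getAttribute("id")),
                        Integer.parseInt(b.getAttribute("id")));
            }
        });

        // Join the text content of each element with '-'.
        StringBuilder joined = new StringBuilder();
        for (Element car : cars) {
            if (joined.length() > 0) {
                joined.append('-');
            }
            joined.append(car.getTextContent());
        }
        return joined.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ROOT>"
                + "<coolnessId>9</coolnessId>"
                + "<cars id=\"3\">0</cars>"
                + "<cars id=\"2\">1</cars>"
                + "<cars id=\"1\">2</cars>"
                + "</ROOT>";
        System.out.println(joinCarsById(xml)); // prints 2-1-0
    }
}
```

The XPath expression only has to select nodes; the sorting and string building stay in Java, so nothing beyond XPath 1.0 is required.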
I have an xml like below.
<name>
<value>123</value>
<value>456</value>
<value>789</value>
</name>
Using Java's XPath API, I tried the following:
NodeList list3 = (NodeList) xpath.evaluate("name/value", element,XPathConstants.NODESET);
But it gives me only the first value. How can I print all <value> tags?
Your XPath expression is correct, there is most likely another problem in your code. You really should provide a complete example which demonstrates your problem.
The following code shows what this could look like:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class XmlTest {
public static void main(String[] args) throws Exception {
String xml = "<name>\n" +
"<value>123</value>\n" +
"<value>456</value>\n" +
"<value>789</value>\n" +
"</name>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
NodeList list = (NodeList) xpath.evaluate("name/value", doc, XPathConstants.NODESET);
for (int i = 0; i < list.getLength(); ++i) {
Node node = list.item(i);
System.out.println(node.getNodeName());
}
}
}
Running this results in the following output:
value
value
value
I have already downloaded the full WordNet 2.0, but I don't understand how to use it as a graph because it consists of multiple RDF files. I want to use the WordNet 2.0 ontology as a graph in Eclipse. Below is the snippet I am using to load an ontology as a graph. I would also like to know whether I am going in the right direction.
URIFactory factory = URIFactoryMemory.getSingleton();
URI graph_uri = factory.createURI("http://graph/");
G graph = new GraphMemory(graph_uri);
String fpath ="D:/Workspace/SSM/src/wordnet-wordsensesandwords.rdf";
GDataConf graphconf = new GDataConf(GFormat.RDF_XML, fpath);
GAction actionRerootConf = new GAction(GActionType.REROOTING);
GraphConf gConf = new GraphConf();
gConf.addGDataConf(graphconf);
gConf.addGAction(actionRerootConf);
// GraphLoaderGeneric.populate(graphconf, graph);
GraphLoaderGeneric.load(gConf, graph);
// General information about the graph
System.out.println(graph.toString());
http://wordnet.princeton.edu/wordnet/download/old-versions/
You can use this link to download the ontology, and you can use Apache Jena to query it.
Once you have the results, you can represent them as a graph.
You can also download WordNet in RDF format and display it as a graph using the Protege tool.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import nu.xom.Builder;
import nu.xom.ParsingException;
import nu.xom.ValidityException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.ISynset;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;
public class Main
{
public static void main(String[] args)
{
try
{
FileInputStream file = new FileInputStream(new File("c:\\employees.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println("*************************");
String expression = "/Employees/Employee[@emplid='3333']/job";
System.out.println(expression);
String job = xPath.compile(expression).evaluate(xmlDocument);
System.out.println(job);
System.out.println("*************************");
String path = "C:\\Program Files\\WordNet\\2.1\\dict";
URL url = new URL("file", null, path);
IDictionary dict = new Dictionary(url);
dict.open();
IIndexWord idxWord = dict.getIndexWord(job, POS.NOUN);
IWordID wordID = idxWord.getWordIDs().get(0);
IWord word = dict.getWord(wordID);
ISynset synset= word.getSynset();
for (IWord w : synset.getWords())
System.out.println(w.getLemma());
}
catch(Exception a)
{
System.out.println(a);
}
}
}
This is sample code in which WordNet is queried for synonyms of the word held in job; those synonyms can then be used to find similar terms in the RDF graph.
I have only worked with WordNet for capturing related terms and hypernyms. Hope this helps.
I developed code that parses an XML file and shows the result in a frame in NetBeans. The code runs successfully from NetBeans, but when I export the project to a jar file it shows nothing. If you have any ideas, I would be thankful for your help. Here is the code I used for parsing.
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
public class Parsing {
String a = null;
public static void main(String a) throws
ParserConfigurationException, SAXException, IOException,
XPathExpressionException {
a = getElem(a);
}
public static String getElem(String a) throws ParserConfigurationException, SAXException, XPathExpressionException, IOException {
String file = "src/xml/read.xml";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
XPath xpath = XPathFactory.newInstance().newXPath();
Node CustomerId = (Node) xpath.evaluate("//Operation[@name='Read' and @modifier='Customer']/ParameterList/StringParameter[@name='CustomerId']/text()",
doc.getDocumentElement(), XPathConstants.NODE);
a = CustomerId.getNodeValue();
return a;
}
}
When I call getElem(a) from another frame, it shows the value of a in a text box, but after exporting the project to a jar file it doesn't show anything!
The first thing that strikes me is the use of a relative path to your XML resource. It seems you rely on the existence of a file in a directory relative to your working directory.
The src directory will not exist at run time. This suggests that the XML file is an embedded resource and, as such, can't be accessed the way a file on the file system would be.
Instead you want to use something like...
getClass().getResourceAsStream("/xml/read.xml")
Then pass the resulting InputStream to the DocumentBuilder. Don't forget to close it.
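As a rough sketch of that advice, the stream can be opened in a try-with-resources block so it is closed even if parsing fails. The class and method names here (ResourceXml, parse, loadFromClasspath) are invented for illustration, and main uses an in-memory stream as a stand-in so the example runs without the /xml/read.xml resource being present.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ResourceXml {

    // Parses XML from any InputStream; the caller owns and closes the stream.
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Hypothetical helper: loads an XML document packaged on the classpath
    // (for example inside the jar) and closes the stream afterwards.
    static Document loadFromClasspath(String resourcePath) throws Exception {
        try (InputStream in = ResourceXml.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // getResourceAsStream returns null when the resource is missing.
                throw new IOException("Resource not found on classpath: " + resourcePath);
            }
            return parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // In a packaged jar one would call loadFromClasspath("/xml/read.xml").
        // An in-memory document stands in here so the sketch runs on its own.
        byte[] sample = "<Operations><Operation name=\"Read\"/></Operations>"
                .getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            Document doc = parse(in);
            System.out.println(doc.getDocumentElement().getNodeName());
        }
    }
}
```

The explicit null check turns a missing resource into a clear error message instead of a failure somewhere inside the parser.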
Here is the modified code that worked with the jar file :)
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Parsing {
String a = null;
private static final String TEST_XML = "/xml/read.xml";
public static void main(String a) throws
ParserConfigurationException, SAXException, IOException,
XPathExpressionException {
a = getElem(a);
}
protected static InputSource getTestXMLInputSource() {
InputStream is = Parsing.class.getResourceAsStream(TEST_XML);
return new InputSource(is);
}
public static String getElem(String a) throws ParserConfigurationException, SAXException, XPathExpressionException, IOException {
final InputSource source = getTestXMLInputSource();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(source);
XPath xpath = XPathFactory.newInstance().newXPath();
Node CustomerId = (Node) xpath.evaluate("//Operation[@name='Read' and @modifier='Customer']/ParameterList/StringParameter[@name='CustomerId']/text()",
doc.getDocumentElement(), XPathConstants.NODE);
a = CustomerId.getNodeValue();
return a;
}
}
I'm trying to parse an RDFS XML file in order to find all the classes in it.
The XPath "/rdf:RDF/rdfs:Class" works in my XML editor.
When I use the same XPath in my Java program (I have implemented a DOM parser), I get 0 classes.
The following example runs, but it outputs 0 classes:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import org.xml.sax.SAXException;
public class Main {
public static void main(String args[]) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException{
FindClasses FSB = new FindClasses();
FSB.FindAllClasses("C:\\Workspace\\file.xml"); //rdfs file
}
}
The class FindClasses is as follows:
import java.io.IOException;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class FindClasses {
public void FindAllClasses(String fileName) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
Object result = classes_expr.evaluate(doc, XPathConstants.NODESET);
NodeList classes = (NodeList) result;
System.out.println("I found : " + classes.getLength() + " classes " );
}
}
The rdfs file is:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xml:lang="en" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="Class1">
</rdfs:Class>
<rdfs:Class rdf:about="Class2">
</rdfs:Class>
</rdf:RDF>
I don't really understand why the XPath returns 0 nodes in this example.
It's strange, because I have implemented other DOM parsers as well and they worked fine.
Can somebody help me?
Thanks
I visited the following link and solved my problem:
Issues with xpath in java
The problem was that the XPath uses two namespace prefixes (rdf, rdfs), as in "/rdf:RDF/rdfs:Class", and nothing told the XPath object which namespace URIs those prefixes stand for.
If neither the XPath nor the document had used namespaces, e.g. /RDF/Class, there would have been no issue.
So after the line:
xpath = XPathFactory.newInstance().newXPath();
and before the line:
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
I added the following:
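// Needs: import java.util.Iterator; import javax.xml.namespace.NamespaceContext;
xpath.setNamespaceContext(new NamespaceContext() {
// Maps each prefix used in the XPath to its namespace URI.
@Override
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "rdf": return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
case "rdfs": return "http://www.w3.org/2000/01/rdf-schema#";
}
// Unknown prefix: report "no namespace" rather than echoing the prefix back.
return javax.xml.XMLConstants.NULL_NS_URI;
}
// Reverse lookup: compares against the namespace URI, not the prefix.
@Override
public String getPrefix(String namespace) {
if (namespace.equals("http://www.w3.org/1999/02/22-rdf-syntax-ns#")) return "rdf";
else if (namespace.equals("http://www.w3.org/2000/01/rdf-schema#")) return "rdfs";
else return null;
}
@Override
public Iterator getPrefixes(String namespace) {
// Each URI has at most one prefix here, so delegate to getPrefix.
String prefix = getPrefix(namespace);
return prefix == null
? java.util.Collections.emptyIterator()
: java.util.Collections.singletonList(prefix).iterator();
}
});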
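For anyone who wants to check the fix in isolation, here is a small self-contained sketch. It is not the original FindClasses class: the names RdfsClassCount, contextOf and countClasses are invented for this example, and the map-backed NamespaceContext is just one possible way to write it. It parses the RDFS document from a string and counts the rdfs:Class elements.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RdfsClassCount {

    // Hypothetical helper: a NamespaceContext backed by a prefix-to-URI map.
    static NamespaceContext contextOf(final Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                    if (e.getValue().equals(namespaceURI)) {
                        return e.getKey();
                    }
                }
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        };
    }

    // Counts /rdf:RDF/rdfs:Class elements in the given RDFS/XML text.
    static int countClasses(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed names to match
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Map<String, String> ns = new HashMap<String, String>();
        ns.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(contextOf(ns));
        NodeList classes = (NodeList) xpath.evaluate(
                "/rdf:RDF/rdfs:Class", doc, XPathConstants.NODESET);
        return classes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String rdfs = "<rdf:RDF xml:lang=\"en\""
                + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\">"
                + "<rdfs:Class rdf:about=\"Class1\"></rdfs:Class>"
                + "<rdfs:Class rdf:about=\"Class2\"></rdfs:Class>"
                + "</rdf:RDF>";
        System.out.println("I found : " + countClasses(rdfs) + " classes"); // I found : 2 classes
    }
}
```

The prefixes in the XPath do not have to match the ones in the document; only the namespace URIs they map to need to agree.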