I'm trying to produce output from a given XML document by joining the values of elements that share the same tag name.
This is what I have so far:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class TestXPath {
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException, XPathExpressionException {
String xml =
"<ROOT>" +
" <coolnessId>9</coolnessId>" +
" <cars id=\"3\">0</cars>" +
" <cars id=\"2\">1</cars>" +
" <cars id=\"1\">2</cars>" +
"</ROOT>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
XPath xpath = XPathFactory.newInstance().newXPath();
///XPathExpression expr = xpath.compile("concat(//ROOT/cars,'-',//ROOT/coolnessId)");//concat(//ROOT/cars)
XPathExpression expr = xpath.compile("concat(//ROOT/cars,'-')");//concat(//ROOT/cars)
// XPathExpression expr = xpath.compile( "concat(//*[contains(name(), 'cars')],'')");
System.out.println(expr.evaluate(doc, XPathConstants.STRING));
}
}
This code produces:
0-
This is what it should produce:
2-1-0
As you can see, the values are ordered by the "id" attribute of each "cars" element.
I've rearranged the expression many times but can't get this result.
Please keep in mind that I'm on a very old environment: a Java 1.4 runtime.
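I think it's going to be simplest to retrieve the nodes using XPath, and then concatenate the string values in Java code. In XPath 1.0, concat() converts a node-set argument to the string value of its first node in document order, which is why your expression prints 0-.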
Any other solution involves upgrading your technology: XSLT, XPath 2.0+, etc, and that isn't going to be easy on a JDK 1.4 platform.
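A minimal sketch of that approach, assuming the sample document from the question: select the cars elements with XPath, sort them by their numeric id attribute in Java, and join the text values with '-'. The names CarsJoiner and joinCarsById are made up for illustration. The sketch uses generics and StandardCharsets for readability, so it would need adapting (raw types, casts) for a Java 1.4 runtime; note also that javax.xml.xpath is part of the JDK only from Java 5 on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CarsJoiner {

    // Hypothetical helper: joins the text of /ROOT/cars elements,
    // ordered by their numeric "id" attribute, separated by '-'.
    static String joinCarsById(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Ask XPath for the nodes themselves instead of a single string.
            NodeList nodes = (NodeList) xpath.evaluate("/ROOT/cars", doc, XPathConstants.NODESET);

            List<Element> cars = new ArrayList<Element>();
            for (int i = 0; i < nodes.getLength(); i++) {
                cars.add((Element) nodes.item(i));
            }

            // Sort in Java, since XPath 1.0 has no sorting facility.
            Collections.sort(cars, new Comparator<Element>() {
                public int compare(Element a, Element b) {
                    return Integer.compare(
                            Integer.parseInt(a.getAttribute("id")),
                            Integer.parseInt(b.getAttribute("id")));
                }
            });

            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < cars.size(); i++) {
                if (i > 0) {
                    sb.append('-');
                }
                sb.append(cars.get(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROOT><coolnessId>9</coolnessId>"
                + "<cars id=\"3\">0</cars><cars id=\"2\">1</cars><cars id=\"1\">2</cars></ROOT>";
        System.out.println(joinCarsById(xml)); // prints 2-1-0
    }
}
```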
I'm writing simple code to scrape data from a web page using Selenium and XPath 2.0 functions.
Since Selenium supports only XPath 1.0 functions, I am trying to use Saxon.
I have downloaded the Saxon9he.jar file and extracted it into the path "C:\Program Files\Java\jre1.8.0_111\lib\ext"
I have created a file "jaxp.properties" with the following lines:
javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl
javax.xml.xpath.XPathFactory","net.sf.saxon.xpath.XPathFactoryImpl
I have also added the jar files to the libraries of my Eclipse project.
However, I am not able to fetch values with XPath 2.0 functions.
In my code, if I use
XPathFactory factory = XPathFactory.newInstance();
instead of
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
I am able to use the XPath 1.0 functions. But I need XPath 2.0 functions. Please guide me on this.
My code is:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import net.sf.saxon.lib.NamespaceConstant;
import net.sf.saxon.xpath.XPathFactoryImpl;
public class XpathCheckClass {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathFactoryConfigurationException, XPathExpressionException{
WebDriver dr = new FirefoxDriver();
dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
try {
Thread.sleep(3000);
} catch (Exception e) {
}
String source = dr.getPageSource();
Document doc = null;
try {
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
doc = db.parse( new InputSource( new StringReader(source)));
} catch (Exception e) {
e.printStackTrace();
}
System.setProperty("javax.xml.xpath.XPathFactory:"+NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
// XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getTextContent());
}
dr.close();
}
}
Recent releases of Saxon no longer advertise themselves as JAXP XPath services, so you need to instantiate the XPath factory explicitly:
XPathFactory xf = new net.sf.saxon.xpath.XPathFactoryImpl();
I have already downloaded the full WordNet 2.0, but I don't understand how to use it as a graph because it consists of multiple RDF files. I want to use the WordNet 2.0 ontology as a graph in Eclipse. The following is the snippet of code I am using to load an ontology as a graph. I would also like to know whether I am going in the right direction.
URIFactory factory = URIFactoryMemory.getSingleton();
URI graph_uri = factory.createURI("http://graph/");
G graph = new GraphMemory(graph_uri);
String fpath ="D:/Workspace/SSM/src/wordnet-wordsensesandwords.rdf";
GDataConf graphconf = new GDataConf(GFormat.RDF_XML, fpath);
GAction actionRerootConf = new GAction(GActionType.REROOTING);
GraphConf gConf = new GraphConf();
gConf.addGDataConf(graphconf);
gConf.addGAction(actionRerootConf);
// GraphLoaderGeneric.populate(graphconf, graph);
GraphLoaderGeneric.load(gConf, graph);
// General information about the graph
System.out.println(graph.toString());
http://wordnet.princeton.edu/wordnet/download/old-versions/
You can use this link to download the ontology, and you can use Apache Jena to query it.
Once you have the results, you can represent them in the form of a graph.
You can also download WordNet in RDF format and display it as a graph using the Protege tool.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import nu.xom.Builder;
import nu.xom.ParsingException;
import nu.xom.ValidityException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.ISynset;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;
public class Main
{
public static void main(String[] args)
{
try
{
FileInputStream file = new FileInputStream(new File("c:\\employees.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println("*************************");
String expression = "/Employees/Employee[@emplid='3333']/job";
System.out.println(expression);
String job = xPath.compile(expression).evaluate(xmlDocument);
System.out.println(job);
System.out.println("*************************");
String path = "C:\\Program Files\\WordNet\\2.1\\dict";
URL url = new URL("file", null, path);
IDictionary dict = new Dictionary(url);
dict.open();
IIndexWord idxWord = dict.getIndexWord(job, POS.NOUN);
IWordID wordID = idxWord.getWordIDs().get(0);
IWord word = dict.getWord(wordID);
ISynset synset= word.getSynset();
for (IWord w : synset.getWords())
System.out.println(w.getLemma());
}
catch(Exception a)
{
System.out.println(a);
}
}
}
This is sample code in which WordNet is queried for the synonyms of the word read into job; those synonyms can then be used to find terms similar to job in the RDF graph.
I have only worked with WordNet for capturing related terms and hypernyms. I hope this helps.
I'm trying to parse an RDFS XML file in order to find all the classes in it.
The XPath "/rdf:RDF/rdfs:Class"
works in my XML editor.
When I use the same XPath in my Java program (I have implemented a DOM parser), I get 0 classes.
The following example runs, but it outputs 0 classes!
This is what I do:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import org.xml.sax.SAXException;
public class Main {
public static void main(String args[]) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException{
FindClasses FSB = new FindClasses();
FSB.FindAllClasses("C:\\Workspace\\file.xml"); //rdfs file
}
}
The class FindClasses is as follows:
import java.io.IOException;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class FindClasses {
public void FindAllClasses(String fileName) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
Object result = classes_expr.evaluate(doc, XPathConstants.NODESET);
NodeList classes = (NodeList) result;
System.out.println("I found : " + classes.getLength() + " classes " );
}
}
The rdfs file is:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xml:lang="en" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="Class1">
</rdfs:Class>
<rdfs:Class rdf:about="Class2">
</rdfs:Class>
</rdf:RDF>
I don't really understand why the XPath returns 0 nodes in this example.
It's weird, because I have implemented other DOM parsers as well and they worked fine.
Can somebody help me?
Thanks
I visited the following link and solved my problem:
Issues with xpath in java
The problem was that the XPath uses two namespace prefixes (rdf, rdfs), as in "/rdf:RDF/rdfs:Class", and the XPath object had no way to resolve those prefixes to namespace URIs.
If the XPath had not used any namespace, e.g. /RDF/Class, there would have been no issue.
So after the line:
xpath = XPathFactory.newInstance().newXPath();
and before the line:
XPathExpression classes_expr = xpath.compile("/rdf:RDF/rdfs:Class");
I added the following:
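// Requires imports: javax.xml.XMLConstants, javax.xml.namespace.NamespaceContext, java.util.Iterator
xpath.setNamespaceContext(new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "rdf": return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
case "rdfs": return "http://www.w3.org/2000/01/rdf-schema#";
}
// Unbound prefix: return the "no namespace" URI rather than the prefix itself
return XMLConstants.NULL_NS_URI;
}
@Override
public String getPrefix(String namespace) {
// Reverse lookup: compare against the namespace URIs, not the prefixes
if ("http://www.w3.org/1999/02/22-rdf-syntax-ns#".equals(namespace)) return "rdf";
else if ("http://www.w3.org/2000/01/rdf-schema#".equals(namespace)) return "rdfs";
else return null;
}
@Override
public Iterator getPrefixes(String arg0) {
// Not needed for evaluating this XPath expression
return null;
}
});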
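For reference, here is a self-contained sketch of the same fix that can be run on its own. It assumes the RDFS document from the question is passed in as a string; the class name RdfsClassCounter and the method countClasses are hypothetical names used only for this illustration, and the prefix map is just one way to back a NamespaceContext.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class RdfsClassCounter {

    // Hypothetical helper: counts /rdf:RDF/rdfs:Class elements in the given XML.
    static int countClasses(String xml) {
        // Prefixes used in the XPath expression, mapped to their namespace URIs.
        final Map<String, String> ns = new HashMap<String, String>();
        ns.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the parser must be namespace-aware too
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri = ns.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    for (Map.Entry<String, String> e : ns.entrySet()) {
                        if (e.getValue().equals(uri)) {
                            return e.getKey();
                        }
                    }
                    return null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return prefix == null
                            ? Collections.<String>emptyList().iterator()
                            : Collections.singletonList(prefix).iterator();
                }
            });

            NodeList classes = (NodeList) xpath.evaluate(
                    "/rdf:RDF/rdfs:Class", doc, XPathConstants.NODESET);
            return classes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rdf:RDF xml:lang=\"en\""
                + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\">"
                + "<rdfs:Class rdf:about=\"Class1\"></rdfs:Class>"
                + "<rdfs:Class rdf:about=\"Class2\"></rdfs:Class>"
                + "</rdf:RDF>";
        System.out.println("I found : " + countClasses(xml) + " classes"); // I found : 2 classes
    }
}
```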
Sample XML
<Games>
<Indoor>
<TT></TT>
<Chess></chess>
<cricket>asd</cricket>
<ComputerGame>
<cricket>asd</cricket>
</ComputerGame>
</Indore>
<Outdoor>
<Football></Football>
<cricket>asd</cricket>
</outdoor>
</Games>
I want to select all the nodes with the node name cricket.
For this I am using:
NodeList nodeList= (NodeList)xpath.compile("//cricket").evaluate(xmlDocument,XPathConstants.NODESET);
But this code doesn't select any cricket node. Please suggest a fix.
Based on your "corrected" example XML...
<Games>
<Indoor>
<TT></TT>
<Chess>
</Chess>
<cricket>asd</cricket>
<ComputerGame>
<cricket>asd</cricket>
</ComputerGame>
</Indoor>
<Outdoors>
<Football></Football>
<cricket>asd</cricket>
</Outdoors>
</Games>
Using the following code...
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class TestXPath101 {
public static void main(String[] args) {
try {
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("Test.xml"));
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression exp = xPath.compile("//cricket");
NodeList nl = (NodeList)exp.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + nl.getLength() + " results");
} catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException ex) {
ex.printStackTrace();
}
}
}
I was able to get it to output...
Found 3 results
I would "suspect" that your XML is ill-formed and that you are ignoring any exceptions that are being thrown because of it...
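One way to check that suspicion is to parse the document and report the parser's error instead of swallowing it. The sketch below is only an illustration: WellFormedCheck and firstParseError are made-up names, and the input is a shortened version of the original sample with its mismatched Chess tags.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Hypothetical helper: returns null if the XML parses,
    // otherwise a short description of the first fatal parse error.
    static String firstParseError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors, so they still reach the catch block below.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Mismatched case in the closing tag, as in the original sample.
        String broken = "<Games><Indoor><Chess></chess><cricket>asd</cricket></Indoor></Games>";
        String fixed = "<Games><Indoor><Chess></Chess><cricket>asd</cricket></Indoor></Games>";
        System.out.println("broken: " + firstParseError(broken));
        System.out.println("fixed:  " + firstParseError(fixed));
    }
}
```

If the first call reports an error, the XPath expression was never the problem: the document never parsed in the first place.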
What I am trying to do is scrape the simple inner text of an element from an XHTML file.
I have narrowed my search down to the element node, but I fail to retrieve its content.
Please note: the element node has no child nodes, and I get a NullPointerException when I try to access one.
Here is the HTML snippet:
<div id="dvTitle" class="titlebtmbrdr01" style="line-height: 22px;">BAJAJ AUTO LTD. </div>
Please also note that this file uses the namespace
http://www.w3.org/1999/xhtml
You can see the div element from which I want to extract BAJAJ AUTO LTD.
Here is the code that I am using:
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Vector;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import jxl.read.biff.BiffException;
import jxl.write.WriteException;
import jxl.write.biff.RowsExceededException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import com.sun.org.apache.xml.internal.serialize.Serializer;
public class BSEQuotesExtractor implements valueExtractor {
@Override
public Vector<String> getName(Document d) throws XPathExpressionException, RowsExceededException, BiffException, WriteException, IOException {
// TODO Auto-generated method stub
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new MynamespaceContext());
Object result = xpath.evaluate("//*[@id='dvTitle']",d, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
System.out.println(nodes.item(0).getNodeName());
System.out.println(nodes.item(0).getAttributes().item(1).getNodeName());
System.out.println(nodes.item(0).getAttributes().item(1).getNodeValue());
System.out.println(nodes.item(0).getTextContent());
return null;
}
public static void main(String[] args) throws MalformedURLException, IOException, XPathExpressionException, RowsExceededException, BiffException, WriteException{
BSEQuotesExtractor q = new BSEQuotesExtractor();
DOMParser parser = new DOMParser(new URL("http://www.bseindia.com/bseplus/StockReach/StockQuote/Equity/BAJAJ%20AUTO%20LTD/BAJAJAUT/532977/Scrips").openStream());
Document d = parser.getDocument();
q.getName(d);
}
}
And this is the output I get
1
div
dvTitle
null
Now why do I get that null? I should get BAJAJ AUTO LTD.
When I open the page your code references, that div actually is empty for me:
<div class="titlebtmbrdr01" id="dvTitle" style="line-height: 22px;"></div>
So perhaps you should save the page content to some file to examine if it is the same for you. If it is, but your browser displays things differently, then figure out what combination of cookies and other headers makes a difference there.
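To see what the parser actually produced, one option is to serialize the parsed Document and inspect it. The sketch below uses the JDK's identity Transformer; DomDump, toXml and sampleDocument are hypothetical names, and the sample document simply mimics the empty div shown above rather than the real page.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomDump {

    // Hypothetical helper: serializes a DOM Document to a string for inspection.
    static String toXml(Document doc) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the document", e);
        }
    }

    // Stand-in for the parsed page: a single empty div with id "dvTitle".
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            div.setAttribute("id", "dvTitle");
            doc.appendChild(div);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(sampleDocument()));
    }
}
```

To write to a file instead, a StreamResult built from a java.io.File can be passed to transform in place of the StringWriter-backed one.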