Generic xpath for a node at different locations - java

I have a lot of xml files from different versions of schemas. There are certain sections/tags in these xmls that are the same.
What I want to do is locate a perticular tag and start processing that tag. The thing is that this tag may appear at different locations in the xml.
So I am looking for a xpath that will locate this node irrespective of its location. I am using Java for writing my processing code.
Following are the various falvours of the xmls
Sample 1
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
Sample 2
<a>
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
</a>
Sample 3
<b>
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
</b>
In the above xmls I want to use the same xpath to locate the node 'nodeIWant'.
The Java code I am using is the following
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document modelDoc = docBuilder.parse(args[0]);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println(xPath.evaluate("//nodeIWant", modelDoc.getDocumentElement(), XPathConstants.NODE));
This prints out a null.
Final Edit
The answer by Mathias Müller works for these xml files. I am actually trying to query the .emx files in Rational Software Architect. I was trying to avaoid using these for examples. (Please don't start talking about BIRT and using the eclipse uml APIs etc... I have tried these and they do not give me what I want.)
The structure of the files is the following
<?xml version="1.0" encoding="UTF-8"?>
<!--xtools2_universal_type_manager-->
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.uml.msl.model" version="7.0.0"><feature description="" name="com.ibm.xtools.ruml.feature" url="" version="7.0.0"/></signature>?>
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.mmi.ui.signatures.diagram" version="7.0.0"><feature description="" name="Rational Modeling Platform (com.ibm.xtools.rmp)" url="" version="7.0.0"/></signature>?>
<xmi:XMI version="2.0" xmlns:Default="http:///schemas/Default/_fNm3AAqoEd6-N_NOT9vsCA/2" xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore" xmlns:uml="http://www.eclipse.org/uml2/3.0.0/UML" xmlns:xmi="http://www.omg.org/XMI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http:///schemas/Default/_fNm3AAqoEd6-N_NOT9vsCA/2 pathmap://UML2_MSL_PROFILES/Default.epx#_fNwoAAqoEd6-N_NOT9vsCA?Default/Default?">
<uml:Model name="A" xmi:id="_4lzSsMywEeGAuoBpYhfj6Q">
<!-- Lot of other stuff -->
</uml:Model>
<xmi:XMI>
The other file is
<?xml version="1.0" encoding="UTF-8"?>
<!--xtools2_universal_type_manager-->
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.uml.msl.model" version="7.0.0"><feature description="" name="com.ibm.xtools.ruml.feature" url="" version="7.0.0"/></signature>?>
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.mmi.ui.signatures.diagram" version="7.0.0"><feature description="" name="Rational Modeling Platform (com.ibm.xtools.rmp)" url="" version="7.0.0"/></signature>?>
<uml:Model xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore" xmlns:uml="http://www.eclipse.org/uml2/3.0.0/UML" xmi:id="_4lzSsMywEeGAuoBpYhfj6Q" name="A">
<!-- Lot of other stuff -->
</uml:Model>
Shouldn't the xpath of '//Model' work for these two samples as well?

You can use the xPath 'axis' //. This searches in the file for your node and doesn't care about the parent-nodes. So in your example you can use:
//nodeIWant

Not very familiar with DocumentBuilder, but perhaps you need to compile an XPath expression before evaluating it against a document? It seems it's not XPath expressions that are evaluated, XML documents are.
String expression = "//nodeIWant";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(modelDoc, XPathConstants.NODESET);
Or, if there is just one of those elements and you'd like to print its string value:
String expression = "//nodeIWant";
System.out.println(xPath.compile(expression).evaluate(modelDoc));
EDIT: You edited your question and revealed the actual XML you are evaluating path expressions against. Those new documents have namespaces that you need to take into account in XPath expressions.
//nodeIWant will never find a node if it is actually in a namespace. To find the Model node in your new documents, you'd have to use
//*[local-name() = 'Model']

Since the example you provided contains more than just a String in the <nodeIWant> elements you probably can benefit from using an object oriented approach combined with xpath. With data projection (disclosure: I'm affiliated with that project) it's possible to do this:
public class DataProjection {
public interface Book {
#XBRead("./title")
String getTitle();
//... more getter or setter methods
}
public static void main(String[] args) {
// Print all books in all <nodeIWant> elements of
for (String file : new String[] { "a.xml", "b.xml", "c.xml" }) {
List<Book> books = new XBProjector().io().file(file).evalXPath("//nodeIWant/book").asListOf(Book.class);
for (Book book : books) {
System.out.println(book.getTitle());
}
}
}
}
You can define one or more views (called projection interfaces) to the XML data and use XPath to connect the data to java objects implementing these interfaces. This helps a lot in structuring your code and have it reuseable for similar XML files.

Related

How to use tokenize function in Xpath

can we use tokenize function in XPath
The general java code i use to process XSLT and XML files are :
XPath xPath = XPathFactory.newInstance().newXPath();
InputSource inputXML = new InputSource(new StringReader(xml));
String expression = "/root/customer/personalDetails[age=tokenize('20|30','|')]/name";
boolean evaluate1 = (boolean) xPath.compile(expression).evaluate(inputXML, XPathConstants.BOOLEAN);
XML :-
<?xml version="1.0" encoding="ISO-8859-15"?>
<root>
<customer>
<personalDetails>
<name>ABC</name>
<value>20</value>
</personalDetails>
<personalDetails>
<name>XYZ</name>
<value>21</value>
</personalDetails>
<personalDetails>
<name>PQR</name>
<value>30</value>
</personalDetails>
</customer>
</root>
Expected Response :- ABC,PQR
Yes, you can use the tokenize() function in XPath, provided your XPath processor supports XPath 2.0 or later.
For Java, the popular choice of XPath 2.0+ processor is Saxon.
You can use the JAXP API with Saxon, however, it's not really designed to work well with XPath 2.0+, so it's preferable to use Saxon's own API (called s9api).
For this particular example, you don't need tokenize(). In XPath 2.0+ you can write
[age=('20', '30')]

Dom4j selectNodes(arg) don't give list of nodes

I am using DOM4j for XML work in java,
my xml is like this:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<abcd name="ab.catalog" xmlns="http://www.xyz.com/pqr" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.xyz.com/pqr ./abc.xyz.xsd">
<efg>
......
</efg>
<efg>
.....
</efg>
</abcd>
then,
List<Node>list = document.selectNodes("/abcd/efg");
gets the size of list zero. I feel it's due to namespace specified in the xml.
I tried a lot but cn't get success.
Unprefixed element names in XPath expressions refer to elements that are not in a namespace - they do not take account of the "default" xmlns="..." namespace declared on the document. You need to declare a prefix for the namespace in the XPath engine and then use that prefix in the expression. Here is an example inspired by the DOM4J javadocs:
Map uris = new HashMap();
uris.put("pqr", "http://www.xyz.com/pqr");
XPath xpath = document.createXPath("/pqr:abcd/pqr:efg");
xpath.setNamespaceURIs(uris);
List<Node> nodes = xpath.selectNodes(document);
Modify your code :
List<Node>list = document.selectNodes("//abcd/efg");

What is the XPath expression to select text from orm.xml's <schema> element?

I've read XPath - how to select text and thought I had the general idea. But, as always, XPath rears up, hisses at me, and scuttles off to find the nearest bacteria-infested urinal to drown in.
I have a JPA orm.xml file. It looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<entity-mappings xmlns="http://java.sun.com/xml/ns/persistence/orm"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm
http://java.sun.com/xml/ns/persistence/orm_2_0.xsd"
version="2.0">
<persistence-unit-metadata>
<persistence-unit-defaults>
<schema>test</schema>
<catalog>test</catalog>
</persistence-unit-defaults>
</persistence-unit-metadata>
</entity-mappings>
The following XPath expression should, I would think, select the text from the <schema> element:
/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()
But using Java's XPath implementation, it does not.
More specifically, the following code fails (using JUnit asserts) on the last line. The value of the text variable is the empty string.
// Find the file: URL to the orm.xml I mentioned above.
final URL ormUrl = Thread.currentThread().getContextClassLoader().getResource("META-INF/orm.xml");
assertNotNull(ormUrl);
final XPathFactory xpf = XPathFactory.newInstance();
assertNotNull(xpf);
final XPath xpath = xpf.newXPath();
assertNotNull(xpath);
final XPathExpression expression = xpath.compile("/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()");
assertNotNull(expression);
final String text = expression.evaluate(new InputSource(ormUrl.openStream()));
assertEquals("test", text);
This seems to cast into doubt what little understanding I had of XPath expressions to begin with. Flailing around, I then wanted to see if a simple "/" would select the root element. Mercifully, this returned a non-null NodeList, but the NodeList was empty. I really don't want to hunt the authors of the Java XPath support down and string them up, but it's getting awfully difficult not to follow that course of action.
Please help me shoot XPath in the head once and for all. Thanks.
The problem is that the XML declares a default namespace
xmlns="http://java.sun.com/xml/ns/persistence/orm"
while in your XPath expression you have not provided a corresponding namespace context. See this link for details on how to work with namespace contexts. There's a lot of detail there, but in summary you have to write your own implementation of javax.xml.namespace.NamespaceContext that allows the XPath processor to map namespace prefixes to URIs. In your case you must provide a mapping for the default namespace to the appropriate URI.

How can I do type-safe Xpath queries in Java?

I'm currently using JDOM for doing some simple XML parsing, and it seems like nothing's type safe - I had a similar issue with using the built-in Java DOM parser, just with lots more API to wade through.
For example, XPath.selectNodes takes an Object as its argument and returns a raw list, which just feels a little Java 1.1
Are there generic-ized XML and XPath libraries for Java, or is there some reason why it's just not possible to do XPath queries in a type-safe way?
If you're familiar with CSS selectors on HTML, it may be good to know that Jsoup supports XML as well.
Update: OK, that was given the downvote apparently a very controversial answer. It may however end up to be easier and less verbose than Xpath when all you want is to select node values. The Jsoup API is namely very slick. Let's give a bit more concrete example. Assuming that you have a XML file which look like this:
<?xml version="1.0" encoding="UTF-8"?>
<persons>
<person id="1">
<name>John Doe</name>
<age>30</age>
<address>
<street>Main street 1</street>
<city>Los Angeles</city>
</address>
</person>
<person id="2">
<name>Jane Doe</name>
<age>40</age>
<address>
<street>Park Avenue 1</street>
<city>New York</city>
</address>
</person>
</persons>
Then you can traverse it like follows:
Document document = Jsoup.parse(new File("/persons.xml"), "UTF-8");
Element person2 = document.select("person[id=2]").first();
System.out.println(person2.select("name").text());
Elements streets = document.select("street");
for (Element street : streets) {
System.out.println(street.text());
}
which outputs
Jane Doe
Main street 1
Park Avenue 1
Update 2: since Jsoup 1.6.2 which was released March 2012, XML parsing is officially supported by the Jsoup API.
AFAIK all of the xml queries in java are non-typesafe and most are java 1.3 compatible. That said my favorite parser/generator is the xml pull parser (xmlpp) style parser. I believe java has XmlStreamReader and XmlStreamWriter if you're using 1.6 which are almost the same as the xmlpp library. I especially like that I can write a method getFoo that takes a stream reader and pulls from it and returns a Foo object. It's sort of the best between DOM and SAX. I think it may be referred to as StAX by some.
I'm getting a little ramble-y so I'm quitting now

Finding all valid xpath from xml

I am trying to write a program in java where in i can find all the xpath for the given xml.I found out the link on the internet xpath generator but it does not work when one element can repeat multipletimes for example if we have xml like the following :-
<?xml version="1.0" encoding="UTF-8"?>
<Report>
<Name>
<FirstName>A</FirstName>
<LastName>B</LastName>
<MiddleName>C</MiddleName>
</Name>
<Name>
<FirstName>D</FirstName>
<LastName>E</LastName>
<MiddleName>S</MiddleName>
</Name>
</Report>
It will produce xpaths :-
/Report/Name/firstname for both firstname nodes.
but the expected should be /Report/Name1/firstname and /Report/Name[2]/firstname
Any ideas?
I think you may have to do this yourself.
Using a SAX parser will make it straightforward. Just maintain a stack of the elements you encounter and a count so you can increment the indexes (/Report/Name[1], /Report/Name[2]) easily.

Categories

Resources