I am trying to write a Java program that finds all the XPaths for a given XML document. I found an XPath generator on the internet, but it does not work when an element repeats multiple times. For example, given XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<Report>
<Name>
<FirstName>A</FirstName>
<LastName>B</LastName>
<MiddleName>C</MiddleName>
</Name>
<Name>
<FirstName>D</FirstName>
<LastName>E</LastName>
<MiddleName>S</MiddleName>
</Name>
</Report>
It will produce the XPath
/Report/Name/FirstName for both FirstName nodes,
but the expected result is /Report/Name[1]/FirstName and /Report/Name[2]/FirstName.
Any ideas?
I think you may have to do this yourself.
Using a SAX parser will make it straightforward. Just maintain a stack of the elements you encounter and a count so you can increment the indexes (/Report/Name[1], /Report/Name[2]) easily.
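A minimal sketch of the stack-and-counter idea, assuming the JDK's built-in SAX parser. The class name IndexedXPathLister and the helper method list are made up for illustration. This version always writes an index (including [1] and the root element), since leaving the index off for non-repeated elements would require knowing about later siblings in advance.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IndexedXPathLister extends DefaultHandler {
    // Path segments of the currently open elements, e.g. ["Report[1]", "Name[2]"].
    private final Deque<String> path = new ArrayDeque<>();
    // One counter map per open element: how often each child name has been seen so far.
    private final Deque<Map<String, Integer>> childCounts = new ArrayDeque<>();
    private final List<String> xpaths = new ArrayList<>();

    public IndexedXPathLister() {
        childCounts.push(new HashMap<>()); // counters for the document level
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Increment the counter for this name among the siblings of the current parent.
        int index = childCounts.peek().merge(qName, 1, Integer::sum);
        path.addLast(qName + "[" + index + "]");
        childCounts.push(new HashMap<>()); // fresh counters for this element's children
        xpaths.add("/" + String.join("/", path));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
        childCounts.pop();
    }

    /** Hypothetical helper: returns one indexed XPath per element, in document order. */
    public static List<String> list(String xml) throws Exception {
        IndexedXPathLister handler = new IndexedXPathLister();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.xpaths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Report><Name><FirstName>A</FirstName></Name>"
                + "<Name><FirstName>D</FirstName></Name></Report>";
        list(xml).forEach(System.out::println);
    }
}
```

For the Report document in the question, the two FirstName elements come out as /Report[1]/Name[1]/FirstName[1] and /Report[1]/Name[2]/FirstName[1].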
I am trying to get data using XPath.
I can reach the data I want, but when there are multiple matches, only the first one is selected.
I also want to count the number of matching items.
For example, I want to count the queues whose message-vpn name is vpn/b.
XML structure is as follows:
<queues>
<queue>
<name> queue/a </name>
<info>
<message-vpn> vpn/a </message-vpn>
</info>
</queue>
<queue>
<name> queue/b </name>
<info>
<message-vpn> vpn/b </message-vpn>
</info>
</queue>
<queue>
<name> queue/c </name>
<info>
<message-vpn> vpn/b </message-vpn>
</info>
</queue>
</queues>
And here's the xpath script I used.
/queues/queue/info/message-vpn[text()=("vpn/b")]
When I access the data, only queue/b is selected, not c.
Please help me to do that..
Your original xpath expression fails because the text vpn/b is surrounded by spaces. So another way to do it is
/queues/queue/info/message-vpn[normalize-space()=("vpn/b")]
If you want to count the number of these queues, use the count() function:
count(/queues/queue/info/message-vpn[normalize-space()=("vpn/b")])
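As a rough sketch of how this could be run from Java, assuming the JDK's javax.xml.xpath API (the class name QueueCount and the method count are invented here): the expression is evaluated as a NUMBER, which comes back as a Double.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class QueueCount {
    /** Evaluates a count(...) expression against the given XML and returns it as an int. */
    public static int count(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double result = (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return result.intValue();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the XML in the question, including the spaces around the values.
        String xml = "<queues>"
                + "<queue><name> queue/a </name><info><message-vpn> vpn/a </message-vpn></info></queue>"
                + "<queue><name> queue/b </name><info><message-vpn> vpn/b </message-vpn></info></queue>"
                + "<queue><name> queue/c </name><info><message-vpn> vpn/b </message-vpn></info></queue>"
                + "</queues>";
        // normalize-space() trims the surrounding whitespace before comparing.
        System.out.println(count(xml,
                "count(/queues/queue/info/message-vpn[normalize-space()='vpn/b'])"));
        // The exact text() comparison does not match because of the spaces.
        System.out.println(count(xml,
                "count(/queues/queue/info/message-vpn[text()='vpn/b'])"));
    }
}
```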
Try below:
//queues/queue/info/message-vpn[contains(.,'vpn/b')]
I have a lot of XML files from different versions of schemas. Certain sections/tags in these files are the same.
What I want to do is locate a particular tag and start processing it. The catch is that this tag may appear at different locations in the XML.
So I am looking for an XPath that will locate this node irrespective of its location. I am writing my processing code in Java.
The following are the various flavours of the XML files:
Sample 1
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
Sample 2
<a>
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
</a>
Sample 3
<b>
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
</b>
In the above xmls I want to use the same xpath to locate the node 'nodeIWant'.
The Java code I am using is the following
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document modelDoc = docBuilder.parse(args[0]);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println(xPath.evaluate("//nodeIWant", modelDoc.getDocumentElement(), XPathConstants.NODE));
This prints out a null.
Final Edit
The answer by Mathias Müller works for these XML files. I am actually trying to query the .emx files in Rational Software Architect; I was trying to avoid using these as examples. (Please don't start talking about BIRT and using the Eclipse UML APIs etc... I have tried these and they do not give me what I want.)
The structure of the files is the following
<?xml version="1.0" encoding="UTF-8"?>
<!--xtools2_universal_type_manager-->
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.uml.msl.model" version="7.0.0"><feature description="" name="com.ibm.xtools.ruml.feature" url="" version="7.0.0"/></signature>?>
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.mmi.ui.signatures.diagram" version="7.0.0"><feature description="" name="Rational Modeling Platform (com.ibm.xtools.rmp)" url="" version="7.0.0"/></signature>?>
<xmi:XMI version="2.0" xmlns:Default="http:///schemas/Default/_fNm3AAqoEd6-N_NOT9vsCA/2" xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore" xmlns:uml="http://www.eclipse.org/uml2/3.0.0/UML" xmlns:xmi="http://www.omg.org/XMI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http:///schemas/Default/_fNm3AAqoEd6-N_NOT9vsCA/2 pathmap://UML2_MSL_PROFILES/Default.epx#_fNwoAAqoEd6-N_NOT9vsCA?Default/Default?">
<uml:Model name="A" xmi:id="_4lzSsMywEeGAuoBpYhfj6Q">
<!-- Lot of other stuff -->
</uml:Model>
</xmi:XMI>
The other file is
<?xml version="1.0" encoding="UTF-8"?>
<!--xtools2_universal_type_manager-->
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.uml.msl.model" version="7.0.0"><feature description="" name="com.ibm.xtools.ruml.feature" url="" version="7.0.0"/></signature>?>
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.mmi.ui.signatures.diagram" version="7.0.0"><feature description="" name="Rational Modeling Platform (com.ibm.xtools.rmp)" url="" version="7.0.0"/></signature>?>
<uml:Model xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore" xmlns:uml="http://www.eclipse.org/uml2/3.0.0/UML" xmi:id="_4lzSsMywEeGAuoBpYhfj6Q" name="A">
<!-- Lot of other stuff -->
</uml:Model>
Shouldn't the xpath of '//Model' work for these two samples as well?
You can use the XPath 'axis' //. This searches the whole document for your node, regardless of its parent nodes. So in your example you can use:
//nodeIWant
Not very familiar with DocumentBuilder, but perhaps you need to compile an XPath expression before evaluating it against a document? It seems it's not XPath expressions that are evaluated, XML documents are.
String expression = "//nodeIWant";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(modelDoc, XPathConstants.NODESET);
Or, if there is just one of those elements and you'd like to print its string value:
String expression = "//nodeIWant";
System.out.println(xPath.compile(expression).evaluate(modelDoc));
EDIT: You edited your question and revealed the actual XML you are evaluating path expressions against. Those new documents have namespaces that you need to take into account in XPath expressions.
//nodeIWant will never find a node if it is actually in a namespace. To find the Model node in your new documents, you'd have to use
//*[local-name() = 'Model']
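To illustrate, here is a small sketch using the JDK's DOM and XPath APIs on a cut-down stand-in for the .emx structure. The class name LocalNameLookup and the method find are invented for this example, and the parser is explicitly made namespace-aware, which is an assumption about the setup rather than something taken from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LocalNameLookup {
    /** Returns the first node matching the expression, or null if nothing matches. */
    public static Node find(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: we want namespaces honoured
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Reduced version of the first .emx sample from the question.
        String xml = "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\" "
                + "xmlns:uml=\"http://www.eclipse.org/uml2/3.0.0/UML\">"
                + "<uml:Model name=\"A\"/></xmi:XMI>";
        // Unprefixed name: only matches elements in no namespace, so nothing is found.
        System.out.println(find(xml, "//Model"));
        // local-name() ignores the namespace and matches uml:Model.
        Element model = (Element) find(xml, "//*[local-name() = 'Model']");
        System.out.println(model.getAttribute("name"));
    }
}
```

Matching on local-name() alone ignores the namespace entirely, so it would also match a Model element from any other namespace; registering a NamespaceContext and writing //uml:Model is the stricter alternative.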
Since the example you provided contains more than just a String in the <nodeIWant> elements you probably can benefit from using an object oriented approach combined with xpath. With data projection (disclosure: I'm affiliated with that project) it's possible to do this:
public class DataProjection {
public interface Book {
@XBRead("./title")
String getTitle();
//... more getter or setter methods
}
public static void main(String[] args) {
// Print all books in all <nodeIWant> elements of
for (String file : new String[] { "a.xml", "b.xml", "c.xml" }) {
List<Book> books = new XBProjector().io().file(file).evalXPath("//nodeIWant/book").asListOf(Book.class);
for (Book book : books) {
System.out.println(book.getTitle());
}
}
}
}
You can define one or more views (called projection interfaces) of the XML data and use XPath to connect the data to Java objects implementing these interfaces. This helps a lot in structuring your code and making it reusable for similar XML files.
I'm having big problems with XPath evaluation using Jaxen.
Here's part of the XML I'm evaluating against:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2011-05-31T13:04:08+00:00</responseDate>
<request metadataPrefix="oai_dc" verb="ListRecords">http://citeseerx.ist.psu.edu/oai2</request>
<ListRecords>
<record>
<header>
<identifier>oai:CiteSeerXPSU:10.1.1.1.1484</identifier>
<datestamp>2009-05-24</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Winner-Take-All..</dc:title>
<dc:relation>10.1.1.134.6077</dc:relation>
<dc:relation>10.1.1.65.2144</dc:relation>
<dc:relation>10.1.1.54.7277</dc:relation>
<dc:relation>10.1.1.48.5282</dc:relation>
</oai_dc:dc>
</metadata>
</record>
<resumptionToken>10.1.1.1.2041-1547151-500-oai_dc</resumptionToken>
</ListRecords>
</OAI-PMH>
I'm using Jaxen because in my use case it's much faster than the Apache implementation. I'm using W3C DOM for the XML representation.
I need to select all record elements, and then evaluate other XPaths on the selected nodes (this is needed because of my processing architecture).
I'm selecting all record nodes (this works):
/OAI-PMH/ListRecords/record
Then on every selected record node I'm evaluating other xpaths to get needed data:
Select identifier text value (this works):
header/identifier/text()
Select title text value (this does NOT work):
metadata/oai_dc:dc/dc:title/text()
I've registered namespaces prefixes with their URIs (oai_dc and dc). I also tried other xpaths but none of them work:
metadata/dc/title/text()
metadata//dc:title/text()
I've read other Stack Overflow questions about XPath and namespaces, and the suggested solution of adding a prefix "oai" with the URI "http://www.openarchives.org/OAI/2.0/". I tried adding that "oai:" prefix to the nodes without a defined prefix, but as a result I couldn't even select the record nodes. Any ideas what I'm doing wrong?
Solution:
The problem was the parser (thanks jasso). It wasn't set to be namespace-aware; after changing that setting everything works fine, as expected.
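For illustration, a sketch of what that setting changes, using the JDK's DocumentBuilderFactory (which is not namespace-aware by default). The class name NamespaceAwareParsing and the method parseRoot are invented for this example. Without namespace awareness the DOM nodes carry no namespace URI, which would explain why the unprefixed /OAI-PMH/ListRecords/record matched while the prefixed dc:title step did not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwareParsing {
    /** Parses the XML with the given namespace-awareness setting and returns the root element. */
    public static Element parseRoot(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Reduced version of the OAI-PMH response from the question.
        String xml = "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
                + "<ListRecords><record/></ListRecords></OAI-PMH>";
        // Not namespace-aware: the element has no namespace URI at all.
        System.out.println(parseRoot(xml, false).getNamespaceURI());
        // Namespace-aware: the default namespace is attached to the element.
        System.out.println(parseRoot(xml, true).getNamespaceURI());
    }
}
```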
I can't see how the XPath expression /OAI-PMH/ListRecords/record can possibly select anything, since your document does not have a {}OAI-PMH element, only a {http://www.openarchives.org/OAI/2.0/}OAI-PMH element. See http://jaxen.codehaus.org/faq.html
I've read XPath - how to select text and thought I had the general idea. But, as always, XPath rears up, hisses at me, and scuttles off to find the nearest bacteria-infested urinal to drown in.
I have a JPA orm.xml file. It looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<entity-mappings xmlns="http://java.sun.com/xml/ns/persistence/orm"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm
http://java.sun.com/xml/ns/persistence/orm_2_0.xsd"
version="2.0">
<persistence-unit-metadata>
<persistence-unit-defaults>
<schema>test</schema>
<catalog>test</catalog>
</persistence-unit-defaults>
</persistence-unit-metadata>
</entity-mappings>
The following XPath expression should, I would think, select the text from the <schema> element:
/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()
But using Java's XPath implementation, it does not.
More specifically, the following code fails (using JUnit asserts) on the last line. The value of the text variable is the empty string.
// Find the file: URL to the orm.xml I mentioned above.
final URL ormUrl = Thread.currentThread().getContextClassLoader().getResource("META-INF/orm.xml");
assertNotNull(ormUrl);
final XPathFactory xpf = XPathFactory.newInstance();
assertNotNull(xpf);
final XPath xpath = xpf.newXPath();
assertNotNull(xpath);
final XPathExpression expression = xpath.compile("/entity-mappings/persistence-unit-metadata/persistence-unit-defaults/schema/text()");
assertNotNull(expression);
final String text = expression.evaluate(new InputSource(ormUrl.openStream()));
assertEquals("test", text);
This seems to cast into doubt what little understanding I had of XPath expressions to begin with. Flailing around, I then wanted to see if a simple "/" would select the root element. Mercifully, this returned a non-null NodeList, but the NodeList was empty. I really don't want to hunt the authors of the Java XPath support down and string them up, but it's getting awfully difficult not to follow that course of action.
Please help me shoot XPath in the head once and for all. Thanks.
The problem is that the XML declares a default namespace
xmlns="http://java.sun.com/xml/ns/persistence/orm"
while your XPath expression does not provide a corresponding namespace context. See this link for details on how to work with namespace contexts. There's a lot of detail there, but in summary you have to write your own implementation of javax.xml.namespace.NamespaceContext that allows the XPath processor to map namespace prefixes to URIs. In your case you must map a prefix of your choosing to the default namespace's URI and use that prefix on every element name in the expression, because unprefixed names in XPath 1.0 always refer to elements in no namespace.
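A minimal sketch of such a NamespaceContext, assuming the JDK's javax.xml.xpath implementation and a namespace-aware DOM parse. The prefix "orm" and the class names OrmSchemaLookup and OrmNamespaceContext are arbitrary choices made for this example, not anything the orm.xml format prescribes.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class OrmSchemaLookup {
    static final String ORM_NS = "http://java.sun.com/xml/ns/persistence/orm";

    /** Maps the arbitrarily chosen prefix "orm" to the ORM namespace URI. */
    static final class OrmNamespaceContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            return "orm".equals(prefix) ? ORM_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return ORM_NS.equals(namespaceURI) ? "orm" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            if (prefix == null) {
                return Collections.<String>emptyIterator();
            }
            return Collections.singletonList(prefix).iterator();
        }
    }

    /** Returns the text of the schema element, or "" if the path matches nothing. */
    public static String readSchema(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefixes to resolve against the DOM
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new OrmNamespaceContext());
        // Every step carries the prefix, because every element is in the default namespace.
        return xpath.evaluate("/orm:entity-mappings/orm:persistence-unit-metadata"
                + "/orm:persistence-unit-defaults/orm:schema/text()", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entity-mappings xmlns=\"" + ORM_NS + "\" version=\"2.0\">"
                + "<persistence-unit-metadata><persistence-unit-defaults>"
                + "<schema>test</schema><catalog>test</catalog>"
                + "</persistence-unit-defaults></persistence-unit-metadata></entity-mappings>";
        System.out.println(readSchema(xml));
    }
}
```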
I'm currently using JDOM for doing some simple XML parsing, and it seems like nothing's type safe - I had a similar issue with using the built-in Java DOM parser, just with lots more API to wade through.
For example, XPath.selectNodes takes an Object as its argument and returns a raw list, which just feels a little Java 1.1
Are there generic-ized XML and XPath libraries for Java, or is there some reason why it's just not possible to do XPath queries in a type-safe way?
If you're familiar with CSS selectors on HTML, it may be good to know that Jsoup supports XML as well.
Update: OK, judging by the downvote this was apparently a very controversial answer. It may nevertheless turn out to be easier and less verbose than XPath when all you want is to select node values. The Jsoup API is very slick. Here is a more concrete example. Assume that you have an XML file which looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<persons>
<person id="1">
<name>John Doe</name>
<age>30</age>
<address>
<street>Main street 1</street>
<city>Los Angeles</city>
</address>
</person>
<person id="2">
<name>Jane Doe</name>
<age>40</age>
<address>
<street>Park Avenue 1</street>
<city>New York</city>
</address>
</person>
</persons>
Then you can traverse it as follows:
Document document = Jsoup.parse(new File("/persons.xml"), "UTF-8");
Element person2 = document.select("person[id=2]").first();
System.out.println(person2.select("name").text());
Elements streets = document.select("street");
for (Element street : streets) {
System.out.println(street.text());
}
which outputs
Jane Doe
Main street 1
Park Avenue 1
Update 2: since Jsoup 1.6.2 which was released March 2012, XML parsing is officially supported by the Jsoup API.
AFAIK all of the XML query APIs in Java are non-typesafe, and most are Java 1.3 compatible. That said, my favorite parser/generator is the XML pull parser (xmlpp) style parser. I believe Java has XMLStreamReader and XMLStreamWriter if you're using 1.6, which are almost the same as the xmlpp library. I especially like that I can write a method getFoo that takes a stream reader, pulls from it, and returns a Foo object. It's sort of the best of both DOM and SAX. I think some refer to it as StAX.
I'm getting a little ramble-y so I'm quitting now
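A rough sketch of that getFoo pattern with the JDK's StAX API (javax.xml.stream), reading a persons document like the one in the Jsoup answer. The Person type and the names StaxPersonReader, readPerson and readAll are made up for illustration. The result is typed by construction, although the element and attribute names are still plain strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPersonReader {
    /** Hypothetical value type holding the fields we pull out of a person element. */
    public static final class Person {
        public final String id;
        public final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Expects the reader on a person start tag and consumes up to its end tag. */
    static Person readPerson(XMLStreamReader reader) throws XMLStreamException {
        String id = reader.getAttributeValue(null, "id");
        String name = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "name".equals(reader.getLocalName())) {
                name = reader.getElementText(); // leaves the reader on </name>
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "person".equals(reader.getLocalName())) {
                break;
            }
        }
        return new Person(id, name);
    }

    /** Pulls every person element out of the document, in document order. */
    public static List<Person> readAll(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Person> people = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "person".equals(reader.getLocalName())) {
                people.add(readPerson(reader));
            }
        }
        reader.close();
        return people;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<persons>"
                + "<person id=\"1\"><name>John Doe</name><age>30</age></person>"
                + "<person id=\"2\"><name>Jane Doe</name><age>40</age></person>"
                + "</persons>";
        for (Person p : readAll(xml)) {
            System.out.println(p.id + ": " + p.name);
        }
    }
}
```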