Using XQuery in Java to extract CData from an xml file


I've got a KML file (essentially XML) which has a set of nodes: name, description, coordinates, etc. Up until now I have only been reading two values, name and coordinates. Now I want to get the description data as well; the only problem is that it is CDATA, and when parsed it is ignored.
I've been using XQuery to get the data so far:
XPathExpression expr = xpath.compile("//name/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for...
In the KML file the description is just <![CDATA[.....]]> as opposed to "101" (an actual string). Using the same query on it returns nothing.
The kml file has structure:
<Document>
<Placemark>
<name>101</name>
<description><![CDATA[.....]]></description>
<polygon>
<coordinates>......</coordinates>
</polygon>
</Placemark>
<Placemark>
....
</Placemark>
</Document>
Is there a way to do it through XQuery?

Use the following XPath expression i.e. without specifying text():
XPathExpression expr = xpath.compile("//description");
and read the CDATA content with node.getTextContent()
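A minimal sketch of the approach above, assuming the KML is available as a string. The class name CdataDescriptions and the sample description text are made up for illustration; only the //description expression and getTextContent() come from the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataDescriptions {

    // Returns the text of every <description> element, CDATA content included.
    public static List<String> descriptions(String kml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(kml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select the element itself, not its text() children.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//description", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() concatenates text and CDATA children.
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read descriptions", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document shaped like the one in the question.
        String kml = "<Document><Placemark><name>101</name>"
                + "<description><![CDATA[<b>Sample</b> text]]></description>"
                + "</Placemark></Document>";
        System.out.println(descriptions(kml));
    }
}
```

The markup inside the CDATA section comes back as plain characters, so any HTML in the description has to be handled separately.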

Related

How to use tokenize function in Xpath

Can we use the tokenize function in XPath?
The general Java code I use to process XSLT and XML files is:
XPath xPath = XPathFactory.newInstance().newXPath();
InputSource inputXML = new InputSource(new StringReader(xml));
String expression = "/root/customer/personalDetails[age=tokenize('20|30','|')]/name";
boolean evaluate1 = (boolean) xPath.compile(expression).evaluate(inputXML, XPathConstants.BOOLEAN);
XML :-
<?xml version="1.0" encoding="ISO-8859-15"?>
<root>
<customer>
<personalDetails>
<name>ABC</name>
<value>20</value>
</personalDetails>
<personalDetails>
<name>XYZ</name>
<value>21</value>
</personalDetails>
<personalDetails>
<name>PQR</name>
<value>30</value>
</personalDetails>
</customer>
</root>
Expected Response :- ABC,PQR

Yes, you can use the tokenize() function in XPath, provided your XPath processor supports XPath 2.0 or later.
For Java, the popular choice of XPath 2.0+ processor is Saxon.
You can use the JAXP API with Saxon, however, it's not really designed to work well with XPath 2.0+, so it's preferable to use Saxon's own API (called s9api).
For this particular example, you don't need tokenize(). In XPath 2.0+ you can write
[age=('20', '30')]
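If only the JDK's built-in XPath 1.0 engine is available, the same filter can be spelled out with or instead of a sequence comparison. This is a sketch of that fallback, not the Saxon route described above; it assumes the element to compare is value, as in the posted XML (the question's expression uses age), and the class name NamesByValue is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamesByValue {

    // XPath 1.0 has no sequences, so each accepted value is listed with "or".
    private static final String EXPRESSION =
            "/root/customer/personalDetails[value='20' or value='30']/name";

    public static List<String> names(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    EXPRESSION,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><customer>"
                + "<personalDetails><name>ABC</name><value>20</value></personalDetails>"
                + "<personalDetails><name>XYZ</name><value>21</value></personalDetails>"
                + "<personalDetails><name>PQR</name><value>30</value></personalDetails>"
                + "</customer></root>";
        System.out.println(names(xml)); // prints [ABC, PQR]
    }
}
```

Note that the result type is NODESET here; evaluating with XPathConstants.BOOLEAN, as in the question, only reports whether any node matched.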

XPath query not returning elements after declaring schema [duplicate]

I am trying to get an element from an XML document using XPath in Java.
Without a schema definition / declaration everything works as expected:
Example from https://www.w3schools.com/xml/schema_howto.asp
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XPath : /note/heading returns an element
After declaring an xml Schema:
<?xml version="1.0"?>
<note
xmlns="https://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.w3schools.com/xml note.xsd">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XPath /note/heading no longer returns anything.
Java example from XPathTutorial
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("inventory.xml");
//Create XPath
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
System.out.println("n//1) Get book titles written after 2001");
XPathExpression expr = xpath.compile("/note/heading/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}

In XPath 1.0 a step like note selects elements named note in no namespace. You have put the elements into a namespace, so a path with "unqualified" names (i.e. without a prefix) doesn't select any input elements. Your Java code needs to bind a prefix (e.g. w3s) to the namespace URI (i.e. https://www.w3schools.com) and then use that prefix in the path, e.g. w3s:note.
Or use an XPath 2 or 3 or XQuery implementation where you can declare a default element namespace for path expressions so that your step note would select the elements in that default element namespace you set up as e.g. https://www.w3schools.com.
If you want to ignore the namespace for XPath evaluation, because of the prefix hassle in XPath 1.0, you might get away with using a document builder that is not namespace aware.
With XPath 2 or later or XQuery 1 or later you can also use namespace wildcards like *:note.
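The prefix binding for XPath 1.0 could look roughly like this with the JDK's javax.xml.xpath API. It is a sketch under assumptions: the prefix w3s is arbitrary, and the class name NamespacedHeading and the inline sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespacedHeading {

    private static final String W3S_NS = "https://www.w3schools.com";

    public static String heading(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the (arbitrary) prefix "w3s" to the document's default namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "w3s".equals(prefix) ? W3S_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return null; // not needed for evaluating this expression
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return Collections.emptyIterator();
                }
            });
            // Every step carries the prefix, even though the XML uses none.
            return xpath.evaluate("/w3s:note/w3s:heading", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note xmlns=\"https://www.w3schools.com\">"
                + "<to>Tove</to><heading>Reminder</heading></note>";
        System.out.println(heading(xml)); // prints Reminder
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the namespace URI matters.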

XPath to access an attribute value with a name that has special characters

I'm trying to access an attribute value, but the attribute name has special characters, for example:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<ELEMENT1 at:it="true">W</ELEMENT1>------
<ELEMENT2>IN</ELEMENT2>
<ELEMENT3>RP</ELEMENT3>
<ELEMENT4>KKK</ELEMENT4>
</row>
<row>
<ELEMENT1 acón='sys'>2</ELEMENT1>------
<ELEMENT2>ARQ</ELEMENT2>
<ELEMENT3>MR</ELEMENT3>
<ELEMENT4>AC</ELEMENT4>
</row>
<row>
<ELEMENT1>3</ELEMENT1>
<ELEMENT2>I</ELEMENT2>
<ELEMENT3 at:it="true" >RP</ELEMENT3>------
<ELEMENT4>KKK</ELEMENT4>
</row>
<row>
<ELEMENT1>1</ELEMENT1>
<ELEMENT2>CC</ELEMENT2>
<ELEMENT3>XX</ELEMENT3>
<ELEMENT4 eléct='false' >I</ELEMENT4>------
</row>
<row>
<ELEMENT1>12</ELEMENT1>
<ELEMENT2 at:it="true" >IN</ELEMENT2>------
<ELEMENT3>3</ELEMENT3>
<ELEMENT4></ELEMENT4>
</row>
</root>
If I change the names of the attributes and remove the special characters, I can access them:
at:it ------> atit
acón ------> acon
eléct ------> elect
But I cannot access attribute names with special characters using an XPath query expression.
How can I access the values of attributes whose names have special characters?
To transform the XML file to DOM I used Java 6, javax.xml.*, org.w3c.dom.*.

I tried it with Java 6 and had no problems accessing attributes with accents. The colon is a special case, because it is used to denote element/attribute names with namespace prefixes. The XML doesn't actually use namespaces; otherwise there would be a namespace declaration for the prefix at.
The XML parser has a switch to treat colons as part of the name, but the XPath engine is always namespace aware. With a little trick it is still possible:
File xmlFile = new File("in.xml");
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Parse without namespaces. Otherwise parsing leads to an error
// because there is no namespace declaration for prefix 'at'.
factory.setNamespaceAware(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(xmlFile);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr1 = xpath.compile("/root/row/ELEMENT1/@acón");
//XPathExpression expr2 = xpath.compile("/root/row/ELEMENT1/@at:it"); Doesn't work!
XPathExpression expr2 = xpath.compile("/root/row/ELEMENT1/@*[name() = 'at:it']");
XPathExpression expr3 = xpath.compile("/root/row/ELEMENT4/@eléct");
System.out.println((String) expr1.evaluate(doc, XPathConstants.STRING));
System.out.println((String) expr2.evaluate(doc, XPathConstants.STRING));
System.out.println((String) expr3.evaluate(doc, XPathConstants.STRING));
The output is:
sys
true
false

Realize that a colon (:) should only be used in an element or attribute name if part of namespace prefix:
Note:
The Namespaces in XML Recommendation [XML Names] assigns a meaning to
names containing colon characters. Therefore, authors should not use
the colon in XML names except for namespace purposes, but XML
processors must accept the colon as a name character.
So,
/root/row/ELEMENT1/@at:it
will work to select "true" provided that you change your XML by defining the at namespace prefix in your XML (preferable),
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:at="http://example.com/at">
<row>
<ELEMENT1 at:it="true">W</ELEMENT1>------
<ELEMENT2>IN</ELEMENT2>
<ELEMENT3>RP</ELEMENT3>
<ELEMENT4>KKK</ELEMENT4>
</row>
<row>
<ELEMENT1 acón='sys'>2</ELEMENT1>------
<ELEMENT2>ARQ</ELEMENT2>
<ELEMENT3>MR</ELEMENT3>
<ELEMENT4>AC</ELEMENT4>
</row>
<row>
<ELEMENT1>3</ELEMENT1>
<ELEMENT2>I</ELEMENT2>
<ELEMENT3 at:it="true" >RP</ELEMENT3>------
<ELEMENT4>KKK</ELEMENT4>
</row>
<row>
<ELEMENT1>1</ELEMENT1>
<ELEMENT2>CC</ELEMENT2>
<ELEMENT3>XX</ELEMENT3>
<ELEMENT4 eléct='false' >I</ELEMENT4>------
</row>
<row>
<ELEMENT1>12</ELEMENT1>
<ELEMENT2 at:it="true" >IN</ELEMENT2>------
<ELEMENT3>3</ELEMENT3>
<ELEMENT4></ELEMENT4>
</row>
</root>
or instruct your XML processor to ignore XML namespaces (not a best practice).
The next two cases are fine:
/root/row/ELEMENT1/@acón
will select "sys" without problem provided your XPath processor supports UTF-8 encoding (and it should).
/root/row/ELEMENT4/@eléct
will select "false" similarly.

First get the attributes from your nodes and then check their names.
Something like:
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate(filteringExpression, xmlDocument, XPathConstants.NODESET);
Then iterate through the nodes and for each node get its attributes:
Node node = nodes.item(idx);
NamedNodeMap nl = node.getAttributes();
Then iterate through the attributes and, if the name matches, get its value:
Attr attr = (Attr) nl.item(i);
if (attr.getName().equals(...)) {
    String attributeValue = attr.getValue();
}

Generic xpath for a node at different locations

I have a lot of xml files from different versions of schemas. There are certain sections/tags in these xmls that are the same.
What I want to do is locate a particular tag and start processing that tag. The thing is that this tag may appear at different locations in the XML.
So I am looking for an XPath that will locate this node irrespective of its location. I am using Java for writing my processing code.
Following are the various flavours of the XML files:
Sample 1
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
Sample 2
<a>
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
</a>
Sample 3
<b>
<nodeIWant>
<book>
<title>Harry Potter and the Philosophers Stone</title>
...
</book>
</nodeIWant>
</b>
In the above xmls I want to use the same xpath to locate the node 'nodeIWant'.
The Java code I am using is the following
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document modelDoc = docBuilder.parse(args[0]);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println(xPath.evaluate("//nodeIWant", modelDoc.getDocumentElement(), XPathConstants.NODE));
This prints out a null.
Final Edit
The answer by Mathias Müller works for these XML files. I am actually trying to query the .emx files in Rational Software Architect. I was trying to avoid using these as examples. (Please don't start talking about BIRT and using the Eclipse UML APIs etc... I have tried these and they do not give me what I want.)
The structure of the files is the following
<?xml version="1.0" encoding="UTF-8"?>
<!--xtools2_universal_type_manager-->
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.uml.msl.model" version="7.0.0"><feature description="" name="com.ibm.xtools.ruml.feature" url="" version="7.0.0"/></signature>?>
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.mmi.ui.signatures.diagram" version="7.0.0"><feature description="" name="Rational Modeling Platform (com.ibm.xtools.rmp)" url="" version="7.0.0"/></signature>?>
<xmi:XMI version="2.0" xmlns:Default="http:///schemas/Default/_fNm3AAqoEd6-N_NOT9vsCA/2" xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore" xmlns:uml="http://www.eclipse.org/uml2/3.0.0/UML" xmlns:xmi="http://www.omg.org/XMI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http:///schemas/Default/_fNm3AAqoEd6-N_NOT9vsCA/2 pathmap://UML2_MSL_PROFILES/Default.epx#_fNwoAAqoEd6-N_NOT9vsCA?Default/Default?">
<uml:Model name="A" xmi:id="_4lzSsMywEeGAuoBpYhfj6Q">
<!-- Lot of other stuff -->
</uml:Model>
</xmi:XMI>
The other file is
<?xml version="1.0" encoding="UTF-8"?>
<!--xtools2_universal_type_manager-->
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.uml.msl.model" version="7.0.0"><feature description="" name="com.ibm.xtools.ruml.feature" url="" version="7.0.0"/></signature>?>
<?com.ibm.xtools.emf.core.signature <signature id="com.ibm.xtools.mmi.ui.signatures.diagram" version="7.0.0"><feature description="" name="Rational Modeling Platform (com.ibm.xtools.rmp)" url="" version="7.0.0"/></signature>?>
<uml:Model xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore" xmlns:uml="http://www.eclipse.org/uml2/3.0.0/UML" xmi:id="_4lzSsMywEeGAuoBpYhfj6Q" name="A">
<!-- Lot of other stuff -->
</uml:Model>
Shouldn't the xpath of '//Model' work for these two samples as well?

You can use the XPath abbreviation // (short for the descendant-or-self axis). It searches the whole document for your node, regardless of its parent nodes. So in your example you can use:
//nodeIWant

Not very familiar with DocumentBuilder, but perhaps you need to compile an XPath expression before evaluating it against a document? It seems it's not XPath expressions that are evaluated, XML documents are.
String expression = "//nodeIWant";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(modelDoc, XPathConstants.NODESET);
Or, if there is just one of those elements and you'd like to print its string value:
String expression = "//nodeIWant";
System.out.println(xPath.compile(expression).evaluate(modelDoc));
EDIT: You edited your question and revealed the actual XML you are evaluating path expressions against. Those new documents have namespaces that you need to take into account in XPath expressions.
//nodeIWant will never find a node if it is actually in a namespace. To find the Model node in your new documents, you'd have to use
//*[local-name() = 'Model']
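A minimal sketch of the local-name() expression evaluated against both document shapes from the question. The class name ModelByLocalName is made up, and the inline XML is a stripped-down stand-in for the real .emx files, keeping only the namespace declarations and the name attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ModelByLocalName {

    // Returns the name attribute of the first Model element, whatever its
    // namespace and wherever it sits in the document.
    public static String modelName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() ignores the prefix, so no NamespaceContext is needed.
            return xpath.evaluate("//*[local-name() = 'Model']/@name", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String uml = "http://www.eclipse.org/uml2/3.0.0/UML";
        // Model as the root element.
        String asRoot = "<uml:Model xmlns:uml=\"" + uml + "\" name=\"A\"/>";
        // Model nested inside an xmi:XMI wrapper.
        String nested = "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\" xmlns:uml=\""
                + uml + "\"><uml:Model name=\"A\"/></xmi:XMI>";
        System.out.println(modelName(asRoot)); // prints A
        System.out.println(modelName(nested)); // prints A
    }
}
```

Matching on local-name() alone also matches a Model element from any other namespace, so binding a prefix is the stricter choice when that matters.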

Since the example you provided contains more than just a String in the <nodeIWant> elements you probably can benefit from using an object oriented approach combined with xpath. With data projection (disclosure: I'm affiliated with that project) it's possible to do this:
public class DataProjection {
public interface Book {
@XBRead("./title")
String getTitle();
//... more getter or setter methods
}
public static void main(String[] args) {
// Print all books in all <nodeIWant> elements of
for (String file : new String[] { "a.xml", "b.xml", "c.xml" }) {
List<Book> books = new XBProjector().io().file(file).evalXPath("//nodeIWant/book").asListOf(Book.class);
for (Book book : books) {
System.out.println(book.getTitle());
}
}
}
}
You can define one or more views (called projection interfaces) of the XML data and use XPath to connect the data to Java objects implementing these interfaces. This helps a lot in structuring your code and makes it reusable for similar XML files.

How to Compare dates with Xpath in android

I want to parse XML with XPath in an Android application.
My XML file looks like this:
<expenses>
<entry type="fixed">
<amount>200</amount>
<recurring>true</recurring>
<category>Home/Rent</category>
<payee>Ahmet Necati</payee>
<account>1</account>
<startDate>2013-01-01</startDate>
<endDate>2013-01-01</endDate>
</entry>
<entry type="variable">
<amount>150</amount>
<category>Departmental</category>
<payee>Ahmet Necati</payee>
<recurring>true</recurring>
<startDate>2013-01-01</startDate>
<endDate>2013-01-01</endDate>
<account>1</account>
</entry>
</expenses>
and I want to query it with XPath like this:
String expression = "/expenses/entry[xs:date(endDate) < xs:date('2013-10-10')]";
NodeList widgetNode = (NodeList) xpath.evaluate(expression, document,
XPathConstants.NODESET);
But I couldn't get it to work. It returns 0 nodes.
Edit: I want to get all entry nodes whose endDate is less than a specific date; for example, the nodes whose endDate is less than "2013-10-10".

The "XML schema constructor functions" are part of XPath 2.0, but Android only supports XPath 1.0: http://developer.android.com/reference/javax/xml/xpath/package-summary.html
One solution is to register your own function to do the conversion (see XPathFunctionResolver). Another is to look into libraries that support XPath 2.0.
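A third option, not mentioned above and offered only as a sketch, is to stay within XPath 1.0: for dates in yyyy-MM-dd form, stripping the dashes with translate() yields a number such as 20130101 that can be compared directly. The class name EntriesBeforeDate and the sample data in main are made up, and the trick assumes every endDate is a well-formed ISO date.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EntriesBeforeDate {

    // Counts <entry> elements whose endDate is strictly before isoDate.
    public static int countEndingBefore(String xml, String isoDate) {
        if (!isoDate.matches("\\d{4}-\\d{2}-\\d{2}")) {
            throw new IllegalArgumentException("Expected yyyy-MM-dd: " + isoDate);
        }
        // 2013-10-10 -> 20131010, so the cutoff compares as a plain number.
        String cutoff = isoDate.replace("-", "");
        String expression = "/expenses/entry"
                + "[number(translate(endDate, '-', '')) < " + cutoff + "]";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical data: one entry before the cutoff, one after.
        String xml = "<expenses>"
                + "<entry type=\"fixed\"><endDate>2013-01-01</endDate></entry>"
                + "<entry type=\"variable\"><endDate>2013-12-31</endDate></entry>"
                + "</expenses>";
        System.out.println(countEndingBefore(xml, "2013-10-10")); // prints 1
    }
}
```

An entry with a missing or malformed endDate converts to NaN and is never selected, which may or may not be the desired behaviour.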
