Parsing XML Node in Java using XPATH


I have just started using Jaxp13XPathTemplate, but I'm a bit confused about parsing the XML.
Here is the sample XML:
<fxDataSets>
<fxDataSet name="NAME_A">
<link rel="self" href="http://localhost:8080/linkA"/>
<baseCurrency>EUR</baseCurrency>
<description>TEST DESCRIPTION A</description>
</fxDataSet>
<fxDataSet name="NAME_B">
<link rel="self" href="http://localhost:8080/linkB"/>
<baseCurrency>EUR</baseCurrency>
<description>TEST DESCRIPTION B</description>
</fxDataSet>
</fxDataSets>
I'm already able to get NAME_A and NAME_B, but I'm not able to get the description for either node.
Here is what I have come up with:
XPathOperations xpathTemplate = new Jaxp13XPathTemplate();
String fxRateURL = "http://localhost:8080/rate/datasets";
RestTemplate restTemplate = new RestTemplate();
Source fxRate = restTemplate.getForObject(fxRateURL,Source.class);
List<Map<String, Object>> currencyList = xpathTemplate.evaluate("//fxDataSet", fxRate , new NodeMapper() {
public Object mapNode(Node node, int i) throws DOMException
{
Map<String, Object> singleFXMap = new HashMap<String, Object>();
Element fxDataSet = (Element) node;
String id = fxDataSet.getAttribute("name");
/* This part is not working
if(fxDataSet.hasChildNodes())
{
NodeList nodeList = fxDataSet.getChildNodes();
int length = nodeList.getLength();
for(int index=0;i<length;i++)
{
Node childNode = nodeList.item(index);
System.out.println("childNode name"+childNode.getLocalName()+":"+childNode.getNodeValue());
}
}*/
return new Object();
}
});

Try using the dom4j library and its SAXReader.
InputStream is = FileUtils.class.getResourceAsStream("file.xml");
SAXReader reader = new SAXReader();
org.dom4j.Document doc = reader.read(is);
is.close();
Element content = doc.getRootElement(); // returns the root element of your XML file
List<Element> methodEls = content.elements("element"); // returns a List of all child elements named "element"

Take a look at the signature: public <T> List<T> evaluate(String expression, Source context, NodeMapper<T> nodeMapper)
evaluate takes a NodeMapper<T> as one of its parameters and returns a List<T>.
In your code snippet, however, you pass a raw new NodeMapper() as the parameter but assign the result to a List<Map<String, Object>>, which does not match the contract of the API. In addition, mapNode returns new Object(), so the map you build is never returned.
Probable solution:
I am assuming you want to return an object of type FxDataSet that wraps a <fxDataSet>...</fxDataSet> element. If that is the case:
pass new NodeMapper<FxDataSet>() as the parameter;
use List<FxDataSet> currencyList = ... as the left-hand side;
change the method signature to public FxDataSet mapNode(Node node, int i) throws DOMException.
Also take a look at the documentation for NodeMapper.
I have not used Jaxp13XPathTemplate myself, but this is a general Java generics concept, and it helped me find what was wrong. I hope this solution works.

If you want to get at the child nodes of the fxDataSet element you should be able to do:
Node descriptionNode= fxDataSet.getElementsByTagName("description").item(0);
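As a rough illustration of the same idea without Spring, here is a minimal sketch that uses only the JDK's javax.xml.xpath API instead of Jaxp13XPathTemplate. The class name FxDataSetDemo and the descriptions helper are made up for this example. It selects every fxDataSet node and then reads the name attribute and the description child relative to that node.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Spring or of the original question.
class FxDataSetDemo {

    // Maps each fxDataSet's name attribute to the text of its description child.
    static Map<String, String> descriptions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList sets = (NodeList) xpath.evaluate("//fxDataSet", doc, XPathConstants.NODESET);
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < sets.getLength(); i++) {
                Element fxDataSet = (Element) sets.item(i);
                // "description" is evaluated relative to the current fxDataSet node
                String description = xpath.evaluate("description", fxDataSet);
                result.put(fxDataSet.getAttribute("name"), description);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<fxDataSets>"
                + "<fxDataSet name=\"NAME_A\"><description>TEST DESCRIPTION A</description></fxDataSet>"
                + "<fxDataSet name=\"NAME_B\"><description>TEST DESCRIPTION B</description></fxDataSet>"
                + "</fxDataSets>";
        System.out.println(descriptions(xml));
    }
}
```

Inside a NodeMapper the same relative lookup should apply to the node passed to mapNode, though that part is not tested here.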

Related

Match set of simple xpaths with SAX

I have a set of simple xpaths involving only tags and attributes, no predicates. My XML input has a size of several MB so I want to use a streaming XML parser.
How can I match the streaming XML parser against the set of xapths to retrieve one value for each xpath?
The crux seems to build the right data structure from the set of xpaths so it can be evaluated based on the xml events.
This seems like a fairly common task but I couldn't find any readily available solutions.

To match a streaming XML parser against a set of simple xpaths, you can use the following steps:
Create a Map<String, String> to store the xpaths and their corresponding values. Initialize the values to null.
Create a Stack<String> to keep track of the current path of the XML elements.
Create a SAXParser and a DefaultHandler to parse the XML input.
In the startElement method of the handler, push the element name to the stack and append it to the current path. Then, check if the current path matches any of the xpaths in the map. If yes, set a flag to indicate that the value should be extracted.
In the endElement method of the handler, pop the element name from the stack and remove it from the current path. Then, reset the flag to indicate that the value should not be extracted.
In the characters method of the handler, check if the flag is set. If yes, append the character data to the value of the matching xpath in the map.
After parsing the XML input, return the map with the xpaths and their values.
Explanation
A streaming XML parser, such as SAXParser, reads the XML input sequentially and triggers events when it encounters different parts of the document, such as start tags, end tags, text, etc. It does not build a tree structure of the document in memory, which makes it more efficient for large XML inputs.
An xpath is a syntax for selecting nodes from an XML document. It consists of a series of steps, separated by slashes, that describe the location of the desired node. For example, /bookstore/book/title selects the title element of the book element of the bookstore element.
A simple xpath, as used here, involves only element names and an optional attribute test; no other predicates are supported. For example, /bookstore/book[@lang='en']/title selects the title element of each book element that has an attribute lang with value en.
To match a streaming XML parser against a set of simple xpaths, we need to keep track of the current path of the XML elements as we parse the input, and compare it with the xpaths in the set. If we find a match, we need to extract the value of the node and store it in a map. We also need to handle the cases where the node value spans across multiple character events, or where the node has multiple occurrences in the document.
Example
Suppose we have the following XML input:
<bookstore>
<book lang="en">
<title>Harry Potter and the Philosopher's Stone</title>
<author>J. K. Rowling</author>
<price>10.99</price>
</book>
<book lang="fr">
<title>Le Petit Prince</title>
<author>Antoine de Saint-Exupéry</author>
<price>8.50</price>
</book>
</bookstore>
And the following set of simple xpaths:
/bookstore/book/title
/bookstore/book/author
/bookstore/book[@lang='fr']/price
We can use the following Java code to match the streaming XML parser against the set of xpaths:
import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class XPathMatcher {
public static Map<String, String> match(InputStream xmlInput, Set<String> xpaths) throws Exception {
// Create a map to store the xpaths and their values
Map<String, String> map = new HashMap<>();
for (String xpath : xpaths) {
map.put(xpath, null);
}
// Create a stack to keep track of the current path
Stack<String> stack = new Stack<>();
// Create a SAXParser and a DefaultHandler to parse the XML input
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
// A flag to indicate if the value should be extracted
boolean extract = false;
// A variable to store the matching xpath
String matchingXPath = "";
// Attributes of every open element, parallel to the element-name stack
List<Map<String, String>> attrStack = new ArrayList<>();
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
// Push the element name and a copy of its attributes (the parser may reuse the Attributes object)
stack.push(qName);
Map<String, String> attrs = new HashMap<>();
for (int i = 0; i < attributes.getLength(); i++) {
attrs.put(attributes.getQName(i), attributes.getValue(i));
}
attrStack.add(attrs);
// Check if the complete current path matches any of the xpaths in the map
extract = false;
matchingXPath = "";
for (String xpath : map.keySet()) {
if (matches(xpath)) {
extract = true;
matchingXPath = xpath;
break;
}
}
}
// Compare an xpath step by step with the stack of open elements
private boolean matches(String xpath) {
String[] steps = xpath.substring(1).split("/");
if (steps.length != stack.size()) {
return false;
}
for (int i = 0; i < steps.length; i++) {
String name = steps[i];
int bracket = name.indexOf("[@");
if (bracket >= 0) {
// The step has the form name[@attr='value']: extract the attribute name and value
int eq = name.indexOf('=', bracket);
String attrName = name.substring(bracket + 2, eq);
String attrValue = name.substring(eq + 2, name.length() - 2);
name = name.substring(0, bracket);
if (!attrValue.equals(attrStack.get(i).get(attrName))) {
return false;
}
}
if (!name.equals(stack.get(i))) {
return false;
}
}
return true;
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
// Pop the element name and its attributes
stack.pop();
attrStack.remove(attrStack.size() - 1);
// Reset the flag and the matching xpath
extract = false;
matchingXPath = "";
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
// Check if the flag is set
if (extract) {
// Append the character data to the value of the matching xpath in the map
String value = map.get(matchingXPath);
if (value == null) {
value = "";
}
value += new String(ch, start, length);
map.put(matchingXPath, value);
}
}
};
// Parse the XML input
parser.parse(xmlInput, handler);
// Return the map with the xpaths and their values
return map;
}
public static void main(String[] args) throws Exception {
// Create an input stream from the XML file
InputStream xmlInput = new FileInputStream("bookstore.xml");
// Create a set of simple xpaths
Set<String> xpaths = new HashSet<>();
xpaths.add("/bookstore/book/title");
xpaths.add("/bookstore/book/author");
xpaths.add("/bookstore/book[@lang='fr']/price");
// Match the streaming XML parser against the set of xpaths
Map<String, String> map = match(xmlInput, xpaths);
// Print the results
for (String xpath : map.keySet()) {
System.out.println(xpath + " = " + map.get(xpath));
}
}
}
The output of the code is (the line order may vary, since HashMap does not guarantee iteration order):
/bookstore/book/title = Harry Potter and the Philosopher's StoneLe Petit Prince
/bookstore/book/author = J. K. RowlingAntoine de Saint-Exupéry
/bookstore/book[@lang='fr']/price = 8.50

DOM4J Parse not returning any child nodes

I am starting to write a program that uses dom4j to parse an XML file, save it to some tables, and finally allow the user to manipulate the data.
Unfortunately I am stuck on the most basic step, the parsing.
Here is the portion of my XML I am attempting to include:
<?xml version="1.0"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.04">
<BkToCstmrDbtCdtNtfctn>
<GrpHdr>
<MsgId>000022222</MsgId>
When I attempt to find the root of my XML it does return the root correctly as "Document". When I attempt to get the child node from Document it also correctly gives me "BkToCstmrDbtCdtNtfctn". The problem is that when I try to go any further and get the child nodes from "Bk" I can't. I get this in the console:
org.dom4j.tree.DefaultElement@2b05039f [Element: <BkToCstmrDbtCdtNtfctn uri: urn:iso:std:iso:20022:tech:xsd:camt.054.001.04 attributes: []/>]
Here is my code, I would appreciate any feedback. Ultimately I want to get the "MsgId" attribute back but in general I just want to figure how to parse deeper into the XML because in reality it probably has about 25 layers.
public static Document getDocument(final String xmlFileName){
Document document = null;
SAXReader reader = new SAXReader();
try{
document = reader.read(xmlFileName);
}
catch (DocumentException e)
{
e.printStackTrace();
}
return document;
}
public static void main(String args[]){
String xmlFileName = "C:\\Users\\jhamric\\Desktop\\Camt54.xml";
String xPath = "//Document";
Document document = getDocument(xmlFileName);
Element root = document.getRootElement();
List<Node> nodes = document.selectNodes(xPath);
for(Iterator i = root.elementIterator(); i.hasNext();){
Element element = (Element) i.next();
System.out.println(element);
}
for(Iterator i = root.elementIterator("BkToCstmrDbtCdtNtfctn");i.hasNext();){
Element bk = (Element) i.next();
System.out.println(bk);
}
}
}

The best approach is probably to use XPath, but since the XML document uses namespaces, you cannot use the "simple" selectNodes methods in the API. I would create a helper method to easily evaluate any XPath expression on either the Document or the Element level:
public static void main(String[] args) throws Exception {
Document doc = getDocument(...);
Map<String, String> namespaceContext = new HashMap<>();
namespaceContext.put("ns", "urn:iso:std:iso:20022:tech:xsd:camt.054.001.04");
// Select the first GrpHdr element in document order
Element element = (Element) select("//ns:GrpHdr[1]", doc, namespaceContext);
System.out.println(element.asXML());
// Select the text content of the MsgId element
Text msgId = (Text) select("./ns:MsgId/text()", element, namespaceContext);
System.out.println(msgId.getText());
}
static Object select(String expression, Branch contextNode, Map<String, String> namespaceContext) {
XPath xp = contextNode.createXPath(expression);
xp.setNamespaceURIs(namespaceContext);
return xp.evaluate(contextNode);
}
Note that the XPath expression must use namespace prefixes that are mapped to the namespace URIs used in the input document, but the actual value of the prefix doesn't matter.
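The same rule holds for the JDK's own javax.xml.xpath API. The sketch below is not dom4j code: it assumes a cut-down copy of the document above and a hypothetical helper, msgId(prefix), that binds whatever prefix it is given to the camt.054 namespace URI. Any prefix should work as long as the expression and the NamespaceContext agree on it.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class showing that the prefix name itself is arbitrary.
class PrefixDemo {
    static final String NS = "urn:iso:std:iso:20022:tech:xsd:camt.054.001.04";
    static final String XML = "<Document xmlns=\"" + NS + "\">"
            + "<BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>000022222</MsgId></GrpHdr></BkToCstmrDbtCdtNtfctn>"
            + "</Document>";

    // Reads MsgId using the given prefix, which is bound to NS for this query only.
    static String msgId(final String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this the parser ignores namespaces
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? prefix : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList(prefix).iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate("//" + prefix + ":GrpHdr/" + prefix + ":MsgId", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(msgId("ns"));
        System.out.println(msgId("camt"));
    }
}
```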

how to retrieve XML data using XPath which has a default namespace in Java?

I've come across a problem that I've looked up on Stack Overflow, but none of the solutions seem to solve it for me.
I'm retrieving XML data from Yahoo and it comes back as below (truncated for brevity's sake).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fantasy_content xmlns="http://fantasysports.yahooapis.com/fantasy/v2/base.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" copyright="Data provided by Yahoo! and STATS, LLC" refresh_rate="31" time="55.814027786255ms" xml:lang="en-US" yahoo:uri="http://fantasysports.yahooapis.com/fantasy/v2/league/328.l.108462/settings">
<league>
<league_key>328.l.108462</league_key>
<league_id>108462</league_id>
<draft_status>postdraft</draft_status>
</league>
</fantasy_content>
I've been having a problem getting XPath to retrieve any elements so I've written a unit test to try to resolve it and it looks like:
final File file = new File("league-settings.xml");
javax.xml.parsers.DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
javax.xml.parsers.DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
org.w3c.dom.Document doc = dBuilder.parse(file);
javax.xml.xpath.XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new YahooNamespaceContext());
final String expression = "yfs:league";
final XPathExpression expr = xPath.compile(expression);
Object nodes = expr.evaluate(doc, XPathConstants.NODESET);
assert(nodes instanceof NodeList);
NodeList leagueNodes = (NodeList)nodes;
int leaguesLength = leagueNodes.getLength();
assertEquals(leaguesLength, 1);
The YahooNamespaceContext class I created to map the namespaces looks as follows:
public class YahooNamespaceContext implements NamespaceContext {
public static final String YAHOO_NS = "http://www.yahooapis.com/v1/base.rng";
public static final String DEFAULT_NS = "http://fantasysports.yahooapis.com/fantasy/v2/base.rng";
public static final String YAHOO_PREFIX = "yahoo";
public static final String DEFAULT_PREFIX = "yfs";
private final Map<String, String> namespaceMap = new HashMap<String, String>();
public YahooNamespaceContext() {
namespaceMap.put(DEFAULT_PREFIX, DEFAULT_NS);
namespaceMap.put(YAHOO_PREFIX, YAHOO_NS);
}
public String getNamespaceURI(String prefix) {
return namespaceMap.get(prefix);
}
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
public Iterator<String> getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
}
Any help from people with more experience with XML namespaces, or debugging tips for XPath compilation/evaluation, would be appreciated.

If the problem is that you're getting zero as the length of the result nodelist, have you tried changing
final String expression = "yfs:league";
to
final String expression = "//yfs:league";
?
It appears that the context for evaluating your XPath expressions, doc, is the root node of the document. dBuilder.parse(file) returns the document root node, not the outermost element (a.k.a. document element). Remember, in XPath, a root node is not an element. So doc is not the yfs:fantasy_content element node but is its (invisible) parent.
In that context, the XPath expression "yfs:league" will only select an element that is a direct child of that root node, of which there is no yfs:league -- only yfs:fantasy_content.

The XPath expression yfs:league is equivalent to child::yfs:league. It means: find direct children nodes (not descendants) of doc with the specified local name (league) and namespace URI (http://fantasysports.yahooapis.com/fantasy/v2/base.rng).
You must take into account the outermost element (fantasy_content) or search for descendant instead of child nodes.
Replacing
final String expression = "yfs:league";
with
final String expression = "yfs:fantasy_content/yfs:league";
or with
final String expression = "//yfs:league";
will solve the problem.
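To see the difference between the three expressions, here is a small self-contained check. It assumes a trimmed copy of the Yahoo response; the class ContextNodeDemo and its count helper are hypothetical names introduced for this sketch. Each expression is evaluated with the Document node as context, as in the question.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is a shortened version of the response in the question.
class ContextNodeDemo {
    static final String NS = "http://fantasysports.yahooapis.com/fantasy/v2/base.rng";
    static final String XML = "<fantasy_content xmlns=\"" + NS + "\">"
            + "<league><league_key>328.l.108462</league_key></league>"
            + "</fantasy_content>";

    // Counts the nodes an expression selects when the Document itself is the context node.
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "yfs".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "yfs" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("yfs").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // A child step from the root node only sees the document element
        System.out.println("yfs:league -> " + count("yfs:league"));
        System.out.println("yfs:fantasy_content/yfs:league -> " + count("yfs:fantasy_content/yfs:league"));
        System.out.println("//yfs:league -> " + count("//yfs:league"));
    }
}
```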

how to Print date and id in given XML

This is my xml format:
<taxmann>
<docdetails>
<info id="104010000000006516" date="20120120">
<physicalpath>\\192.168.1.102\CMS\DATA</physicalpath>
<filepath isxml="N">\CIRCULARS\DIRECTTAXLAWS\HTMLFILES\CIRDGBACDD4836150012011122012012.htm</filepath>
<summary></summary>
<description></description>
<heading>DGBA.CDD. NO.H- 4836 /15.02.001/2011-12 | Clarification on Regulation of Interest Rates for Small Savings Schemes</heading>
<correspondingcitation/>
<hasfile>YES</hasfile>
<sortby>20120328155728957</sortby>
<parentid></parentid>
<parentchapterid></parentchapterid>
</info>
</docdetails>
</taxmann>
I'm able to retrieve the heading data, but I also want to print the date and id, and I'm not able to do this. Please tell me how to implement it.
XMLParser parser = new XMLParser();
String xml = parser.getXmlFromUrl(url); // getting XML
Document doc = parser.getDomElement(xml); // getting DOM element
NodeList nl = doc.getElementsByTagName(KEY_ITEM);
ArrayList<HashMap<String, String>> menuItems = new ArrayList<HashMap<String, String>>();
HashMap<String, String> map;
for (int i = indexRowStart; i < indexRowEnd; i++) {
Element e = (Element) nl.item(i);
// adding each child node to HashMap key => value
map = new HashMap<String, String>();
map.put("RowID", String.valueOf(RowID));
String Heading= parser.getValue(e, KEY_NAME).replace("|", "|\n").replace("|", "");
map.put(KEY_NAME,Heading);
// adding HashList to ArrayList
menuItems.add(map);
}
This is my code. Please explain how I can parse the XML so that I get the date and id too.

Are you sure this is the easiest way to read that xml file? It just looks a bit too complicated. Why don't you navigate manually through the tree structure?
I would say you would get it this way:
SAXBuilder builder = new SAXBuilder();
Document doc;
doc = builder.build(file);
//rootElement would be your "taxmann" element
Element rootElement = doc.getRootElement();
Element docdetailsElement = rootElement.getChild("docdetails");
Element infoElement = docdetailsElement.getChild("info");
String id = infoElement.getAttributeValue("id");
String date = infoElement.getAttributeValue("date");

You should use the Element#getAttribute(String name) method. In your case, something like:
String id=e.getAttribute("id");
String date=e.getAttribute("date");
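For reference, a minimal standalone sketch of that call using the JDK's DOM parser. It assumes that KEY_ITEM in the question is "info" and that the document contains at least one info element; the class name InfoAttributesDemo and the shortened sample XML are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; SAMPLE is a shortened version of the XML in the question.
class InfoAttributesDemo {
    static final String SAMPLE = "<taxmann><docdetails>"
            + "<info id=\"104010000000006516\" date=\"20120120\">"
            + "<heading>Clarification on Regulation of Interest Rates</heading>"
            + "</info></docdetails></taxmann>";

    // Returns {id, date} of the first info element (assumes one exists).
    static String[] idAndDate(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nl = doc.getElementsByTagName("info");
            Element e = (Element) nl.item(0);
            // getAttribute returns an empty string when the attribute is absent
            return new String[] { e.getAttribute("id"), e.getAttribute("date") };
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String[] values = idAndDate(SAMPLE);
        System.out.println("id=" + values[0] + ", date=" + values[1]);
    }
}
```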

using xpath in java to go through this list?

I have an XML file that contains lots of different nodes. Some in particular are nested like this:
<emailAddresses>
<emailAddress>
<value>sambj1981@gmail.com</value>
<typeSource>WORK</typeSource>
<typeUser></typeUser>
<primary>false</primary>
</emailAddress>
<emailAddress>
<value>sambj@hotmail.co.uk</value>
<typeSource>HOME</typeSource>
<typeUser></typeUser>
<primary>true</primary>
</emailAddress>
</emailAddresses>
From the above node, I want to go through each emailAddress, get the values inside it (value, typeSource, typeUser, etc.) and put them in a POJO.
I tried the XPath expression "//emailAddress", but it doesn't return the tags inside it. Maybe I am doing it wrong; I am pretty new to XPath.
I could do something like this:
//emailAddress/value | //emailAddress/typeSource | ... but, if I'm not mistaken, that will list all the element values together, leaving me to work out when I have finished reading one emailAddress tag and moved on to the next.
To sum up, I want results returned much like the rows of a standard SQL query: if the query produces 10 emailAddress entries, each one comes back as a row, and I can simply iterate over each emailAddress and get the appropriate value by column name or index.

No,
//emailAddress
doesn't return the tags inside, that is correct. What it does return is a NodeList/NodeSet. To actually get the values you can do something like this:
String emailpath = "//emailAddress";
String emailvalue = ".//value";
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
Document document;
public XpathStuff(String file) throws ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = docFactory.newDocumentBuilder();
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
document = builder.parse(bis);
NodeList nodeList = getNodeList(document, emailpath);
for(int i = 0; i < nodeList.getLength(); i++){
System.out.println(getValue(nodeList.item(i), emailvalue));
}
bis.close();
}
public NodeList getNodeList(Document doc, String expr) {
try {
XPathExpression pathExpr = xpath.compile(expr);
return (NodeList) pathExpr.evaluate(doc, XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return null;
}
//extracts the String value for the given expression
private String getValue(Node n, String expr) {
try {
XPathExpression pathExpr = xpath.compile(expr);
return (String) pathExpr.evaluate(n,
XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return null;
}
Maybe I should point out that when iterating over the NodeList, the leading dot in .//value means the current context node. Without the dot you would get the first node in the document every time.
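To get the row-like result the question asks for, the same relative-expression trick can fill one object per emailAddress node. This is only a sketch under assumptions: the EmailAddress class, its field names, and the read helper are invented here to mirror the XML, and only the JDK's javax.xml.xpath API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: one EmailAddress object per emailAddress element.
class EmailAddressDemo {
    static final String SAMPLE = "<emailAddresses>"
            + "<emailAddress><value>sambj1981@gmail.com</value><typeSource>WORK</typeSource>"
            + "<typeUser></typeUser><primary>false</primary></emailAddress>"
            + "<emailAddress><value>sambj@hotmail.co.uk</value><typeSource>HOME</typeSource>"
            + "<typeUser></typeUser><primary>true</primary></emailAddress>"
            + "</emailAddresses>";

    // Simple POJO whose fields mirror the child elements.
    static class EmailAddress {
        final String value;
        final String typeSource;
        final String typeUser;
        final boolean primary;

        EmailAddress(String value, String typeSource, String typeUser, boolean primary) {
            this.value = value;
            this.typeSource = typeSource;
            this.typeUser = typeUser;
            this.primary = primary;
        }
    }

    static List<EmailAddress> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//emailAddress", doc, XPathConstants.NODESET);
            List<EmailAddress> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                // Relative paths are resolved against the current emailAddress node
                result.add(new EmailAddress(
                        xpath.evaluate("value", n),
                        xpath.evaluate("typeSource", n),
                        xpath.evaluate("typeUser", n),
                        Boolean.parseBoolean(xpath.evaluate("primary", n))));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        for (EmailAddress address : read(SAMPLE)) {
            System.out.println(address.value + " " + address.typeSource + " " + address.primary);
        }
    }
}
```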

//emailAddress/*
will get these nodes in the document order.
It depends on how you want to iterate through the nodes. We do all our XML using XOM (http://www.xom.nu/) which is an easy reliable Java package. It's possible to write your own strategy using XOM calls.

If you use XStream you can set it up quite easily. Like so:
@XStreamAlias( "emailAddress" )
public class EmailAddress {
@XStreamAlias( "value" )
private String value;
@XStreamAlias( "typeSource" )
private String typeSource;
@XStreamAlias( "typeUser" )
private String typeUser;
@XStreamAlias( "primary" )
private boolean primary;
// ... the rest omitted for brevity
}
You then marshal & unmarshal quite simply like so:
XStream xstream = new XStream();
xstream.processAnnotations( EmailAddress.class );
xstream.toXML( /* Object value here */ emailAddress );
xstream.fromXML( /* String xml value here */ "" );
I don't know whether you have to use XPath, but if not, I'd consider an out-of-the-box solution like this.

I am fully aware this is not what you were asking for, but you may consider using JiBX, a tool for mapping human-readable XML to POJOs.
I believe you could quickly generate a mapping for your email structure and let JiBX do the work for you.
