Extracting the node values in XML with XPath in Java


I have an XML document:
<response>
<result>
<phone>1233</phone>
<sys_id>asweyu4</sys_id>
<link>rft45fgd</link>
<!-- Many more in result -->
</result>
<!-- Many more result nodes -->
</response>
The XML structure is not known in advance. I receive the XPath expressions for the values I need from the user.
E.g. the inputs are strings like:
//response/result/sys_id , //response/result/phone
How can I get these node values for the whole XML document by evaluating each XPath?
I referred to this, but my XPath is as shown above, i.e. it does not use * or text().
The XPath evaluator works perfectly fine with my input format, so is there any way I can achieve the same in Java?
Thank you!

It's difficult without seeing your code... I'd just evaluate as a NodeList and then call getTextContent() on each node in the result list...
String input = "<response><result><phone>1233</phone><sys_id>asweyu4</sys_id><link>rft45fgd</link></result><result><phone>1233</phone><sys_id>another-sysid</sys_id><link>another-link</link></result></response>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new ByteArrayInputStream(input.getBytes("UTF-8")));
XPath path = XPathFactory.newInstance().newXPath();
NodeList node = (NodeList) path.compile("//response/result/sys_id").evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < node.getLength(); i++) {
System.out.println(node.item(i).getTextContent());
}
Output
asweyu4
another-sysid
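Since the question mentions several user-supplied XPath strings, here is a minimal sketch of the same idea wrapped in a reusable method. The class name XPathValues and the methods parse and valuesFor are hypothetical names made up for this example, and the sample XML is shortened; it assumes each expression selects nodes whose text content is the value you want.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of any library.
class XPathValues {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates one user-supplied XPath and returns the text of every matching node.
    static List<String> valuesFor(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression.trim(), doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response>"
                + "<result><phone>1233</phone><sys_id>asweyu4</sys_id></result>"
                + "<result><phone>5678</phone><sys_id>another-sysid</sys_id></result>"
                + "</response>";
        Document doc = parse(xml);
        String[] userInputs = {"//response/result/sys_id", "//response/result/phone"};
        for (String expression : userInputs) {
            System.out.println(expression + " -> " + valuesFor(doc, expression));
        }
    }
}
```

The document is parsed once and reused for every expression, which matters if the user supplies many paths.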

Related

Getting null values from XPath query

I have this xml file:
<?xml version="1.0" encoding="UTF-8"?>
<iet:aw-data xmlns:iet="http://care.aw.com/IET/2007/12" class="com.aw.care.bean.resource.MessageResource">
<iet:metadata filter=""/>
<iet:message-resource>
<iet:message>some message 1</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.11</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
<iet:message-resource>
<iet:message>some message 2</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.12</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
.
.
.
.
</iet:aw-data>
Using the code below, I iterate over the data and look for what I need.
try {
FileInputStream fileIS = new FileInputStream(new File("resources\\bootstrap\\content\\MessageResources_iw_IL\\MessageResource_iw_IL.ctdata.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//*[local-name()='message-resource']//*[local-name()='code'][contains(text(), 'account')]";
NodeList nodeList = (NodeList) xPath.compile(query).evaluate(xmlDocument, XPathConstants.NODESET);
System.out.println("size= " + nodeList.getLength());
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getNodeValue());
}
}
catch (Exception e){
e.printStackTrace();
}
The issue is that I'm getting only null values when printing in the for loop. Any idea why this happens?
The code needs to return a list of nodes whose code and message fields contain the given parameters (like an SQL query with two parameters joined by AND).

Check the documentation:
https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html
getNodeValue() applied to an element node returns null.
Use getTextContent().
Alternatively, if you find DOM too frustrating, switch to one of the better tree models like JDOM2 or XOM.
Also, if you used an XPath 2.0 engine like Saxon, it would (a) simplify your expression to
//*:message-resource//*:code[contains(text(), 'account')]
and (b) allow you to return a sequence of strings from the XPath expression, rather than a sequence of nodes, so you wouldn't have to mess around with nodelists.
Another point: I suspect that the predicate [contains(text(), 'account')] should really be [.='account']. I'm not sure of that, but using text() instead of ".", and using contains() instead of "=", are both common mistakes.
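To make the getTextContent() advice and the "two parameters with AND" requirement concrete, here is a rough sketch that stays with the JDK's built-in XPath 1.0 engine. The class MessageResourceQuery, the method matchingCodes and the sample XML are hypothetical and made up for illustration; it also assumes code and message are direct children of message-resource, as in the question. The two search strings are bound as XPath variables ($code, $message) rather than concatenated into the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the sample XML in main is made up.
class MessageResourceQuery {

    // Selects <code> elements containing $code, but only inside a
    // message-resource whose <message> child contains $message.
    static final String QUERY =
            "//*[local-name()='message-resource']"
            + "[contains(*[local-name()='message'], $message)]"
            + "/*[local-name()='code'][contains(., $code)]";

    static List<String> matchingCodes(String xml, String codePart, String messagePart) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, String> variables = new HashMap<>();
            variables.put("code", codePart);
            variables.put("message", messagePart);

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve $code and $message from the map above.
            xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));

            NodeList nodes = (NodeList) xpath.evaluate(QUERY, doc, XPathConstants.NODESET);
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // These are element nodes, so getNodeValue() would return null here.
                codes.add(nodes.item(i).getTextContent());
            }
            return codes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate query", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<iet:aw-data xmlns:iet=\"http://care.aw.com/IET/2007/12\">"
                + "<iet:message-resource><iet:message>some message 1</iet:message>"
                + "<iet:code>edi.account.11</iet:code></iet:message-resource>"
                + "<iet:message-resource><iet:message>another text</iet:message>"
                + "<iet:code>edi.account.12</iet:code></iet:message-resource>"
                + "</iet:aw-data>";
        System.out.println(matchingCodes(xml, "account", "message"));
    }
}
```

If you need the whole message-resource element rather than just the code text, drop the last step of the path and keep both contains() tests in the predicate.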

Access data in xml as string

I am receiving XML in string format. Is there a library for searching for elements in the string?
<Version value="0"/>
<IssueDate>2017-12-15</IssueDate>
<Locale>en_US</Locale>
<RecipientAddress>
<Category>Primary</Category>
<SubCategory>0</SubCategory>
<Name>Vitsi</Name>
<Attention>Stowell Group Llc.</Attention>
<AddressLine1>511 6th St</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-2903</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>2903</ZIP4>
</RecipientAddress>
<RecipientAddress>
<Category>Additional</Category>
<SubCategory>1</SubCategory>
<Name>Vitsi</Name>
<AddressLine1>Po Box 957</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-0104</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>0104</ZIP4>
</RecipientAddress>
<SenderName>TMO</SenderName>
<SenderId>IL</SenderId>
<SenderAddress>
<Name>T-mobile</Name>
<AddressLine1>Po Box 790047</AddressLine1>
<City>St. Louis</City>
<PresentationValue>ST. LOUIS MO 63179-0047</PresentationValue>
<State>MO</State>
<ZIPCode>63179</ZIPCode>
.
.
.
.
I want to access the element RecipientAddress, which is a list. Is there any library to do that? Please note that what I receive is a string. It is an invoice and there will be many to process, so performance is important

The following options are available:
Convert the XML string to Java objects using JAXB.
Use String.indexOf() to retrieve specific parts of the XML.
Use a regular expression to retrieve specific parts of the XML.
Use a SAX, DOM or StAX parser for parsing and extraction.
Use XPath to fetch specific values from the XML.
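Since performance was mentioned, here is a rough sketch of the StAX option from the list above, which streams through the string without building a tree. The class RecipientAddressReader and its read method are hypothetical names. It assumes the fragment from the question is wrapped in a single root element (called Invoice here purely as a placeholder) and that every child of RecipientAddress contains only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of any library.
class RecipientAddressReader {

    // Returns one map (child element name -> text) per RecipientAddress element.
    static List<Map<String, String>> read(String xml) {
        List<Map<String, String>> addresses = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, String> current = null; // non-null while inside a RecipientAddress
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("RecipientAddress".equals(name)) {
                        current = new LinkedHashMap<>();
                    } else if (current != null) {
                        // Assumes text-only children; getElementText() throws otherwise.
                        current.put(name, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "RecipientAddress".equals(reader.getLocalName())) {
                    addresses.add(current);
                    current = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
        return addresses;
    }

    public static void main(String[] args) {
        String xml = "<Invoice>"
                + "<RecipientAddress><Category>Primary</Category><City>Lake Oswego</City></RecipientAddress>"
                + "<RecipientAddress><Category>Additional</Category><City>Lake Oswego</City></RecipientAddress>"
                + "<SenderName>TMO</SenderName>"
                + "</Invoice>";
        for (Map<String, String> address : read(xml)) {
            System.out.println(address);
        }
    }
}
```

Whether this is actually faster than DOM plus XPath for your invoices is something to measure; streaming mainly helps when the documents are large and you only need a few elements.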

You could use XPath. Java has built-in support for XML querying without any third-party library.
The code would look like this:
String xmlInputStr = "<YOUR_XML_STRING_INPUT>";
String xpathExpressionStr = "<XPATH_EXPRESSION_STRING>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// parse(String) expects a URI, so wrap the XML text in an InputSource (org.xml.sax) over a StringReader (java.io)
Document doc = builder.parse(new InputSource(new StringReader(xmlInputStr)));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(xpathExpressionStr);
You can write your own expression string for querying. A typical example:
"//RecipientAddress/Category"
Evaluate the expression against your XML to retrieve a list of nodes:
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
Then iterate over the nodes:
for (int i = 0; i < nodes.getLength(); i++) {
Node nNode = nodes.item(i);
...
}
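Because many invoices will be processed, one possible refinement is to create the DocumentBuilder and compile the XPath expression once and reuse them for every string. This is only a sketch: the class InvoiceCategories and the method categoriesOf are hypothetical, the Invoice root element is a placeholder, and the objects involved are not thread-safe, so each thread would need its own instance.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; not thread-safe, so use one instance per thread.
class InvoiceCategories {
    private final DocumentBuilder builder;
    private final XPathExpression categoryExpr;

    InvoiceCategories() {
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Compiled once; the same expression is reused for every invoice.
            categoryExpr = XPathFactory.newInstance().newXPath()
                    .compile("//RecipientAddress/Category");
        } catch (Exception e) {
            throw new IllegalStateException("XML setup failed", e);
        }
    }

    // Returns the Category text of every RecipientAddress in one invoice string.
    List<String> categoriesOf(String invoiceXml) {
        try {
            Document doc = builder.parse(new InputSource(new StringReader(invoiceXml)));
            NodeList nodes = (NodeList) categoryExpr.evaluate(doc, XPathConstants.NODESET);
            List<String> categories = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                categories.add(nodes.item(i).getTextContent());
            }
            return categories;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process invoice", e);
        }
    }

    public static void main(String[] args) {
        InvoiceCategories reader = new InvoiceCategories();
        String invoice = "<Invoice>"
                + "<RecipientAddress><Category>Primary</Category></RecipientAddress>"
                + "<RecipientAddress><Category>Additional</Category></RecipientAddress>"
                + "</Invoice>";
        System.out.println(reader.categoriesOf(invoice));
    }
}
```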

There are many ready-made APIs available for converting XML to Java objects.
Please look at Xerces from Apache.
If you only want to extract one specific value, put the whole document into a string and use indexOf("string").

Issues with xpath context

I have a simple xml document:
<Results>
<Result>
<Number>1</Number>
<Data>a</Data>
</Result>
<Result>
<Number>2</Number>
<Data>b</Data>
</Result>
</Results>
I'm trying to get the data and number of each result using this code:
XPathExpression resExpr = xpath
.compile("//Results/Result");
XPathExpression numExpr=xpath
.compile("//Result/Number");
XPathExpression dataExpr=xpath
.compile("//Result/Data");
NodeList nodeList = (NodeList) resExpr.evaluate(root_node,
XPathConstants.NODESET);
for (int i=0;i<nodeList.getLength();i++) {
Node result=nodeList.item(i);
if (result!=null) {
Node numNode=(Node) numExpr.evaluate(result,
XPathConstants.NODE);
Node dataNode=(Node) dataExpr.evaluate(result,
XPathConstants.NODE);
String data=dataNode.getTextContent();
String num=numNode.getTextContent();
}
}
However, I get 1/a on both iterations. It seems that passing a node doesn't make XPath use it as the context; instead it looks at the whole tree?

This is because your XPath expressions start with //, which means: search all descendants starting from the document root, regardless of the context node you pass in.
To access children of the current node, use .// for descendants at any depth, or ./ for direct children.
Or, because the current node in the iteration is a Result, you can use:
XPathExpression numExpr=xpath
.compile("Number");
XPathExpression dataExpr=xpath
.compile("Data");
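Put together, a self-contained sketch of the corrected loop might look like this. The class name ResultRows and the method rows are made up for the example; the relative expressions Number and Data are evaluated against each Result node, so every iteration sees only its own children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration.
class ResultRows {

    // Returns one "number/data" string per Result element.
    static List<String> rows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList results = (NodeList) xpath.evaluate(
                    "/Results/Result", doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < results.getLength(); i++) {
                Node result = results.item(i);
                // Relative paths: resolved against the current Result, not the root.
                String number = xpath.evaluate("Number", result);
                String data = xpath.evaluate("Data", result);
                rows.add(number + "/" + data);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read results", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Results>"
                + "<Result><Number>1</Number><Data>a</Data></Result>"
                + "<Result><Number>2</Number><Data>b</Data></Result>"
                + "</Results>";
        System.out.println(rows(xml)); // prints [1/a, 2/b]
    }
}
```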

JAVA : Extract element from XML Document

I'm using javax.xml.parsers to navigate through an XML document like the one below:
<ContextElement>
<DimensionNode>Role</DimensionNode>
<Value>Administration</Value>
<TailoringExpressions>
<TailoringExpression>
<Relation>Student</Relation>
<ProjAtt>
<Attribute>Matr</Attribute>
<Attribute>SName</Attribute>
<Attribute>SSurname</Attribute>
<Attribute>SDateOfBirth</Attribute>
<Attribute>SEmail</Attribute>
<Attribute>SAddress</Attribute>
</ProjAtt>
<Condition/>
<SemiJoinRel/>
<SemiJoinOn/>
<SemiJoinCond/>
</TailoringExpression>
</TailoringExpressions>
</ContextElement>
<ContextElement>
<DimensionNode>Deadline</DimensionNode>
<Value>Lost</Value>
<TailoringExpressions>
<TailoringExpression>
<Relation>Deadline</Relation>
<ProjAtt>
<Attribute>IdDeadline</Attribute>
<Attribute>Student</Attribute>
<Attribute>DeadlineDate</Attribute>
<Attribute>Description</Attribute>
<Attribute>IsMet</Attribute>
</ProjAtt>
<Condition>DeadlineDate LT CurrentDate AND IsMet=False</Condition>
<SemiJoinRel/>
<SemiJoinOn/>
<SemiJoinCond/>
</TailoringExpression>
</TailoringExpressions>
</ContextElement>
My problem is that I need to extract the ContextElement node whose DimensionNode has the value "Role" and whose Value has the value "Administration", and I haven't been able to write working code for it.
Can someone tell me how to do that?
Thanks

I think the best way to extract values is to use XPath:
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/widgets/widget";
InputSource inputSource = new InputSource("widgets.xml");
// NODESET results come back as org.w3c.dom.NodeList
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
For more information, see the Oracle documentation.
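Applied to the XML in the question, a sketch could put both conditions into one predicate. The class ContextElementFinder and the method attributes are hypothetical names, and the example assumes the ContextElement nodes sit under a single root element (called Contexts here as a placeholder, since the question does not show the root).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration.
class ContextElementFinder {

    // Both tests are in one predicate, so an element must satisfy both.
    static final String QUERY =
            "//ContextElement[DimensionNode='Role' and Value='Administration']";

    // Returns the Attribute values of the first matching ContextElement,
    // or an empty list when nothing matches.
    static List<String> attributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(QUERY, doc, XPathConstants.NODE);
            List<String> attributes = new ArrayList<>();
            if (context == null) {
                return attributes;
            }
            // Relative path: search only below the matched ContextElement.
            NodeList nodes = (NodeList) xpath.evaluate(
                    ".//Attribute", context, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                attributes.add(nodes.item(i).getTextContent());
            }
            return attributes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate query", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Contexts>"
                + "<ContextElement><DimensionNode>Role</DimensionNode><Value>Administration</Value>"
                + "<ProjAtt><Attribute>Matr</Attribute><Attribute>SName</Attribute></ProjAtt></ContextElement>"
                + "<ContextElement><DimensionNode>Deadline</DimensionNode><Value>Lost</Value>"
                + "<ProjAtt><Attribute>IdDeadline</Attribute></ProjAtt></ContextElement>"
                + "</Contexts>";
        System.out.println(attributes(xml));
    }
}
```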

Why does my XPath expression in Java return too many children?

I have the following xml file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<config>
<a>
<b>
<param>p1</param>
<param>p2</param>
</b>
</a>
</config>
and the xpath code to get my node params:
Document doc = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/config/a/b");
Object o = expr.evaluate(doc, XPathConstants.NODESET);
NodeList list = (NodeList) o;
but it turns out that the nodes list (list) has 5 children, including "\t\n", instead of just two. Is there something wrong with my code? How can I just get my two nodes?
Thank you!

When you select /config/a/b and look at the children of b, you get three text nodes and two elements. That is, given your XML above and only showing the fragment in question:
<b>
<param>p1</param>
<param>p2</param>
</b>
the first child is the text (whitespace) following <b> and preceding <param>p1 .... The second child is the first param element. The third child is the text (whitespace) between the two param elements. And so on. The whitespace isn't ignored in XML, although many forms of processing XML ignore it.
You have a couple of choices:
Change your xpath expression so it will only select element nodes, as suggested by Ted Dziuba, or
Loop over the five nodes returned and only select the non-text nodes.
You could do something like this:
for (int i = 0; i < nodes.getLength(); i++) {
    // keep element nodes only; getNodeValue() is null for elements, so read the text content
    if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
        System.out.println(nodes.item(i).getTextContent());
    }
}
You can use the node type to select only element nodes, or to remove text nodes.

so the xpath looks like:
/config/a/b/*/text().
And the output for :
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
would be as expected: p1 and p2

How about
/config/a/b/*/text()/..
?

import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;
public class TestClient_XPath {
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory
.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("yourfile.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xpath.compile("/a/b/c");
Object res = xPathExpression.evaluate(doc);
System.out.println(res.toString());
}
}
Xalan and Xerces appear to be embedded in rt.jar.
Don't include xerces and xalan libs.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4624775

I am not sure but shouldn't /config/a/b just return b? /config/a/b/param should return the two param nodes...
Could the view on the problem be the problem? Of course you get back the resulting node AND all its children. So you just have to look at the first element and not at its children.
But I may be totally wrong, because I usually just use XPath to navigate DOM trees (HtmlUnit).
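To illustrate the difference, here is a small sketch that compares the child-node count of b with the result of selecting /config/a/b/param directly. The class ParamNodes and its methods are hypothetical names for this example, and the XML string mimics the indentation of the file in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration.
class ParamNodes {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Counts every child node of <b>, whitespace text nodes included.
    static int childNodeCountOfB(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node b = (Node) xpath.evaluate("/config/a/b", doc, XPathConstants.NODE);
            return b.getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    // Selects the param elements directly, so no text nodes are involved.
    static List<String> paramValues(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList params = (NodeList) xpath.evaluate(
                    "/config/a/b/param", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < params.getLength(); i++) {
                values.add(params.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<config>\n<a>\n<b>\n\t<param>p1</param>\n\t<param>p2</param>\n</b>\n</a>\n</config>");
        System.out.println(childNodeCountOfB(doc)); // prints 5
        System.out.println(paramValues(doc));       // prints [p1, p2]
    }
}
```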
