I'm trying to parse the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<docusign-cfg>
<tagConfig>
<tags>
<approve>approve</approve>
<checkbox>checkbox</checkbox>
<company>company</company>
<date>date</date>
<decline>decline</decline>
<email>email</email>
<emailAddress>emailAddress</emailAddress>
<envelopeID>envelopeID</envelopeID>
<firstName>firstName</firstName>
<lastName>lastName</lastName>
<number>number</number>
<ssn>ssn</ssn>
<zip>zip</zip>
<signHere>signHere</signHere>
<checkbox>checkbox</checkbox>
<initialHere>initialHere</initialHere>
<dateSigned>dateSigned</dateSigned>
<fullName>fullName</fullName>
</tags>
</tagConfig>
</docusign-cfg>
I want to read either the name or content of each tag in the <tags> tag. I can do so with the following code:
public String[] getAvailableTags() throws Exception
{
String path = "/docusign-cfg/tagConfig/tags";
XPathFactory f = XPathFactory.newInstance();
XPath x = f.newXPath();
Object result = null;
try
{
XPathExpression expr = x.compile(path);
result = expr.evaluate(doc, XPathConstants.NODE);
}
catch (XPathExpressionException e)
{
throw new Exception("An error ocurred while trying to retrieve the tags");
}
Node node = (Node) result;
NodeList childNodes = node.getChildNodes();
String[] tags = new String[childNodes.getLength()];
System.out.println(tags.length);
for(int i = 0; i < tags.length; i++)
{
String content = childNodes.item(i).getNodeName().trim().replaceAll("\\s", "");
if(childNodes.item(i).getNodeType() == Node.ELEMENT_NODE &&
childNodes.item(i).getNodeName() != null)
{
tags[i] = content;
}
}
return tags;
}
After some searching I found that parsing it this way causes the whitespace between nodes/tags to be read as child nodes. In this case the whitespace text nodes are considered children of <tags>.
My output:
37
null
approve
null
checkbox
null
company
null
date
null
decline
null
email
null
emailAddress
null
envelopeID
null
firstName
null
lastName
null
number
null
ssn
null
zip
null
signHere
null
checkbox
null
initialHere
null
dateSigned
null
fullName
null
37 is the number of nodes it found in <tags>
Everything below 37 is the content of the tag array.
How are these null elements being added to the tag array despite my checking for null?
I think this is caused by how the tags array is indexed. The loop counter i advances even when the if check skips a node, so the slot at that index is never written and keeps its default value of null. Use a separate index for the tags array:
int j = 0;
for(int i = 0; i < tags.length; i++)
{
String content = childNodes.item(i).getNodeName().trim().replaceAll("\\s", "");
if(childNodes.item(i).getNodeType() == Node.ELEMENT_NODE &&
childNodes.item(i).getNodeName() != null)
{
tags[j++] = content;
}
}
Since you skip some of the child nodes, an array sized for all of them wastes memory and still ends with unused null slots. You can use a List instead. If you specifically need a String array, you can convert the list to one at the end.
public String[] getAvailableTags() throws Exception
{
String path = "/docusign-cfg/tagConfig/tags";
XPathFactory f = XPathFactory.newInstance();
XPath x = f.newXPath();
Object result = null;
try
{
XPathExpression expr = x.compile(path);
result = expr.evaluate(doc, XPathConstants.NODE);
}
catch (XPathExpressionException e)
{
throw new Exception("An error ocurred while trying to retrieve the tags");
}
Node node = (Node) result;
NodeList childNodes = node.getChildNodes();
List<String> tags = new ArrayList<String>();
for(int i = 0; i < childNodes.getLength(); i++)
{
String content = childNodes.item(i).getNodeName().trim().replaceAll("\\s", "");
if(childNodes.item(i).getNodeType() == Node.ELEMENT_NODE &&
childNodes.item(i).getNodeName() != null)
{
tags.add(content);
}
}
String[] tagsArray = tags.toArray(new String[tags.size()]);
return tagsArray;
}
The elements of the tags array default to null.
So the question is not how an element becomes null; it is that the element is left as null.
To prove this to yourself, add an else block like this:
if(childNodes.item(i).getNodeType() == Node.ELEMENT_NODE &&
childNodes.item(i).getNodeName() != null)
{
tags[i] = content;
} else {
tags[i] = "Foo Bar";
}
You should now see 'Foo Bar' instead of null.
A better solution here would be to use an ArrayList and append the tags to it instead of using an array. Then you do not need to track the indexes, so there is less chance of this type of bug.
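Another way to avoid the whitespace nodes altogether is to let XPath return only the element children, for example with /docusign-cfg/tagConfig/tags/*. Here is a minimal sketch of that idea, assuming the XML is available as a string; the class name TagNames, the method availableTags and the shortened sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagNames {
    // Hypothetical helper: names of the element children of <tags>.
    static List<String> availableTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "*" matches element nodes only, so whitespace text nodes are never selected.
            NodeList elements = (NodeList) xpath.evaluate(
                    "/docusign-cfg/tagConfig/tags/*", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                names.add(elements.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            // Checked parser/XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("Could not read the tags", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<docusign-cfg><tagConfig><tags>\n"
                + "  <approve>approve</approve>\n"
                + "  <checkbox>checkbox</checkbox>\n"
                + "</tags></tagConfig></docusign-cfg>";
        System.out.println(availableTags(xml)); // prints [approve, checkbox]
    }
}
```

Because the list only ever receives element names, there is no index to keep in sync and no null entry to filter out afterwards.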
I'm currently working with APIs for the first time and am having some trouble retrieving data.
The xml file looks like this:
<schedule>
...
<scheduledepisode>
<episodeid>22441</episodeid>
<title>Ekonyheter </title>
<starttimeutc>2012-09-19T04:00:00Z</starttimeutc>
<endtimeutc>2012-09-19T04:03:00Z</endtimeutc>
<program id="83" name="Ekot" />
<channel id="164" name="P3" />
</scheduledepisode>
<scheduledepisode>
So I used NodeList nodelist1 = doc.getElementsByTagName("scheduledepisode"); to get all the scheduledepisode elements, then I thought that to retrieve the data under title I could simply use the following:
System.out.println(node.getAttributes().getNamedItem("title").getTextContent());
However, this only returns null and I can't understand why. Can someone explain what I am missing here? To my understanding, the title element is an attribute of the scheduledepisode element. Is that wrong?
The length of the NodeList matches the number of scheduledepisode elements, so I'm assuming that I have the correct elements.
The code looks like this:
NodeList nodelist1 = doc.getElementsByTagName("scheduledepisode");
for (int i = 0; i < nodelist1.getLength(); i++)
{
Node node = nodelist1.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE )
{
if (node.getAttributes().getNamedItem("title") != null) {
System.out.println(node.getAttributes().getNamedItem("title").getTextContent());
}
}
}
Since <title> is an element and not an attribute of <scheduledepisode>, getAttributes() will not work. Use getElementsByTagName again instead:
NodeList nodelist1 = doc.getElementsByTagName("scheduledepisode");
for (int i = 0; i < nodelist1.getLength(); i++)
{
    Node node = nodelist1.item(i);
    if (node.getNodeType() == Node.ELEMENT_NODE)
    {
        // getElementsByTagName is declared on Element, not Node, so cast first
        NodeList titles = ((Element) node).getElementsByTagName("title");
        // the returned list is never null; check that it is non-empty instead
        if (titles.getLength() > 0) {
            System.out.println(titles.item(0).getTextContent());
        }
    }
}
I'm parsing large XML files using Java 8 and XPath 1.0. I want to extract the name of each Test, its measured values and its outcome (Passed or Failed).
Each Test can have many TestResult elements, each of which contains one of two types of limits:
SingleLimit, which has only one <Limit comparator="XYZ">
LimitPair, which always has two limits
<tr:Test ID = "282" name = "DIP1-8 High">
<tr:Extension>
<ts:TSStepProperties>
...
</ts:TSStepProperties>
</tr:Extension>
<tr:Outcome value = "Passed" /> <!-- value -->
<tr:TestResult ID = "xyz" name = "TC 8.7.4 - Current breaker output J10:1-2"> <!-- name -->
<tr:TestLimits>
<tr:Limits>
<c:LimitPair operator = "AND">
<c:Limit comparator = "GE">
<!-- value -->
<c:Datum nonStandardUnit = "V" value = "2.8" xsi:type="ts:TS_double" flags = "0x0000"/>
</c:Limit>
<c:Limit comparator = "LE">
<!-- value -->
<c:Datum nonStandardUnit = "V" value = "3.5" xsi:type="ts:TS_double" flags = "0x0000"/>
</c:Limit>
</c:LimitPair>
</tr:Limits>
</tr:TestLimits>
</tr:TestResult>
</tr:Test>
Currently I'm using the paths below to extract the LimitPair measurements and to build a String containing the values.
My question is how I should write the code/XPaths to handle multiple TestResults inside one Test.
At the beginning I assumed that a Test could have only one LimitPair or SingleLimit, which was wrong.
My current code extracts all values correctly, but the measurements are assigned incorrectly when there are many TestResults inside a Test.
For instance, if the Test with ID = 1 contains three TestResults, then in the final String containing the measurements I will have values from the first Test inside the second, because they "override" its values.
private ArrayList<String> preparePairLimitPaths() {
    final ArrayList<String> list = new ArrayList<>();
    list.add("//Test[TestResult//LimitPair]/@name");
    list.add("//Test/TestResult[TestLimits//LimitPair]/TestData/Datum/@value");
    list.add("//Test/TestResult/TestLimits/Limits/LimitPair/Limit[*]/Datum/@value");
    list.add("//Test/TestResult/TestLimits/Limits/LimitPair/Limit[*]/Datum/@value");
    list.add("//Test[TestResult//TestLimits//LimitPair]/Outcome/@value");
    return list;
}
for (String expr : preparePairLimitPaths()) {
try {
final NodeList evaluate = (NodeList) xPath.evaluate(expr, parse, XPathConstants.NODESET);
for (int i = 0; i < evaluate.getLength(); i++) {
final String textContent = evaluate.item(i).getTextContent();
if (textContent != null && !textContent.isEmpty()) {
stringBuilder.append(textContent).append(";");
}
}
stringBuilder.append("###");
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
You can just iterate over each Test, then over each TestResult inside it, and put the TestLimits logic in the inner loop.
NodeList tests = (NodeList) xPath.evaluate("/xml/Test", xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < tests.getLength(); i++) {
    Element singleTest = (Element) tests.item(i);
    // Here, you can extract some values from your test like:
    // testOutcome = xPath.evaluate("Outcome/@value", singleTest);
    NodeList testResults = (NodeList) xPath.evaluate("TestResult", singleTest, XPathConstants.NODESET);
    for (int j = 0; j < testResults.getLength(); j++) {
        // Now you can iterate over all the TestResults of this test
        // testResultName = xPath.evaluate("@name", testResults.item(j));
    }
}
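To make the nested iteration concrete, here is a small self-contained sketch. It is only an illustration under some assumptions: the namespace prefixes (tr:, c:) are dropped so that plain element names work with a default parser, the class name TestResultReader and the semicolon-separated output format are made up, and .//Limit is used so that both a single limit and a LimitPair are picked up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TestResultReader {
    // Builds one line per TestResult: testName;outcome;resultName;limit;limit...
    static List<String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            List<String> lines = new ArrayList<>();
            NodeList tests = (NodeList) xPath.evaluate("//Test", doc, XPathConstants.NODESET);
            for (int i = 0; i < tests.getLength(); i++) {
                Node test = tests.item(i);
                // Relative paths are resolved against the current Test only.
                String prefix = xPath.evaluate("@name", test) + ";"
                        + xPath.evaluate("Outcome/@value", test);
                NodeList results = (NodeList) xPath.evaluate(
                        "TestResult", test, XPathConstants.NODESET);
                for (int j = 0; j < results.getLength(); j++) {
                    Node result = results.item(j);
                    StringBuilder line = new StringBuilder(prefix)
                            .append(';').append(xPath.evaluate("@name", result));
                    // Selects every limit value below this TestResult, one or two.
                    NodeList limits = (NodeList) xPath.evaluate(
                            ".//Limit/Datum/@value", result, XPathConstants.NODESET);
                    for (int k = 0; k < limits.getLength(); k++) {
                        line.append(';').append(limits.item(k).getNodeValue());
                    }
                    lines.add(line.toString());
                }
            }
            return lines;
        } catch (Exception e) {
            // Checked parser/XPath exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException("Could not evaluate the XPath expressions", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<report><Test name='T1'><Outcome value='Passed'/>"
                + "<TestResult name='R1'><LimitPair>"
                + "<Limit><Datum value='2.8'/></Limit><Limit><Datum value='3.5'/></Limit>"
                + "</LimitPair></TestResult></Test></report>";
        summarize(xml).forEach(System.out::println); // prints T1;Passed;R1;2.8;3.5
    }
}
```

Since every value is read relative to its own Test or TestResult node, values from one TestResult cannot end up in the line of another. With the real, namespaced document you would additionally need a namespace-aware parser and a NamespaceContext on the XPath object.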
I have XML:
<Table>
<Row ss:Index="74" ss:AutoFitHeight="0" ss:Height="14">
<Cell ss:Index="1" ss:MergeAcross="3" ss:StyleID="s29">
<ss:Data ss:Type="Number" xmlns="http://www.w3.org/TR/REC-html40">
0.00
</ss:Data>
</Cell>
<Cell ss:Index="15" ss:MergeAcross="5" ss:StyleID="s29">
<ss:Data ss:Type="Number" xmlns="http://www.w3.org/TR/REC-html40">
4.57
</ss:Data>
</Cell>
</Row>
Here is code used to extract the content, eg. "0.00", based on row index & cell index:
public static String getCellValueNum(String filename, int rowIdx, int colIdx) {
// search for Table element anywhere in the source
String tableElementPattern = "//*[name()='Table']";
// search for Row element with given number
String rowPattern = String.format("/*[name()='Row' and @ss:Index='%d']", rowIdx) ;
// search for Cell element with given column number
String cellPattern = String.format("/*[name()='Cell' and @ss:Index='%d']", colIdx) ;
// search for element that has ss:Type="Number" attribute, search for element with text under it and get text name
String cellStringContent = "/*[@ss:Type='Number']/*[text()]/text()";
String completePattern = tableElementPattern + rowPattern + cellPattern + cellStringContent;
try (FileReader reader = new FileReader(filename)) {
XPath xPath = getXpathProcessor();
Node n = (Node)xPath.compile(completePattern)
.evaluate(new InputSource(reader), XPathConstants.NODE);
if (n.getNodeType() == Node.TEXT_NODE) {
return n.getNodeValue().trim();
}
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
private static XPath getXpathProcessor() {
// this is where the custom implementation of NamespaceContext is used
NamespaceContext context = new NamespaceContextMap(
"html", "http://www.w3.org/TR/REC-html40",
//"xsl", "http://www.w3.org/1999/XSL/Transform",
"o", "urn:schemas-microsoft-com:office:office",
"x", "urn:schemas-microsoft-com:office:excel",
"ss", "urn:schemas-microsoft-com:office:spreadsheet");
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(context);
return xpath;
}
It works perfectly fine when ss:Type='String', but when ss:Type='Number' it gives this error:
java.lang.NullPointerException
at XpathBill.getCellValueNum(XpathBill.java:55)
at XpathBill.main(XpathBill.java:100)
I think the problem is here:
if (n.getNodeType() == Node.TEXT_NODE)
It should be something else instead of TEXT_NODE. I tried the other node type constants, but that didn't work.
Please Help.
Thank you!
I am using the Document object to extract all the tags from an XML file. If the XML has an empty tag, I get a NullPointerException. How do I guard against this? How do I check for an empty tag?
<USTrade>
<CreditorId>
<CustomerNumber>xxxx</CustomerNumber>
<Name></Name>
<Industry code="FY" description="Factor"/>
</CreditorId>
<DateReported format="MM/CCYY">02/2012</DateReported>
<AccountNumber>54000</AccountNumber>
<HighCreditAmount>0000299</HighCreditAmount>
<BalanceAmount>0000069</BalanceAmount>
<PastDueAmount>0000069</PastDueAmount>
<PortfolioType code="O" description="Open Account (30, 60, or 90 day account)"/>
<Status code="5" description="120 Dys or More PDue"/>
<Narratives>
<Narrative code="GS" description="Medical"/>
<Narrative code="CZ" description="Collection Account"/>
</Narratives>
</USTrade>
<USTrade>
So, when I use:
NodeList nm = docElement.getElementsByTagName("Name");
if (nm.getLength() > 0)
name = nullIfBlank(((Element) nm.item(0))
.getFirstChild().getTextContent());
The NodeList has a length of 1, because the tag is there, but the getTextContent() call throws a NullPointerException because getFirstChild() returns null for the empty Name tag.
I have done this for each XML tag. Is there a simple check I can do before every tag extraction?
The first thing I would do would be to unchain your calls. This will give you the chance to determine exactly which reference is null and which reference you need to do a null check for:
NodeList nm = docElement.getElementsByTagName("Name");
if (nm.getLength() > 0) {
Node n = nm.item(0);
Node child = n.getFirstChild();
if(child == null) {
// null handling
name = null;
}
else {
name = nullIfBlank(child.getTextContent());
}
}
Also, check out the hasChildNodes() method on Node! http://docs.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/Node.html#hasChildNodes%28%29
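As a sketch of how this check could be wrapped up once and reused for every tag: calling getTextContent() on the element itself, rather than on getFirstChild(), gives an empty string for an empty element, so no child lookup is needed. The names ElementText, textOrNull and parseRoot are made up for this example, and the trimming stands in for the question's nullIfBlank, whose implementation is not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementText {
    // Hypothetical helper: trimmed text of the first <tagName> descendant,
    // or null when the element is missing, empty, or whitespace-only.
    static String textOrNull(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return null; // no such element at all
        }
        // For an element without children, getTextContent() yields "" instead of failing.
        String text = matches.item(0).getTextContent();
        if (text == null || text.trim().isEmpty()) {
            return null;
        }
        return text.trim();
    }

    // Parses an XML string and returns its root element (checked exceptions wrapped for brevity).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(
                "<CreditorId><CustomerNumber>xxxx</CustomerNumber><Name></Name></CreditorId>");
        System.out.println(textOrNull(root, "CustomerNumber")); // prints xxxx
        System.out.println(textOrNull(root, "Name"));           // prints null
    }
}
```

Each extraction then becomes a single call such as name = textOrNull(docElement, "Name"), with no per-tag null checks.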
while(current != null){
if(current.getNodeType() == Node.ELEMENT_NODE){
String nodeName = current.getNodeName();
System.out.println("\tNode: "+nodeName);
NamedNodeMap attributes = current.getAttributes();
System.out.println("\t\tNumber of Attributes: "+attributes.getLength());
for(int i=0; i<attributes.getLength(); i++){
Node attr = attributes.item(i);
String attName = attr.getNodeName();
String attValue= attr.getNodeValue();
System.out.println("\t\tAttribute Name: "+ attName+ "\tAttribute Value:"+ attValue);
}
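}
// advance to the next sibling (assumed; the original snippet was cut off before this step)
current = current.getNextSibling();
}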
Are you also wanting to print out the value of the node? If so, it's one line of code in my example you would have to add, and I can share that as well.
Did you try something like this?
NodeList nm = docElement.getElementsByTagName("Name");
Node first = nm.item(0); // item(0) returns null when the list is empty
if (first != null && first.hasChildNodes())
name = nullIfBlank(first.getFirstChild().getTextContent());
Can I get the full XPath from an org.w3c.dom.Node?
Say the node currently points somewhere in the middle of the XML document. I would like to extract the XPath for that element.
The output XPath I'm looking for is //parent/child1/child2/child3/node, that is, the path from the parent down to the node. XPaths that use expressions and point to the same node can be ignored.
There's no generic method for getting the XPath, mainly because there's no one generic XPath that identifies a particular node in the document. In some schemas, nodes will be uniquely identified by an attribute (id and name are probably the most common attributes.) In others, the name of each element (that is, the tag) is enough to uniquely identify a node. In a few (unlikely, but possible) cases, there's no one unique name or attribute that takes you to a specific node, and so you'd need to use cardinality (get the n'th child of the m'th child of...).
EDIT:
In most cases, it's not hard to create a schema-dependent function to assemble an XPath for a given node. For example, suppose you have a document where every node is uniquely identified by an id attribute, and you're not using namespaces. Then (I think) the following pseudo-Java would work to return an XPath based on those attributes. (Warning: I have not tested this.)
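String getXPath(Node node)
{
    Node parent = node.getParentNode();
    // The root element's parent is the Document node, so stop there.
    if (parent == null || parent.getNodeType() != Node.ELEMENT_NODE) {
        return "/" + node.getNodeName();
    }
    // Assumes node is an Element that carries a unique id attribute.
    Element element = (Element) node;
    return getXPath(parent) + "/" + element.getTagName() + "[@id='" + element.getAttribute("id") + "']";
}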
I am working for the company behind jOOX, a library that provides many useful extensions to the Java standard DOM API, mimicking the jquery API. With jOOX, you can obtain the XPath of any element like this:
String path = $(element).xpath();
The above path will then be something like this
/document[1]/library[2]/books[3]/book[1]
I've taken this code from Mikkel Flindt's post and modified it so that it also works for attribute nodes.
public static String getFullXPath(Node n) {
// abort early
if (null == n)
return null;
// declarations
Node parent = null;
Stack<Node> hierarchy = new Stack<Node>();
StringBuffer buffer = new StringBuffer();
// push element on stack
hierarchy.push(n);
switch (n.getNodeType()) {
case Node.ATTRIBUTE_NODE:
parent = ((Attr) n).getOwnerElement();
break;
case Node.ELEMENT_NODE:
parent = n.getParentNode();
break;
case Node.DOCUMENT_NODE:
parent = n.getParentNode();
break;
default:
throw new IllegalStateException("Unexpected Node type" + n.getNodeType());
}
while (null != parent && parent.getNodeType() != Node.DOCUMENT_NODE) {
// push on stack
hierarchy.push(parent);
// get parent of parent
parent = parent.getParentNode();
}
// construct xpath
Object obj = null;
while (!hierarchy.isEmpty() && null != (obj = hierarchy.pop())) {
Node node = (Node) obj;
boolean handled = false;
if (node.getNodeType() == Node.ELEMENT_NODE) {
Element e = (Element) node;
// is this the root element?
if (buffer.length() == 0) {
// root element - simply append element name
buffer.append(node.getNodeName());
} else {
// child element - append slash and element name
buffer.append("/");
buffer.append(node.getNodeName());
if (node.hasAttributes()) {
// see if the element has a name or id attribute
if (e.hasAttribute("id")) {
// id attribute found - use that
buffer.append("[@id='" + e.getAttribute("id") + "']");
handled = true;
} else if (e.hasAttribute("name")) {
// name attribute found - use that
buffer.append("[@name='" + e.getAttribute("name") + "']");
handled = true;
}
}
if (!handled) {
// no known attribute we could use - get sibling index
int prev_siblings = 1;
Node prev_sibling = node.getPreviousSibling();
while (null != prev_sibling) {
if (prev_sibling.getNodeType() == node.getNodeType()) {
if (prev_sibling.getNodeName().equalsIgnoreCase(
node.getNodeName())) {
prev_siblings++;
}
}
prev_sibling = prev_sibling.getPreviousSibling();
}
buffer.append("[" + prev_siblings + "]");
}
}
} else if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
buffer.append("/@");
buffer.append(node.getNodeName());
}
}
// return buffer
return buffer.toString();
}
For me this one worked best ( using org.w3c.dom elements):
String getXPath(Node node)
{
Node parent = node.getParentNode();
if (parent == null)
{
return "";
}
return getXPath(parent) + "/" + node.getNodeName();
}
Some IDEs specialised in XML will do that for you.
Here are the most well known
oXygen
Stylus Studio
xmlSpy
For instance in oXygen, you can right-click on an element part of an XML document and the contextual menu will have an option 'Copy Xpath'.
There are also a number of Firefox add-ons (such as XPather) that will happily do the job for you. For XPather, you just click on a part of the web page and select 'show in XPather' in the contextual menu, and you're done.
But, as Dan has pointed out in his answer, the XPath expression will be of limited use. It will not include predicates for instance. Rather it will look like this.
/root/nodeB[2]/subnodeX[2]
For a document like
<root>
<nodeA>stuff</nodeA>
<nodeB>more stuff</nodeB>
<nodeB cond="thisOne">
<subnodeX>useless stuff</subnodeX>
<subnodeX id="MyCondition">THE STUFF YOU WANT</subnodeX>
<subnodeX>more useless stuff</subnodeX>
</nodeB>
</root>
The tools I listed will not generate
/root/nodeB[@cond='thisOne']/subnodeX[@id='MyCondition']
For instance, for an HTML page you'll end up with the pretty useless expression:
/html/body/div[6]/p[3]
And that's to be expected. If they had to generate predicates, how would they know which condition is relevant? There are zillions of possibilities.
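If a purely positional path like the ones above is good enough, it can also be built in code by counting same-named element siblings. The following is a rough sketch, assuming a document without namespaces; the class name PositionalXPath and its methods are made up for this example. Unlike the tool output shown above, it always writes the index, including [1].

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PositionalXPath {
    // Builds a path such as /root[1]/nodeB[2] for an element node.
    static String pathOf(Node node) {
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return ""; // ends the recursion at the Document node
        }
        // XPath positions are 1-based and count only siblings with the same name.
        int index = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE
                    && s.getNodeName().equals(node.getNodeName())) {
                index++;
            }
        }
        return pathOf(node.getParentNode()) + "/" + node.getNodeName() + "[" + index + "]";
    }

    // Parses an XML string (checked exceptions wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><nodeA>stuff</nodeA><nodeB>more stuff</nodeB>"
                + "<nodeB cond='thisOne'><subnodeX>useless stuff</subnodeX>"
                + "<subnodeX id='MyCondition'>THE STUFF YOU WANT</subnodeX>"
                + "<subnodeX>more useless stuff</subnodeX></nodeB></root>");
        Node wanted = doc.getElementsByTagName("subnodeX").item(1);
        System.out.println(pathOf(wanted)); // prints /root[1]/nodeB[2]/subnodeX[2]
    }
}
```

The result identifies the node only for as long as the document structure stays the same, which is the limitation described above.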
Something like this will give you a simple xpath:
public String getXPath(Node node) {
return getXPath(node, "");
}
public String getXPath(Node node, String xpath) {
if (node == null) {
return "";
}
String elementName = "";
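if (node instanceof Element) {
// getLocalName() returns null when the parser is not namespace-aware, so fall back to the node name
String localName = node.getLocalName();
elementName = (localName != null) ? localName : node.getNodeName();
}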
Node parent = node.getParentNode();
if (parent == null) {
return xpath;
}
return getXPath(parent, "/" + elementName + xpath);
}