Query in xml files to select value from node - java

As an example, this is the kind of xml file I have:
<Node>
<Sample A>
...
</Sample A>
<Sample B>
<myType>importantValue</myType>
</Sample B>
...
<Sample Z>
<myValue>16</myValue>
</Sample Z>
<Node>
<Sample A>
...
</Sample A>
<Sample B>
<myType>importantValue</myType>
</Sample B>
...
<Sample Z>
<myValue>16</myValue>
</Sample Z>
How do I make a query similar to "Select myType and myValue where myValue > x"?
I am trying to use XPath to find the right element, and I am sure there is an easy way to do it, but as I am new to XML queries I can't find one. Thanks in advance!

Assuming your XML is structured like this:
<Node>
<Sample>
<someValue>sharks</someValue>
</Sample>
<Sample>
<myValue>16</myValue>
</Sample>
<Sample>
<myType>importantValue</myType>
</Sample>
</Node>
<Node>
...
</Node>
...
and the Samples are grouped by Node elements, what you want is to find the Node with the properties you are after, and then read the values you need from that same Node.
Here's an XPath expression that gets a Node by Sample/myValue:
//Node[Sample/myValue = '16']
This reads as "get all Nodes whose child Sample's child myValue's value is '16'".
You can add to it to get different values:
//Node[Sample/myValue = '16']/Sample/myType
This changes the return value of the XPath expression. It reads as "get the value from myType, whose parent is Sample, whose parent is a Node whose Sample/myValue's value is '16'".
To get multiple values out of an XPath expression, you can use the | operator to combine multiple expressions together:
//Node[Sample/myValue = '16'] | //Node[Sample/myValue = '15']
This reads as "get all Nodes whose Sample/myValue value is '16' or '15'".
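The question asked for a numeric comparison ("where myValue > x") rather than equality. XPath 1.0 supports > directly inside a predicate, so a minimal sketch with the JDK's javax.xml.xpath API could look like the following. The class name NodeQueryDemo, the sample data, and the wrapping Root element are assumptions made up for illustration; they are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodeQueryDemo {
    // Hypothetical sample data; a single Root wrapper is assumed so the document is well-formed.
    static final String XML =
          "<Root>"
        + "<Node><Sample><myType>typeA</myType></Sample><Sample><myValue>16</myValue></Sample></Node>"
        + "<Node><Sample><myType>typeB</myType></Sample><Sample><myValue>3</myValue></Sample></Node>"
        + "</Root>";

    // Returns "myType=myValue" for every Node whose myValue is greater than the threshold.
    static List<String> select(String xml, int threshold) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//Node[Sample/myValue > " + threshold + "]", doc, XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Relative expressions are evaluated against the matched Node.
                String type = xpath.evaluate("Sample/myType", nodes.item(i));
                String value = xpath.evaluate("Sample/myValue", nodes.item(i));
                rows.add(type + "=" + value);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, 10)); // prints [typeA=16]
    }
}
```

Selecting the matching Node first and then reading both values relative to it keeps myType and myValue paired, which a single union expression would not guarantee.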

The steps are as follows. I hope you haven't missed any of them.
Create a Document from a file or stream
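DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
StringBuilder xmlStringBuilder = new StringBuilder();
// The quotes inside the string literal must be escaped
xmlStringBuilder.append("<?xml version=\"1.0\"?> <class> </class>");
ByteArrayInputStream input = new ByteArrayInputStream(
        xmlStringBuilder.toString().getBytes("UTF-8"));
Document doc = builder.parse(input);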
Build XPath
XPath xPath = XPathFactory.newInstance().newXPath();
Prepare Path expression and evaluate it
String expression = "/Node/SampleB";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
Iterate over NodeList
for (int i = 0; i < nodeList.getLength(); i++) {
Node nNode = nodeList.item(i);
...
}
Examine attributes
//returns specific attribute
getAttribute("attributeName");
//returns a Map (table) of names/values
getAttributes();
Examine sub-elements
//returns a list of subelements of specified name
getElementsByTagName("subelementName");
//returns a list of all child nodes
getChildNodes();
In your case,
String expression = "/Node/SampleB";
for (int i = 0; i < nodeList.getLength(); i++) {
    Node nNode = nodeList.item(i);
    System.out.println("\nCurrent Element :" + nNode.getNodeName());
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        System.out.println(eElement.getElementsByTagName("myType")
                .item(0).getTextContent());
    }
}

Related

Java DOM: Get all parents list

I need to list all the elements that contain at least one child field. For example, in the XML below, H, I and T each have at least one child field. I need to list only H, I and T and ignore other parents such as G_GT, Rec, etc.
<?xml version="1.0" encoding="UTF-8"?>
<Doc>
<Rec>
<H>
<Key>H</Key>
<F1>1</F1>
<I>
<Key>I</Key>
<F2>08</F2>
<G_GT>
<T>
<Key>T</Key>
<F3>1</F3>
</T>
<T>
<Key>T</Key>
<F3>2</F3>
</T>
</G_GT>
</I>
</H>
</Rec>
</Doc>
The code should give the output H, I, T.
I am working in Java with the DOM parser. Could anyone suggest how to do this in Java using DOM? I cannot use functions such as getElementsByTagName with fixed names, because the XML I receive can have different parent and child names. Thus, I have to avoid hardcoding any child or parent name.
Regards,
Phil
To find the parent ELEMENT_NODE of every ELEMENT_NODE that has no child ELEMENT_NODE, you might start with the following snippet:
NodeList elements = document.getElementsByTagName("*");
Set<String> nodesNames = new LinkedHashSet<>();
for (int i = 0; i < elements.getLength(); i++) {
    Node node = elements.item(i);
    NodeList nodeList = node.getChildNodes();
    boolean hasChildElement = false;
    for (int j = 0; j < nodeList.getLength(); j++) {
        if (nodeList.item(j).getNodeType() == Node.ELEMENT_NODE) {
            hasChildElement = true;
            break;
        }
    }
    // An element without child elements is a field, so its parent is a match
    if (!hasChildElement) {
        nodesNames.add(node.getParentNode().getNodeName());
    }
}
System.out.println("nodesNames = " + nodesNames);
For the XML above, this produces the output
nodesNames = [H, I, T]
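The same selection can probably be expressed as a single XPath 1.0 query instead of a nested loop: //*[*[not(*)]] matches every element that has at least one child element which itself has no child elements. The sketch below assumes the JDK's built-in XPath engine; the class name LeafParentDemo and the method name are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LeafParentDemo {
    // Collects the names of all elements that have at least one leaf element as a child.
    static Set<String> parentsOfLeafElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList parents = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[*[not(*)]]", doc, XPathConstants.NODESET);
            // A set removes duplicates such as the repeated T elements.
            Set<String> names = new LinkedHashSet<>();
            for (int i = 0; i < parents.getLength(); i++) {
                names.add(parents.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Doc><Rec><H><Key>H</Key><F1>1</F1><I><Key>I</Key><F2>08</F2>"
                + "<G_GT><T><Key>T</Key><F3>1</F3></T><T><Key>T</Key><F3>2</F3></T></G_GT>"
                + "</I></H></Rec></Doc>";
        System.out.println(parentsOfLeafElements(xml));
    }
}
```

No element names are hardcoded, so this should also work when the parent and child names change.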

xpath compile using java - NodeList with particular node value

<data xmlns:fsd="abc.org" xmlns:xlink="http://www.w3.org/1999/xlink">
<meta name="elapsed-time" value="46" />
<org-family>
<family-member id="5">
<publication-reference>
<document-id document-id-type="docdb">
<country>US</country>
<doc-number>3056228</doc-number>
<date>20160817</date>
</document-id>
</publication-reference>
</family-member>
<family-member id="2">
<publication-reference>
<document-id document-id-type="docdb">
<country>US</country>
<doc-number>2013315173</doc-number>
<date>20150430</date>
</document-id>
</publication-reference>
</family-member>
</org-family>
</data>
From the above XML I want to extract the country and date node values. Below is my Java code:
NodeList familyMembers = (NodeList) xPath.compile("//family-member//publication-reference//document-id[@document-id-type=\"docdb\"]//text()").evaluate(xmlDocument,XPathConstants.NODESET);
ArrayList mainFamily = new ArrayList();
for (int i = 0; i < familyMembers.getLength(); i++) {
mainFamily.add(familyMembers.item(i).getNodeValue());
}
But it extracts all three node values (country, doc-number and date), and I need only two of them (country and date). How should I request only those node values in the for loop?
Once you have selected a document-id node, the // operator selects all of its descendants, and text() then selects every text node among them. If you want to process only some of the descendant nodes, just list them (build an explicit sequence of sub-elements).
You can also get rid of the expensive (and here superfluous) // operators.
Try replacing the query with
"//family-member/publication-reference/document-id[@document-id-type=\"docdb\"]/(country, date)/text()"

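One caveat for the family-member query: a parenthesized step such as (country, date) is XPath 2.0 syntax, while the XPath engine bundled with the JDK (javax.xml.xpath) implements XPath 1.0, so that expression will most likely be rejected there. A hedged XPath 1.0 sketch of the same selection uses a self:: test instead. The class name CountryDateDemo and the trimmed sample document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CountryDateDemo {
    // Trimmed, hypothetical version of the document from the question.
    static final String XML =
          "<data><org-family><family-member id=\"5\"><publication-reference>"
        + "<document-id document-id-type=\"docdb\"><country>US</country>"
        + "<doc-number>3056228</doc-number><date>20160817</date></document-id>"
        + "</publication-reference></family-member></org-family></data>";

    // Returns only the country and date text values, skipping doc-number.
    static List<String> countryAndDate(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String expression = "//family-member/publication-reference"
                    + "/document-id[@document-id-type='docdb']"
                    + "/*[self::country or self::date]/text()";
            NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < texts.getLength(); i++) {
                values.add(texts.item(i).getNodeValue()); // text nodes do carry a node value
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countryAndDate(XML)); // prints [US, 20160817]
    }
}
```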
java xml cast Node to Element

I know this was asked many times but I still cannot get it to work. I convert xml string to Document object and then parse it. Here is the code:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try {
    builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(new StringReader(result)));
    Node head = document.getFirstChild();
    if (head != null) {
        NodeList airportList = head.getChildNodes();
        for (int i = 0; i < airportList.getLength(); i++) {
            Node n = airportList.item(i);
            Element airportElem = (Element) n; // the ClassCastException is thrown here
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
When I cast the Node object n to Element I get an exception java.lang.ClassCastException: org.apache.harmony.xml.dom.TextImpl cannot be cast to org.w3c.dom.Element. When I check the node type of the Node object it says Node.TEXT_NODE. I believe it should be Node.ELEMENT_NODE. Am I right?
So how do I convert a Node to an Element, so that I can do something like element.getAttribute("attrName")?
Here is my XML:
<?xml version="1.0" encoding="utf-8" ?>
<ArrayOfCity xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<City>
<strName>Abu Dhabi</strName>
<strCode>AUH</strCode>
</City>
<City>
<strName>Amsterdam</strName>
<strCode>AMS</strCode>
</City>
<City>
<strName>Antalya</strName>
<strCode>AYT</strCode>
</City>
<City>
<strName>Bangkok</strName>
<strCode>BKK</strCode>
</City>
</ArrayOfCity>
Thanks in advance!
I think you need something like this:
NodeList airportList = head.getChildNodes();
for (int i = 0; i < airportList.getLength(); i++) {
Node n = airportList.item(i);
if (n.getNodeType() == Node.ELEMENT_NODE) {
Element elem = (Element) n;
}
}
When I cast the Node object n to Element I get an exception java.lang.ClassCastException: org.apache.harmony.xml.dom.TextImpl cannot be cast to org.w3c.dom.Element. When I check the node type of the Node object it says Node.TEXT_NODE. I believe it should be Node.ELEMENT_NODE. Am I right?
Probably not, the parser is probably right. It means that some of the nodes in what you're parsing are text nodes. For example:
<foo>bar</foo>
In the above, we have a foo element containing a text node. (The text node contains the text "bar".)
Similarly, consider:
<foo>
<bar>baz</bar>
</foo>
If your XML document literally looks like the above, it contains a root element foo with these child nodes (in order):
A text node with some whitespace in it
A bar element
A text node with some more whitespace in it
Note that the bar element is not the first child of foo. If it looked like this:
<foo><bar>baz</bar></foo>
then the bar element would be the first child of foo.
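The difference between the two layouts is easy to check. The following sketch parses both forms and lists the names of the root element's child nodes; the class name ChildNodeTypesDemo is made up for illustration, and it assumes the default, non-validating JDK parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildNodeTypesDemo {
    // Lists the node names of the root element's direct children ("#text" for text nodes).
    static List<String> childNodeNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Line breaks between the tags become whitespace-only text nodes.
        System.out.println(childNodeNames("<foo>\n<bar>baz</bar>\n</foo>")); // prints [#text, bar, #text]
        System.out.println(childNodeNames("<foo><bar>baz</bar></foo>"));     // prints [bar]
    }
}
```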
You can also try to "protect" your cast:
Node n = airportList.item(i);
if (n instanceof Element) {
    Element airportElem = (Element) n;
    // ...
}
But as others have pointed out, you have text nodes, and this check simply skips them. Be sure you don't need them, or add an else branch with separate code to process them.

How to get all values in a list of objects with xpath?

How can I get the name of all accounts with xpath?
The following expression only returns the first account's name:
XPathExpression fax = xpath.compile("/accounts/account/name");
<accounts>
<account>
<name>Johndoe1</name>
</account>
<account>
<name>Johndoe2</name>
</account>
</accounts>
According to this tutorial: http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html you need to do something like this:
XPathExpression expr = xpath.compile("/accounts/account/name");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    // getNodeValue() is null for element nodes, so read the text content instead
    System.out.println(nodes.item(i).getTextContent());
}
That is the correct XPath expression, but the result you get depends on how you evaluate it. The XPath expression /accounts/account/name returns a node set containing (in document order) all the name child elements of all the account elements under the accounts root element in the document.
In the XPath data model the string value of a node set is defined to be the string value of the first node in the set. So if you use the single-argument evaluate:
fax.evaluate(someDocument)
this will evaluate the expression as a string and you'll just get the first name value. Instead you need to evaluate the expression as a node set, then extract the string value of each node in turn, as suggested in koopajah's answer.
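To make the difference concrete, here is a minimal sketch that evaluates the same compiled expression both ways. The class name AccountNamesDemo and the well-formed sample document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AccountNamesDemo {
    // Hypothetical, well-formed version of the document from the question.
    static final String XML = "<accounts>"
            + "<account><name>Johndoe1</name></account>"
            + "<account><name>Johndoe2</name></account>"
            + "</accounts>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Single-argument evaluate: the node set is converted to the string value of its first node.
    static String firstName(String xml) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath()
                    .compile("/accounts/account/name");
            return expr.evaluate(parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // NODESET evaluate: every matching element is returned and can be read in turn.
    static List<String> allNames(String xml) {
        try {
            XPathExpression expr = XPathFactory.newInstance().newXPath()
                    .compile("/accounts/account/name");
            NodeList nodes = (NodeList) expr.evaluate(parse(xml), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstName(XML)); // prints Johndoe1
        System.out.println(allNames(XML));  // prints [Johndoe1, Johndoe2]
    }
}
```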
Try accounts//account; this will give you all the account elements in your document that are descendants of accounts.

java dom parser- how to check if an element is null or not

I am trying to parse an XML file in which there is a group element "patent-assignee" that contains several elements: name, address1, address2, city, state, postcode, country.
While values will always be there for "name" and "address1", the other elements may or may not have values.
I have navigated to a single patent-assignee element, and now want to check whether this record has a value for address2 (and the other fields) or not.
Some relevant code is given below:
el_patentassignees= (Element) npassignee.item(ncount);
//now el_patentassignee has in it the content of one patent assignee element
el_assigneeaddress2= (Element) el_patentassignees.getElementsByTagName("address2").item(0);
val_assigneeaddress2= el_assigneeaddress2.getTextContent();
Iterate through all child nodes of el_assigneeaddress2; when you see a text node, take its value:
NodeList nodeList = el_assigneeaddress2.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
    Node child = nodeList.item(i);
    // Text nodes report "#text" as their node name
    if (child.getNodeName().equals("#text")) {
        val_assigneeaddress2 = child.getTextContent();
        break;
    }
}
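Note that the code in the question fails with a NullPointerException when a record has no address2 element at all, because item(0) on an empty NodeList returns null. A small helper that covers both the "element missing" and the "element empty" case might look like this sketch; the names OptionalChildDemo, textOrNull and parseRoot are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class OptionalChildDemo {
    // Returns the trimmed text of the first descendant element with the given tag,
    // or null when the element is missing or has no non-blank text.
    static String textOrNull(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0); // null if nothing matched
        if (node == null) {
            return null;
        }
        String text = node.getTextContent().trim();
        return text.isEmpty() ? null : text;
    }

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element assignee = parseRoot(
                "<patent-assignee><name>Acme</name><address2/></patent-assignee>");
        System.out.println(textOrNull(assignee, "name"));     // prints Acme
        System.out.println(textOrNull(assignee, "address2")); // prints null (element is empty)
        System.out.println(textOrNull(assignee, "city"));     // prints null (element is missing)
    }
}
```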
