JAVA : Extract element from XML Document


I'm using javax.xml.parsers to navigate through an XML document like the one below:
`
<ContextElement>
<DimensionNode>Role</DimensionNode>
<Value>Administration</Value>
<TailoringExpressions>
<TailoringExpression>
<Relation>Student</Relation>
<ProjAtt>
<Attribute>Matr</Attribute>
<Attribute>SName</Attribute>
<Attribute>SSurname</Attribute>
<Attribute>SDateOfBirth</Attribute>
<Attribute>SEmail</Attribute>
<Attribute>SAddress</Attribute>
</ProjAtt>
<Condition/>
<SemiJoinRel/>
<SemiJoinOn/>
<SemiJoinCond/>
</TailoringExpression>
</TailoringExpressions>
</ContextElement>
<ContextElement>
<DimensionNode>Deadline</DimensionNode>
<Value>Lost</Value>
<TailoringExpressions>
<TailoringExpression>
<Relation>Deadline</Relation>
<ProjAtt>
<Attribute>IdDeadline</Attribute>
<Attribute>Student</Attribute>
<Attribute>DeadlineDate</Attribute>
<Attribute>Description</Attribute>
<Attribute>IsMet</Attribute>
</ProjAtt>
<Condition>DeadlineDate LT CurrentDate AND IsMet=False</Condition>
<SemiJoinRel/>
<SemiJoinOn/>
<SemiJoinCond/>
</TailoringExpression>
</TailoringExpressions>
</ContextElement>
`
I need to extract the ContextElement node whose DimensionNode is "Role" and whose Value is "Administration", but I have not been able to write working code for it.
Can someone tell me how to do that?
Thanks

I think the best way to extract values is to use XPath. In the snippet below, the expression and the file name are placeholders:
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/widgets/widget";
InputSource inputSource = new InputSource("widgets.xml");
NodeList nodes = (NodeList) xpath.evaluate(expression, inputSource, XPathConstants.NODESET);
For more information, look at the Oracle documentation.
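Applied to the document in the question, an XPath predicate can select the matching ContextElement directly. The following is a minimal sketch, not a definitive implementation. It assumes the ContextElement nodes sit under a single root element; the root name Contexts, the class name ContextElementFinder, and the shortened sample are made up for illustration, because the question does not show the real root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ContextElementFinder {

    // Assumption: a hypothetical <Contexts> root wraps the ContextElement nodes.
    static final String SAMPLE =
          "<Contexts>"
        + "<ContextElement><DimensionNode>Role</DimensionNode><Value>Administration</Value>"
        + "<TailoringExpressions><TailoringExpression><Relation>Student</Relation>"
        + "</TailoringExpression></TailoringExpressions></ContextElement>"
        + "<ContextElement><DimensionNode>Deadline</DimensionNode><Value>Lost</Value>"
        + "<TailoringExpressions><TailoringExpression><Relation>Deadline</Relation>"
        + "</TailoringExpression></TailoringExpressions></ContextElement>"
        + "</Contexts>";

    // Selects every ContextElement whose child elements carry both required values.
    static NodeList findAdministrationRole(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String expression =
                "//ContextElement[DimensionNode='Role' and Value='Administration']";
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        NodeList matches = findAdministrationRole(SAMPLE);
        for (int i = 0; i < matches.getLength(); i++) {
            Element context = (Element) matches.item(i);
            // Read a descendant of the matched element, here its Relation
            System.out.println(
                context.getElementsByTagName("Relation").item(0).getTextContent());
        }
    }
}
```

With the sample above, only the first ContextElement matches, so main prints Student. Each returned node is the whole ContextElement, so its ProjAtt and other children can be read from it in the same way.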

Related

Getting null values from XPath query

I have this xml file:
<?xml version="1.0" encoding="UTF-8"?>
<iet:aw-data xmlns:iet="http://care.aw.com/IET/2007/12" class="com.aw.care.bean.resource.MessageResource">
<iet:metadata filter=""/>
<iet:message-resource>
<iet:message>some message 1</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.11</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
<iet:message-resource>
<iet:message>some message 2</iet:message>
<iet:customer id="1"/>
<iet:code>edi.claimfilingindicator.12</iet:code>
<iet:locale>iw_IL</iet:locale>
</iet:message-resource>
.
.
.
.
</iet:aw-data>
Using the code below, I iterate over the data to find what I need.
try {
FileInputStream fileIS = new FileInputStream(new File("resources\\bootstrap\\content\\MessageResources_iw_IL\\MessageResource_iw_IL.ctdata.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String query = "//*[local-name()='message-resource']//*[local-name()='code'][contains(text(), 'account')]";
NodeList nodeList = (NodeList) xPath.compile(query).evaluate(xmlDocument, XPathConstants.NODESET);
System.out.println("size= " + nodeList.getLength());
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getNodeValue());
}
}
catch (Exception e){
e.printStackTrace();
}
The problem is that the for loop prints only null values. Any idea why this happens?
The code needs to return the list of nodes whose code and message fields both contain given parameters (like an SQL query with two conditions joined by AND).

Check the documentation:
https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html
getNodeValue() applied to an element node returns null.
Use getTextContent().
Alternatively, if you find DOM too frustrating, switch to one of the better tree models like JDOM2 or XOM.
Also, if you used an XPath 2.0 engine like Saxon, it would (a) simplify your expression to
//*:message-resource//*:code[contains(text(), 'account')]
and (b) allow you to return a sequence of strings from the XPath expression, rather than a sequence of nodes, so you wouldn't have to mess around with nodelists.
Another point: I suspect that the predicate [contains(text(), 'account')] should really be [.='account']. I'm not sure of that, but using text() instead of ".", and using contains() instead of "=", are both common mistakes.
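To make the getNodeValue() versus getTextContent() point concrete, here is a small sketch that keeps the query from the question and collects the text of each matching element. The class name CodeTextDemo and the sample codes (one of which contains "account") are invented for illustration; they are not taken from the real data file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CodeTextDemo {

    // Made-up sample in the same namespace as the question's file.
    static final String SAMPLE =
          "<iet:aw-data xmlns:iet=\"http://care.aw.com/IET/2007/12\">"
        + "<iet:message-resource><iet:code>edi.account.11</iet:code></iet:message-resource>"
        + "<iet:message-resource><iet:code>edi.claim.12</iet:code></iet:message-resource>"
        + "</iet:aw-data>";

    // Returns the text of every code element that contains "account".
    static List<String> matchingCodes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String query = "//*[local-name()='message-resource']"
                    + "//*[local-name()='code'][contains(text(), 'account')]";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // node is an element, so node.getNodeValue() would be null here;
                // getTextContent() returns the text of the element's descendants.
                result.add(node.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingCodes(SAMPLE));
    }
}
```

The only change relative to the loop in the question is the call inside the for loop; the parsing and the query stay as they were.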

Access data in xml as string

I am receiving XML as a string. Is there a library to search for elements in the string?
<Version value="0"/>
<IssueDate>2017-12-15</IssueDate>
<Locale>en_US</Locale>
<RecipientAddress>
<Category>Primary</Category>
<SubCategory>0</SubCategory>
<Name>Vitsi</Name>
<Attention>Stowell Group Llc.</Attention>
<AddressLine1>511 6th St</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-2903</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>2903</ZIP4>
</RecipientAddress>
<RecipientAddress>
<Category>Additional</Category>
<SubCategory>1</SubCategory>
<Name>Vitsi</Name>
<AddressLine1>Po Box 957</AddressLine1>
<City>Lake Oswego</City>
<Country>United States</Country>
<PresentationValue>Lake Oswego OR 97034-0104</PresentationValue>
<State>OR</State>
<ZIPCode>97034</ZIPCode>
<ZIP4>0104</ZIP4>
</RecipientAddress>
<SenderName>TMO</SenderName>
<SenderId>IL</SenderId>
<SenderAddress>
<Name>T-mobile</Name>
<AddressLine1>Po Box 790047</AddressLine1>
<City>St. Louis</City>
<PresentationValue>ST. LOUIS MO 63179-0047</PresentationValue>
<State>MO</State>
<ZIPCode>63179</ZIPCode>
.
.
.
.
I want to access the RecipientAddress elements, which form a list. Is there a library to do that? Please note that what I receive is a string. It is an invoice and there will be many to process, so performance is important.

The following options are available:
Convert the XML string to Java objects using JAXB.
Use String.indexOf() to retrieve specific parts of the XML.
Use regular expressions to retrieve specific parts of the XML.
Use a SAX, DOM, or StAX parser to parse the XML and extract values.
Use XPath to fetch specific values from the XML.

You could use XPath. Java has built-in support for XML querying without any third-party library.
The code would look like this:
String xmlInputStr = "<YOUR_XML_STRING_INPUT>";
String xpathExpressionStr = "<XPATH_EXPRESSION_STRING>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// parse(String) would treat its argument as a URI, so wrap the XML text in an InputSource
Document doc = builder.parse(new InputSource(new StringReader(xmlInputStr)));
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(xpathExpressionStr);
You can write your own expression string for querying. A typical example, which matches Category under any RecipientAddress regardless of the root element:
"//RecipientAddress/Category"
Evaluate the expression against the document to retrieve a list of nodes:
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
Then iterate over the nodes:
for (int i = 0; i < nodes.getLength(); i++) {
Node nNode = nodes.item(i);
...
}
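The steps above can be put together into one runnable sketch. It assumes the invoice has a single root element; the root name Invoice, the class name RecipientAddressReader, and the shortened sample are hypothetical, since the question does not show the full document. The XML string is wrapped in an InputSource because DocumentBuilder.parse(String) interprets its argument as a URI, not as XML text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RecipientAddressReader {

    // Assumption: a hypothetical <Invoice> root wraps the elements from the question.
    static final String SAMPLE =
          "<Invoice>"
        + "<RecipientAddress><Category>Primary</Category><City>Lake Oswego</City></RecipientAddress>"
        + "<RecipientAddress><Category>Additional</Category><City>Lake Oswego</City></RecipientAddress>"
        + "</Invoice>";

    // Returns the Category of every RecipientAddress in document order.
    static List<String> categories(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList addresses = (NodeList) xpath.evaluate(
                    "//RecipientAddress", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < addresses.getLength(); i++) {
                // A relative expression is evaluated against the current address node
                result.add(xpath.evaluate("Category", addresses.item(i)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(categories(SAMPLE));
    }
}
```

Since many invoices will be processed, it may help to compile each expression once with xpath.compile(...) and reuse the resulting XPathExpression on the same thread, rather than passing the expression string on every call.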

Many ready-made APIs are available for converting XML to Java objects.
For example, have a look at Xerces from Apache.
If you only want to extract a specific value, keep the whole document in a string and use indexOf("string").

Extracting the node values in XML with XPath in Java

I have an XML document:
<response>
<result>
<phone>1233</phone>
<sys_id>asweyu4</sys_id>
<link>rft45fgd</link>
<!-- Many more in result -->
</result>
<!-- Many more result nodes -->
</response>
The XML structure is not known in advance. The XPath expressions for the values I need come from the user.
For example, the inputs are strings like:
//response/result/sys_id , //response/result/phone
How can I get these node values for the whole XML document by evaluating the XPath?
I looked at this, but my XPath is as shown above, i.e. it does not use * or text().
An XPath evaluator works perfectly fine with my input format, so is there a way to achieve the same in Java?
Thank you!

It's difficult without seeing your code... I'd just evaluate as a NodeList and then call getTextContent() on each node in the result list...
String input = "<response><result><phone>1233</phone><sys_id>asweyu4</sys_id><link>rft45fgd</link></result><result><phone>1233</phone><sys_id>another-sysid</sys_id><link>another-link</link></result></response>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new ByteArrayInputStream(input.getBytes("UTF-8")));
XPath path = XPathFactory.newInstance().newXPath();
NodeList node = (NodeList) path.compile("//response/result/sys_id").evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < node.getLength(); i++) {
System.out.println(node.item(i).getTextContent());
}
Output
asweyu4
another-sysid

Java with XPath and TagSoup

I am using TagSoup with Java to extract some data, but certain XPath expressions are not working; I just get empty results.
FileReader frInHtml = new FileReader("doc.html");
BufferedReader brInHtml = new BufferedReader(frInHtml);
SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
// This is working
XPath xpath = XPath.newInstance("/ns:html[1]/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1");
// None of the 3 lines below worked; I tried them one at a time
XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[7]/ns:table/ns:tbody/ns:tr/ns:td/ns:h1");
XPath xpath = XPath.newInstance("//html//body//div[7]//table//tbody//tr//td//h1");
XPath xpath = XPath.newInstance("/html/body/div[7]/table/tbody/tr/td/h1");
xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");

To debug this you will need to look at the "equivalent XML" produced by TagSoup. And for us to help you, you will need to show us the equivalent XML.

How to program some XPath functions using Java Design Patterns

I need your help and experience to write the best Java code using design patterns.
I must write some custom XPath functions that can:
Load a DOM document (I can use a mock object);
Check the validity of a user XPath expression;
Find and return the DOM node that satisfies the user expression.
I must evaluate only absolute expressions ( /... ) that can contain the path expression " .. " and predicates, embedded in square brackets, regarding attributes or leaf nodes, for example:
/com/university/student/../exam
/com/university/exam[@tt = 'poo']/vote
/com/university/student/number[. = '1234']
I plan to use the Composite pattern for the first step, Chain of Responsibility for the second step, and a Visitor for the third step, but I am not sure this is the best way to do it.
Can Chain of Responsibility be useful for checking validity?
All suggestions are welcome, thank you in advance for any help you can provide.

Isn't it a bit ... overcomplicated?
Create a DOM object for some XML input
Compile the user input - XPath will complain if it is not valid (XPathExpressionException)
Evaluate the expression with the DOM object
Sample:
// #1 load document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
// #2 - validate expression
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = null;
try {
expr = xpath.compile(getExpression()); // assign to the variable declared above; redeclaring it would not compile
} catch (XPathExpressionException e) {
// ... handle & return <- invalid expression
}
// #3 evaluate expression
String result = expr.evaluate(doc);
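The three steps can also be wrapped into small helper methods. This is a sketch under stated assumptions: the class name XPathChecker and the sample document are invented to fit the example expressions from the question, and validity here only means that the standard javax.xml.xpath compiler accepts the expression, not that it is restricted to absolute paths.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathChecker {

    // Made-up document shaped to match the expressions in the question.
    static final String SAMPLE =
          "<com><university>"
        + "<student><number>1234</number></student>"
        + "<exam tt=\"poo\"><vote>30</vote></exam>"
        + "</university></com>";

    // Step 2: an expression counts as valid if the XPath compiler accepts it.
    static boolean isValid(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    // Steps 1 and 3: load the document, then evaluate the expression as a string.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPathExpression compiled =
                    XPathFactory.newInstance().newXPath().compile(expression);
            return compiled.evaluate(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String expression = "/com/university/exam[@tt = 'poo']/vote";
        if (isValid(expression)) {
            System.out.println(evaluate(SAMPLE, expression));
        }
    }
}
```

Restricting input to absolute expressions would need an extra check on top of this, for example rejecting strings that do not start with a single "/".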
