Extract a node with its entire content from a namespaced xml - java

Given the following namespaced xml file:
<ptk:PrintTalk xmlns:ptk="http://linkToNameSpace" xmlns:xjdf="http://linkToNamespace">
<ptk:Request>
<ptk:PurchaseOrder Currency="EUR">
<xjdf:XJDF name="someName" version="2.0">
<xjdf:ProductList>
<xjdf:Product>
...
</xjdf:Product>
<xjdf:OtherProduct>
...
</xjdf:OtherProduct>
and many other products
</xjdf:ProductList>
<xjdf:ParameterSet>
<xjdf:Parameter>
...
</xjdf:Parameter> and so on until
</xjdf:XJDF>
</ptk:PurchaseOrder>
</ptk:Request>
</ptk:PrintTalk>
how would I extract following using XPath:
<xjdf:XJDF name="someName" version="2.0">
<xjdf:ProductList>
<xjdf:Product>
...
</xjdf:Product>
<xjdf:OtherProduct>
...
</xjdf:OtherProduct>
and many other products
</xjdf:ProductList>
<xjdf:ParameterSet>
<xjdf:Parameter>
...
</xjdf:Parameter> and so on until
</xjdf:XJDF>
I already tried something like:
/ptk:PrintTalk/ptk:Request/ptk:PurchaseOrder/*
or
//xjdf:XJDF
but these expressions do not give me the result I am looking for. I use IntelliJ IDEA's built-in XPath expression evaluator, and the programming language is Java. No libraries for XPath, just java.xml.*
UPDATE
using
//ptk:PurchaseOrder//*
I get every node as a single node without any child nodes inside. For example, the fragment
<xjdf:ProductList>
<xjdf:Product>
...
</xjdf:Product>
</xjdf:ProductList> (here the Product tag is a child of the ProductList tag)
results in
<xjdf:ProductList>
<xjdf:Product>
The java code I use to do the operation:
@Override
public XJDF readFrom(
final Class<XJDF> type, final Type genericType, final Annotation[] annotations, final MediaType mediaType,
final MultivaluedMap<String, String> multivaluedMap, final InputStream inputStream
) throws IOException {
try {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document documentPtk = documentBuilder.parse(new InputSource(inputStream));
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//ptk:PurchaseOrder//*");
Document documentXjdf = (Document) xPathExpression.evaluate(documentPtk, XPathConstants.NODE);
} catch (Exception e) {
throw new WebApplicationException("PrintTalk document could not be deserialized.", e);
}
}

Three main points to make here:
DocumentBuilderFactory is not namespace-aware by default; you must explicitly switch on namespace awareness before you create the DocumentBuilder.
XPath doesn't use the namespace prefix mappings from the XML document; it uses its own NamespaceContext instead.
The Node returned by this query won't be a Document; it'll be an Element.
Annoyingly there's no default implementation of NamespaceContext in the Java core class library so you have to either use a third party one (I usually use the SimpleNamespaceContext from Spring) or write your own implementation of the interface.
Here's an example using SimpleNamespaceContext:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document documentPtk = documentBuilder.parse(new InputSource(inputStream));
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
nsCtx.bindNamespaceUri("p", "http://linkToNameSpace");
xPath.setNamespaceContext(nsCtx);
XPathExpression xPathExpression = xPath.compile("/p:PrintTalk/p:Request/p:PurchaseOrder/*");
Element documentXjdf = (Element) xPathExpression.evaluate(documentPtk, XPathConstants.NODE);
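If adding Spring only for SimpleNamespaceContext is not an option, the interface can be implemented by hand in a few lines. The following is a minimal sketch under some assumptions: the class name XjdfExtractor and the helper methods extractXjdf and serialize are made up for illustration, and the two namespace URIs are the placeholder values from the question. It selects the xjdf:XJDF element directly and serializes it with an identity Transformer, which shows that the returned Element carries its whole subtree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XjdfExtractor {

    // Hand-written NamespaceContext; the URIs are the placeholders from the question.
    static final NamespaceContext NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if ("ptk".equals(prefix)) return "http://linkToNameSpace";
            if ("xjdf".equals(prefix)) return "http://linkToNamespace";
            return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return null; // reverse lookup is not needed for this sketch
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return Collections.emptyIterator();
        }
    };

    // Returns the xjdf:XJDF element (with all of its descendants), or null if absent.
    static Element extractXjdf(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // must happen before newDocumentBuilder()
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(NS);
        return (Element) xpath.evaluate(
                "/ptk:PrintTalk/ptk:Request/ptk:PurchaseOrder/xjdf:XJDF", doc, XPathConstants.NODE);
    }

    // Serializes an element and its subtree back to text.
    static String serialize(Element element) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(element), new StreamResult(out));
        return out.toString();
    }
}
```

The prefixes used in the XPath expression only have to match the NamespaceContext, not the prefixes in the document; what counts is that the URIs are identical.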

Related

Get Parent attribute value based XML search using xpath java

I want to get the RECORD number based on the contract id passed to a Java method. Can anyone help with this? I am new to XML parsing.
sample xml file:
<?xml version="1.0"?><FILE>
<Document RECORD="1"><Contract-Id>234</Contract-Id><Client-Id>232</Client-Id></Document>
<Document RECORD="2"><Contract-Id>235</Contract-Id><Client-Id>334</Client-Id></Document>
</FILE>
Java code:
File fXmlFile = new File(inputFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document xmlDocument = dBuilder.parse(fXmlFile);
xmlDocument.getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpr = xPath.compile("//Document/Contract-Id[text()='"+ContractNumber+"']");
//Object result = xPathExpr.evaluate(xmlDocument,XPathConstants.NODESET);
Node nl = (Node)xPathExpr.evaluate(xmlDocument.getParentNode(), XPathConstants.NODESET);
nl.getTextContent();
nl.getAttributes();
Please try the following XPath expression:
/FILE/Document[Contract-Id="235"]/@RECORD
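As a rough sketch of how an expression of this shape could be evaluated from Java: the class name RecordLookup, the method findRecord, and the XPath variable name contractId below are hypothetical. The contract id is supplied through an XPathVariableResolver rather than by string concatenation, which is one way to avoid quoting problems; it is a design choice, not something the original answer prescribes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RecordLookup {

    // Returns the RECORD attribute of the Document element whose Contract-Id
    // child equals contractId, or an empty string if no such element exists.
    static String findRecord(String xml, String contractId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve $contractId inside the expression to the method argument.
        xpath.setXPathVariableResolver(
                name -> "contractId".equals(name.getLocalPart()) ? contractId : null);

        // The two-argument evaluate() returns the string value of the result.
        return xpath.evaluate("/FILE/Document[Contract-Id=$contractId]/@RECORD", doc);
    }
}
```

Selecting the attribute of the matching parent in one expression avoids navigating back up from the Contract-Id node in Java code.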

xml parse - xpath clarification in Java

How should I get the link value from the XML below?
XML Content
<document-instance system="abc.org" number-of-pages="6" desc="Drawing" link="www.google.com">
<document-format-options>
<document-format>application/pdf</document-format>
<document-format>application/tiff</document-format>
</document-format-options>
<document-section name="DRAWINGS" start-page="1" />
</document-instance>
I can traverse up to the desc attribute, but after that I am stuck:
XPathExpression firstPageUrl = xPath.compile("//document-instance/@desc=\"Drawing\"]");
Expected output : retrieve the Link value
www.google.com
File file = new File("path to file");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//document-instance/@link";
Node node = (Node) xPath.compile(expression).evaluate(doc, XPathConstants.NODE);
String url= node.getTextContent();

Read sitemap with XPath

I want to read a sitemap with XPath, but it doesn't work.
Here is my code:
private void evaluate2(String src){
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
try{
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new ByteArrayInputStream(src.getBytes()));
System.out.println(src);
XPathFactory xp_factory = XPathFactory.newInstance();
XPath xpath = xp_factory.newXPath();
XPathExpression expr = xpath.compile("//url/loc");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println(nodes.getLength());
for (int i = 0; i < nodes.getLength(); i++) {
items.add(nodes.item(i).getNodeValue());
System.out.println(nodes.item(i).toString());
}
}catch(Exception e){
System.out.println(e.getMessage());
}
}
Beforehand, I retrieve the remote source of the sitemap and pass it to evaluate2 through the variable src.
The call System.out.println(nodes.getLength()); displays 0.
My XPath query itself should be fine, because the same query works in PHP.
Do you see any errors in my code?
Thanks
You parse the sitemap with a namespace-aware parser (that's what factory.setNamespaceAware(true) does), but then attempt to access it using an XPath that does not use a namespace resolver (or reference any namespaces).
The simplest solution is to configure the parser as not namespace aware. As long as you're just parsing a self-contained sitemap, that shouldn't be a problem.
One more problem in your code is that you pass the sitemap contents as a String, then convert that String using the platform default encoding. This will work as long as your platform-default encoding matches that of the actual bytes that you retrieved from the server (assuming that you also created the string using the platform-default encoding). If it doesn't, you're likely to get a conversion error.
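A minimal sketch of both suggestions, assuming the sitemap is available as raw bytes (the class name SitemapReader and the method readLocs are hypothetical): the parser is left non-namespace-aware so that //url/loc matches, and the bytes are handed to the parser directly so that it can honor the encoding stated in the XML declaration. The sketch also reads the text with getTextContent(), because getNodeValue() returns null for element nodes.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class SitemapReader {

    // Returns the text of every <loc> element found under a <url> element.
    static List<String> readLocs(byte[] sitemapBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false); // the default, spelled out for clarity

        // Parsing the bytes lets the parser detect the encoding itself,
        // instead of going through a String and the platform default.
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(sitemapBytes));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//url/loc", doc, XPathConstants.NODESET);

        List<String> locs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getNodeValue() is null for elements; getTextContent() returns the text.
            locs.add(nodes.item(i).getTextContent().trim());
        }
        return locs;
    }
}
```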
I think the input has a namespace. So you would have to set a NamespaceContext on the xpath object and use prefixes in your XPath, i.e. //url/loc should become //ns:url/ns:loc
and then add the namespace prefix binding to the namespace context object.
You can find a NamespaceContext implementation in Apache Commons: http://ws.apache.org/commons/util/apidocs/index.html
ws-commons-utils
NamespaceContextImpl namespaceContextObj = new NamespaceContextImpl();
namespaceContextObj.startPrefixMapping("ns", "http://sitename/xx");
xpath.setNamespaceContext(namespaceContextObj);
XPathExpression expr = xpath.compile("//ns:url/ns:loc");
In case you don't know which namespaces are coming, you can get them from the document itself, but I doubt it'll be of much use. There are a few how-tos here:
http://www.ibm.com/developerworks/xml/library/x-nmspccontext/index.html
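For the simple case where every element of interest lives in the namespace of the root element, taking the namespace from the document itself could look roughly like this. It is only a sketch: the class name RootNamespaceXPath, the method select, and the fixed prefix ns are assumptions, and documents that mix several namespaces would need a fuller NamespaceContext.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootNamespaceXPath {

    // Evaluates an expression in which the prefix "ns" stands for whatever
    // namespace the document's root element is in.
    static NodeList select(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Namespace URI of the root element; null if the root has no namespace.
        final String rootNs = doc.getDocumentElement().getNamespaceURI();

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return ("ns".equals(prefix) && rootNs != null) ? rootNs : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null; // reverse lookup is not needed here
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return Collections.emptyIterator();
            }
        });
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }
}
```

With a sitemap as input, a call such as select(src, "//ns:url/ns:loc") would then return the loc elements without the namespace URI being hard-coded.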
I can't see any errors in your code, so I guess the problem is the source.
Are you sure that the source file contains this element?
Maybe you could try this code to parse the String into a Document:
builder.parse(new InputSource(new StringReader(xml)));

Getting value of child node from XML in java

My xml file looks like this
<InNetworkCostSharing>
<FamilyAnnualDeductibleAmount>
<Amount>6000</Amount>
</FamilyAnnualDeductibleAmount>
<IndividualAnnualDeductibleAmount>
<NotApplicable>Not Applicable</NotApplicable>
</IndividualAnnualDeductibleAmount>
<PCPCopayAmount>
<CoveredAmount>0</CoveredAmount>
</PCPCopayAmount>
<CoinsuranceRate>
<CoveredPercent>0</CoveredPercent>
</CoinsuranceRate>
<FamilyAnnualOOPLimitAmount>
<Amount>6000</Amount>
</FamilyAnnualOOPLimitAmount>
<IndividualAnnualOOPLimitAmount>
<NotApplicable>Not Applicable</NotApplicable>
</IndividualAnnualOOPLimitAmount>
</InNetworkCostSharing>
I am trying to get the Amount value from <FamilyAnnualDeductibleAmount> and also from <FamilyAnnualOOPLimitAmount>. How do I get those values in Java?
You may use two XPath queries, /InNetworkCostSharing/FamilyAnnualDeductibleAmount and /InNetworkCostSharing/FamilyAnnualOOPLimitAmount, or just get the node InNetworkCostSharing and retrieve the values of its two direct children.
Solution using XPath:
// load the XML as String into a DOM Document object
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream("YOUR XML".getBytes());
Document doc = docBuilder.parse(bis);
// XPath to retrieve the text of the <Amount> child of the <FamilyAnnualDeductibleAmount> tag
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/InNetworkCostSharing/FamilyAnnualDeductibleAmount/Amount/text()");
String familyAnnualDeductibleAmount = (String)expr.evaluate(doc, XPathConstants.STRING);
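To read both amounts as numbers in one go, something along these lines should work. This is a sketch with hypothetical names (CostSharingAmounts, amount); it asks XPath for a NUMBER result, which yields NaN when the Amount element is missing, as it is under IndividualAnnualDeductibleAmount in the sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CostSharingAmounts {

    // Returns the numeric value of <Amount> below the given child of
    // <InNetworkCostSharing>, or NaN if there is no such <Amount> element.
    static double amount(String xml, String parentElement) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // parentElement is assumed to be a plain element name, not user input.
        Double value = (Double) xpath.evaluate(
                "/InNetworkCostSharing/" + parentElement + "/Amount",
                doc, XPathConstants.NUMBER);
        return value;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<InNetworkCostSharing>"
                + "<FamilyAnnualDeductibleAmount><Amount>6000</Amount></FamilyAnnualDeductibleAmount>"
                + "<FamilyAnnualOOPLimitAmount><Amount>6000</Amount></FamilyAnnualOOPLimitAmount>"
                + "</InNetworkCostSharing>";
        System.out.println(amount(xml, "FamilyAnnualDeductibleAmount"));
        System.out.println(amount(xml, "FamilyAnnualOOPLimitAmount"));
    }
}
```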
StAX based solution:
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader rdr = f.createXMLStreamReader(new FileReader("test.xml"));
while (rdr.hasNext()) {
if (rdr.next() == XMLStreamConstants.START_ELEMENT) {
if (rdr.getLocalName().equals("FamilyAnnualDeductibleAmount")) {
rdr.nextTag();
int familyAnnualDeductibleAmount = Integer.parseInt(rdr.getElementText());
System.out.println("familyAnnualDeductibleAmount = " + familyAnnualDeductibleAmount);
} else if (rdr.getLocalName().equals("FamilyAnnualOOPLimitAmount")) {
rdr.nextTag();
int familyAnnualOOPLimitAmount = Integer.parseInt(rdr.getElementText());
System.out.println("FamilyAnnualOOPLimitAmount = " + familyAnnualOOPLimitAmount);
}
}
}
rdr.close();
Note that StAX is especially good for cases like yours: it skips all unnecessary elements and reads only the ones you need.
Try something like this (use getElementsByTagName to get the parent nodes and then get the value by reaching out to the child node):
File xmlFile = new File("NetworkCost.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile );
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("FamilyAnnualDeductibleAmount");
// getTextContent() on the parent concatenates the text of its descendants;
// trim() removes the whitespace around the nested <Amount> element
String familyDedAmount = nList.item(0).getTextContent().trim();
nList = doc.getElementsByTagName("FamilyAnnualOOPLimitAmount");
String familyAnnualAmount = nList.item(0).getTextContent().trim();
I think I found the solution with this question from stackoverflow
Getting XML Node text value with Java DOM

getElementsByTagName doesn't work

I have the following simple piece of code:
String test = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><TT_NET_Result><GUID>9145b1d3-4aa3-4797-b65f-9f5e00be1a30</GUID></TT_NET_Result>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(test)));
NodeList nl = doc.getDocumentElement().getElementsByTagName("TT_NET_Result");
The problem is that I don't get any result - nodelist variable "nl" is empty.
What could be wrong?
You're asking for elements under the document element, but TT_NET_Result is the document element. If you just call
NodeList nl = doc.getElementsByTagName("TT_NET_Result");
then I suspect you'll get the result you want.
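The difference can be checked with a small sketch like the one below (the class name RootLookupDemo and the method counts are made up for illustration). Element.getElementsByTagName searches only descendants of the element it is called on, whereas Document.getElementsByTagName searches the whole document, root element included.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RootLookupDemo {

    // Returns { matches found below the root element, matches found in the document }.
    static int[] counts(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Searches descendants of the root only, so the root itself never matches.
        int belowRoot = doc.getDocumentElement().getElementsByTagName(tagName).getLength();

        // Searches the whole document, including the root element.
        int inDocument = doc.getElementsByTagName(tagName).getLength();

        return new int[] { belowRoot, inDocument };
    }
}
```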
Here's another response to this old question. I hit a similar issue in my code today and I actually read/write XML all the time. For some reason I overlooked one major fact. If you want to use
NodeList elements = doc.getElementsByTagNameNS(namespace,elementName);
you need to parse your document with a factory that is namespace-aware.
// lazily created, shared factory
private static DocumentBuilderFactory factory;
private static DocumentBuilderFactory getFactory() {
if (factory == null){
factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
}
return factory;
}
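To see the effect of that flag, a sketch along these lines compares the two parser settings. The class name NamespaceAwareLookup, the method count, and the namespace urn:example:a mentioned below are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceAwareLookup {

    // Counts the elements matching (namespace, localName) after parsing the
    // XML with or without namespace awareness.
    static int count(String xml, boolean namespaceAware, String namespace, String localName)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // must be set before the builder is created
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagNameNS(namespace, localName).getLength();
    }
}
```

For a document such as <a:root xmlns:a="urn:example:a"><a:item/></a:root>, the lookup for ("urn:example:a", "item") should only find the element when namespaceAware is true, because a parser that is not namespace-aware does not record namespace URIs or local names on the nodes it builds.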
