Parsing XML in Java from a URL

I have a webpage that serves an XML document.
The URL is something like www.snfnffn.com/pareste.xml
The XML I want to parse looks like this:
<school>
<rating value="4">"hi"</rating>
.... <!-- more tags here -->
</school>
NOTE: the rating tag can be nested under any number of tags, but it WILL ALWAYS have a value attribute.
How can I get the value attribute of the rating tag? If there are several rating tags in the document, I only want the first one.
Note that the XML document is served from a URL.
I tried:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new URL(url).openStream());
but I'm not sure what to do after this.

Try this:
NodeList layerConfigList = doc.getElementsByTagName("rating");
Node node = layerConfigList.item(0);
Element e = (Element)node;
String value = e.getAttribute("value");
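A self-contained version of the answer above, as a sketch: the class name RatingReader and the fallback sample string are illustrative only. getElementsByTagName searches all descendants in document order, so item(0) is the first rating element however deeply it is nested. The URL passed on the command line needs a scheme (for example http://), otherwise Java cannot open it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RatingReader {

    // Returns the value attribute of the first <rating> element (document order), or null if there is none.
    static String firstRatingValue(InputStream in) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(in);
            // Searches the whole tree, so the nesting depth of <rating> does not matter
            NodeList ratings = doc.getElementsByTagName("rating");
            if (ratings.getLength() == 0) {
                return null;
            }
            return ((Element) ratings.item(0)).getAttribute("value");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read or parse the XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // e.g. java RatingReader http://www.snfnffn.com/pareste.xml
            try (InputStream in = URI.create(args[0]).toURL().openStream()) {
                System.out.println(firstRatingValue(in));
            }
        } else {
            // Offline fallback: a hypothetical document with two rating elements
            String xml = "<school><a><rating value=\"4\">hi</rating></a><rating value=\"2\"/></school>";
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            System.out.println(firstRatingValue(in)); // prints 4
        }
    }
}
```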

Related

< and > get converted to &lt; and &gt; when adding an XML-like string with element.setTextContent()

I have a string that looks like XML:
Ex: String sample = "<GrpHdr><MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId><CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm></GrpHdr>";
I am trying to create an XML document with an element containing the above information:
Ex:
<ns1:TstCode>T</ns1:TstCode>
<ns1:FType>SCF</ns1:FType>
<ns1:FileRef>220811084023</ns1:FileRef>
<ns1:RoutingInd>ALL</ns1:RoutingInd>
<ns1:FileBusDt>2022-08-11</ns1:FileBusDt>
<ns1:FIToFI xmlns="urn:iso:std:iso:20022:tech:xsd">
<GrpHdr>
<MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId>
<CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm>
</GrpHdr>
</ns1:FIToFI>
When I create the document for the above XML using this code:
private static DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
DOMImplementation domImpl = db.getDOMImplementation();
private Document buildExampleDocumentWithNamespaces(DOMImplementation domImpl, String output) {
Document document = domImpl.createDocument("urn:Scf:xsd:$BlkCredTrf", "ns1:BlkCredTrf", null);
document.getDocumentElement().appendChild(document.createElement("ns1:TstCode")).setTextContent("T");
document.getDocumentElement().appendChild(document.createElement("ns1:FType")).setTextContent("SCF");
document.getDocumentElement().appendChild(document.createElement("ns1:FileRef")).setTextContent("220811084023");
document.getDocumentElement().appendChild(document.createElement("ns1:RoutingInd")).setTextContent("ALL");
document.getDocumentElement().appendChild(document.createElement("ns1:FileBusDt")).setTextContent("2022-08-11");
document.getDocumentElement().appendChild(document.createElementNS("urn:iso:std:iso:tech:xsd","ns1:FIToFI"));
return document;
}
I do not have issues until this point.
When I try to add <GrpHdr><MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId><CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm></GrpHdr> as the text content of the FIToFI element at the end, using this code:
document.getDocumentElement().appendChild(document.createElementNS("urn:iso:std:iso:tech:xsd","ns1:FIToFI")).setTextContent(sample);
The XML gets created like this:
<ns1:TstCode>T</ns1:TstCode>
<ns1:FType>SCF</ns1:FType>
<ns1:FileRef>220811084023</ns1:FileRef>
<ns1:RoutingInd>ALL</ns1:RoutingInd>
<ns1:FileBusDt>2022-08-11</ns1:FileBusDt>
<ns1:FIToFI xmlns="urn:iso:std:iso:20022:tech:xsd">
&lt;GrpHdr&gt;
&lt;MsgId&gt;MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV&lt;/MsgId&gt;
&lt;CreDtTm&gt;2023-02-02T21:48:58.075+05:30&lt;/CreDtTm&gt;
&lt;/GrpHdr&gt;
</ns1:FIToFI>
How can I create this XML without the characters being escaped?
The escaping is intended. When you set the text content of an element, special characters like < have to be escaped (as &lt;), otherwise the text would be interpreted as other XML content such as elements or comments. setTextContent() stores your string as a single text node, and the serializer escapes it when the document is written out.
When you want to add an element instead, you use methods like appendChild() with an Element argument. Build your elements as usual with the createElement() method and add them together like this:
Element element = document.createElementNS("urn:iso:std:iso:tech:xsd","ns1:FIToFI");
Element grpHdr = document.createElement("GrpHdr");
Element msgId = document.createElement("MsgId");
msgId.setTextContent("MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV");
grpHdr.appendChild(msgId);
Element creDtTm = document.createElement("CreDtTm");
creDtTm.setTextContent("2023-02-02T21:48:58.075+05:30");
grpHdr.appendChild(creDtTm);
element.appendChild(grpHdr);
document.getDocumentElement().appendChild(element);
This will add the XML element inside the other XML element.
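To see the two approaches side by side, the sketch below (hypothetical class TextVsElement, JDK only) builds the FIToFI element both ways and serializes it with an identity Transformer. The setTextContent() version comes out with &lt; escapes; the appendChild() version comes out as nested markup. The namespace URIs are copied from the question's code.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextVsElement {

    static final String MSG_ID = "MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV";

    // asText = true stores the header as a string; asText = false builds real child elements.
    static String build(boolean asText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS("urn:Scf:xsd:$BlkCredTrf", "ns1:BlkCredTrf");
            doc.appendChild(root);
            Element fiToFi = doc.createElementNS("urn:iso:std:iso:tech:xsd", "ns1:FIToFI");
            root.appendChild(fiToFi);
            if (asText) {
                // One text node: the serializer must escape '<' so it is not read back as markup
                fiToFi.setTextContent("<GrpHdr><MsgId>" + MSG_ID + "</MsgId></GrpHdr>");
            } else {
                // Real elements: serialized as markup, nothing to escape
                Element grpHdr = doc.createElement("GrpHdr");
                Element msgId = doc.createElement("MsgId");
                msgId.setTextContent(MSG_ID);
                grpHdr.appendChild(msgId);
                fiToFi.appendChild(grpHdr);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build(true));  // contains &lt;GrpHdr&gt;...
        System.out.println(build(false)); // contains <GrpHdr><MsgId>...</MsgId></GrpHdr>
    }
}
```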
When you have the inner XML as a string, parse the XML string with DocumentBuilder.parse() (see How to create a XML object from String in Java?) and import the Element with the Document.importNode() method (see org.w3c.dom.DOMException: WRONG_DOCUMENT_ERR: A node is used in a different document than the one that created it). The code can look like this:
String innerXml = "<GrpHdr><MsgId[...]eDtTm></GrpHdr>";
Document innerDocument = db.parse(new InputSource(new StringReader(innerXml)));
Element innerRootElement = innerDocument.getDocumentElement();
Node importedNode = document.importNode(innerRootElement, true);
element.appendChild(importedNode);
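A runnable sketch of the parse-and-import approach, using the full header string from the question instead of the shortened one above; ImportFragment and build are illustrative names. importNode(..., true) makes a deep copy owned by the target document, which is what avoids WRONG_DOCUMENT_ERR.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ImportFragment {

    static final String SAMPLE = "<GrpHdr><MsgId>MQSECJYJHRBPDTZTYNNEYXOZUPAUDEKVDFV</MsgId>"
        + "<CreDtTm>2023-02-02T21:48:58.075+05:30</CreDtTm></GrpHdr>";

    // Builds <ns1:BlkCredTrf><ns1:FIToFI/></ns1:BlkCredTrf> and puts the parsed fragment inside FIToFI.
    static Document build(String innerXml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = db.newDocument();
            Element root = document.createElementNS("urn:Scf:xsd:$BlkCredTrf", "ns1:BlkCredTrf");
            document.appendChild(root);
            Element element = document.createElementNS("urn:iso:std:iso:tech:xsd", "ns1:FIToFI");
            root.appendChild(element);

            // The string is parsed into a separate Document, so its nodes belong to that Document...
            Document innerDocument = db.parse(new InputSource(new StringReader(innerXml)));
            // ...and must be copied (deep = true) into the target Document before appending
            Node importedNode = document.importNode(innerDocument.getDocumentElement(), true);
            element.appendChild(importedNode);
            return document;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build(SAMPLE);
        Node grpHdr = doc.getDocumentElement().getFirstChild().getFirstChild();
        System.out.println(grpHdr.getNodeName()); // prints GrpHdr
    }
}
```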

Add a prefix to all tags of an XmlObject

I have an XML string, and I want to build a new DOM element from it by adding the prefix "peci" to all of its tags.
This is my code:
Document Effective_Change=null;
factory=XmlObject.Factory.parse(NewElement);
String testxml =factory.toString();
System.out.println(testxml);
DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
fact.setNamespaceAware(true);
DocumentBuilder build;
build = fact.newDocumentBuilder();
Effective_Change = build.parse(new InputSource(new StringReader(testxml)));
Effective_Change.setPrefix("peci");
System.out.println(factory.xmlText());
At the beginning, testxml contains:
<peci:xml-fragment xmlns:peci="urn:com.url/peci">
<Derived_Event_Code>xx</Derived_Event_Code>
<Effective_Moment>2018-07-23T04:20:04</Effective_Moment>
<Entry_Moment>2018-07-23T04:20:04</Entry_Moment>
<Person_Identification isUpdated="1">
<Government_Identifier isDeleted="1">
<Government_ID>xxxxx</Government_ID>
<Government_ID_Type>xxxxxx</Government_ID_Type>
<Issued_Date>xxxxx</Issued_Date>
</Government_Identifier>
</Person_Identification>
</peci:xml-fragment>
But when I debug this line:
Effective_Change.setPrefix("peci");
I get this error:
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at org.apache.xerces.dom.NodeImpl.setPrefix(NodeImpl.java:701)
What I want to get as a result is:
<peci:xml-fragment xmlns:peci="urn:com.url/peci">
<peci:Derived_Event_Code>xx</peci:Derived_Event_Code>
<peci:Effective_Moment>2018-07-23T04:20:04</peci:Effective_Moment>
<peci:Entry_Moment>2018-07-23T04:20:04</peci:Entry_Moment>
<peci:Person_Identification isUpdated="1">
<peci:Government_Identifier isDeleted="1">
<peci:Government_ID>xxxxx</peci:Government_ID>
<peci:Government_ID_Type>xxxxxx</peci:Government_ID_Type>
<peci:Issued_Date>xxxxx</peci:Issued_Date>
</peci:Government_Identifier>
</peci:Person_Identification>
</peci:xml-fragment>
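One possible approach, sketched with the plain JDK DOM API instead of XmlObject: setPrefix() fails here because it is called on the Document node, and the DOM raises NAMESPACE_ERR when a prefix is set on a node whose namespace URI is null (which is also true of the unprefixed child elements). Moving each element into the urn:com.url/peci namespace with Document.renameNode() gives it the peci prefix. PrefixAllElements and its helpers are hypothetical names, and the sample is a trimmed copy of the question's XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrefixAllElements {

    static final String PECI_NS = "urn:com.url/peci"; // taken from xmlns:peci in the question

    static final String SAMPLE = "<peci:xml-fragment xmlns:peci=\"urn:com.url/peci\">"
        + "<Derived_Event_Code>xx</Derived_Event_Code>"
        + "<Person_Identification isUpdated=\"1\">"
        + "<Government_Identifier isDeleted=\"1\"><Government_ID>xxxxx</Government_ID></Government_Identifier>"
        + "</Person_Identification></peci:xml-fragment>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // gives every element a local name to reuse when renaming
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Puts the element and all of its descendant elements into PECI_NS with the "peci" prefix.
    static void prefixAll(Document doc, Element element) {
        // renameNode may hand back a replacement node, so keep working with its result
        Node renamed = doc.renameNode(element, PECI_NS, "peci:" + element.getLocalName());
        // Copy the child elements first, because the live NodeList can change while renaming
        List<Element> children = new ArrayList<>();
        NodeList kids = renamed.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                children.add((Element) kids.item(i));
            }
        }
        for (Element child : children) {
            prefixAll(doc, child);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        prefixAll(doc, doc.getDocumentElement());
        NodeList all = doc.getElementsByTagNameNS(PECI_NS, "*");
        for (int i = 0; i < all.getLength(); i++) {
            System.out.println(all.item(i).getNodeName()); // every name now starts with "peci:"
        }
    }
}
```

Attributes such as isUpdated are left untouched, which matches the desired output in the question.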

Get element value from XML with XPath

I have an XML file like this:
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents- xmlns:xades="http://uri.etsi.org/01903/v1.3.2#" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 UBL-Invoice-2.1.xsd">
<cac:AccountingSupplierParty>
<cac:Party>
<cac:PartyIdentification>
<cbc:ID schemeID="schema1">123231123</cbc:ID>
</cac:PartyIdentification>
<cac:PartyIdentification>
<cbc:ID schemeID="schema2">2323232323</cbc:ID>
</cac:PartyIdentification>
<cac:PartyIdentification>
<cbc:ID schemeID="schema3">4442424</cbc:ID>
</cac:PartyIdentification>
<cac:PostalAddress>
<cbc:CityName>İstanbul</cbc:CityName>
<cac:Country>
<cbc:Name>Turkey</cbc:Name>
</cac:Country>
</cac:PostalAddress>
</cac:Party>
</cac:AccountingSupplierParty>
</Invoice>
I want to access the value of the cbc:ID element with schemeID="schema2". I tried both XPath and document.getElementsByTagName. With document.getElementsByTagName I can access elements, but since there are several cbc:ID elements I can't get the one I want. With XPath I can't access any elements of the XML at all.
Here is my XPath implementation:
try {
String decoded = new String(DatatypeConverter.parseBase64Binary(binaryXmlData));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(decoded));
Document doc = db.parse(is);
String expression = "/Invoice/cac:AccountingSupplierParty/cac:Party/cac:PartyIdentification/cbc:ID@[schemaID='schema2']/text()";
String schema2 = (String) xPath.compile(expression).evaluate(doc, XPathConstants.STRING);
System.out.println(schema2);
//schema2 is null
//Above this code block returns correct value
NodeList nl = doc.getElementsByTagName("cbc:CityName");
System.out.println(nl.item(0).getTextContent());
} catch (Exception e) {
e.printStackTrace();
}
binaryXmlData is the source of my XML: first I convert the base64 binary data to XML. Am I doing the conversion wrong, or is my XPath implementation wrong?
There are many problems with your code and your XML, including:
Your XML is not well-formed: the closing quote of the cbc namespace declaration is missing.
Your Java code never defines a NamespaceContext.
See also How does XPath deal with XML namespaces?
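A sketch of both fixes together, assuming the truncated cbc URI should read ...CommonBasicComponents-2 (the usual UBL 2 namespace; this completion is an assumption). Because Invoice is in a default namespace, it also needs a prefix in the expression; inv is an arbitrary, hypothetical choice. Note too that the question's expression tests schemaID, while the attribute in the XML is schemeID. UblSchemeIdLookup and idForScheme are illustrative names.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class UblSchemeIdLookup {

    static final String INV = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2";
    static final String CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2";
    // Assumed completion of the URI that is cut off in the question's XML
    static final String CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2";

    static final String SAMPLE = "<Invoice xmlns=\"" + INV + "\" xmlns:cac=\"" + CAC + "\" xmlns:cbc=\"" + CBC + "\">"
        + "<cac:AccountingSupplierParty><cac:Party>"
        + "<cac:PartyIdentification><cbc:ID schemeID=\"schema1\">123231123</cbc:ID></cac:PartyIdentification>"
        + "<cac:PartyIdentification><cbc:ID schemeID=\"schema2\">2323232323</cbc:ID></cac:PartyIdentification>"
        + "</cac:Party></cac:AccountingSupplierParty></Invoice>";

    // Returns the text of the supplier cbc:ID whose schemeID matches, or "" if there is no match.
    static String idForScheme(String xml, String scheme) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed XPath steps only match on a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    switch (prefix) {
                        case "inv": return INV; // hypothetical prefix for the default namespace
                        case "cac": return CAC;
                        case "cbc": return CBC;
                        default: return XMLConstants.NULL_NS_URI;
                    }
                }
                @Override
                public String getPrefix(String uri) { return null; } // not needed to evaluate this expression
                @Override
                public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
            });
            String expression = "/inv:Invoice/cac:AccountingSupplierParty/cac:Party"
                + "/cac:PartyIdentification/cbc:ID[@schemeID='" + scheme + "']";
            return xPath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(idForScheme(SAMPLE, "schema2")); // prints 2323232323
    }
}
```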

How to get an attribute of nested XML with namespaces using Java XPath

I have a complex XML document with a nested structure and namespaces.
I am able to read the XML elements, but not the attributes.
For example, I need to read the contentSet or action attribute from my XML.
Here is my XML structure:
<?xml version="1.0"?>
<env:ContentEnvelope xsi:schemaLocation="http://fundamental.schemas.financial.jso.com/Fundamental/2011-07-07/
https://theshare.jso.com/sites/TRM-IA/Content%20Marketplace/Strategic%20Data%20Interfaces/SDI%20Schemas/Schemas/Fundamentals/2015-09-25/FundamentalMaster.xsd"
xmlns:esg="http://fundamental.schemas.financial.jso.com/ESGSupportingInfo/2011-07-07/"
xmlns:md="http://data.schemas.financial.jso.com/metadata/2010-10-10/"
xmlns:cr="http://fundamental.schemas.financial.jso.com/CoraxData/2012-10-25/">
<env:Header>
<env:Info>
<env:Id>urn:uuid:069527ab-2c10-48bb-b3d2-206f4e66e5d2</env:Id>
<env:TimeStamp>2016-12-23T10:09:09+00:00</env:TimeStamp>
</env:Info>
<fun:OrgId>20240</fun:OrgId>
<fun:PartitionId>1</fun:PartitionId>
</env:Header>
<env:Body minVers="0.0" majVers="1" contentSet="Fundamental">
<env:ContentItem action="Insert">
<env:Data xsi:type="fun:FundamentalDataItem">
<fun:Fundamental effectiveTo="9999-12-31T00:00:00+00:00" effectiveFrom="2013-06-29T00:55:15.313+00:00" uniqueFuamentalSet="0054341342">
<fun:OrganizationId objectType="Organization" objectTypeId="404510">42565596</fun:OrganizationId>
<fun:PrimaryReportingEntityCode>A4C67</fun:PrimaryReportingEntityCode>
<fun:TotalPrimaryReportingShares>567923000.00000</fun:TotalPrimaryReportingShares>
<fun:LocalLanguageId>505074</fun:LocalLanguageId>
<fun:IndustryGroups>
<fun:IndustryGroup validTo="9999-12-31T00:00:00+00:00" validFrom="1900-01-01T00:00:00+00:00">
<fun:GroupCode>BNK</fun:GroupCode>
<fun:GroupName languageId="505074">Bank</fun:GroupName>
<fun:TaxonomyId>1</fun:TaxonomyId>
<fun:IndustryGroupCodeId>3011649</fun:IndustryGroupCodeId>
</fun:IndustryGroup>
</fun:IndustryGroups>
<fun:GaapCode>CAG</fun:GaapCode>
<fun:ConsolidationBasis>Consolidated</fun:ConsolidationBasis>
<fun:IsFiling>true</fun:IsFiling>
<fun:ConsolidationBasisId>3013598</fun:ConsolidationBasisId>
<fun:GaapCodeId>3011536</fun:GaapCodeId>
<fun:Taxonomies>
<fun:Taxonomy>1</fun:Taxonomy>
</fun:Taxonomies>
<fun:WorldScopeIds>
<fun:WorldScopeId validTo="9999-12-31T00:00:00+00:00" validFrom="2012-03-31T00:00:00+00:00">C12436390</fun:WorldScopeId>
</fun:WorldScopeIds>
</fun:Fundamental>
</env:Data>
</env:ContentItem>
Here is my Java code to read it:
FileInputStream file = new FileInputStream(new File("c://temp/Fun.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
System.out.println("*************************");
String expression = "/ContentEnvelope/Body[@minVers='0.0']/contentSet";
System.out.println(expression);
Use the @attribute_name syntax to reference an attribute in XPath, just as you did with @minVers:
/ContentEnvelope/Body[@minVers='0.0']/@contentSet
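A sketch that reads both attributes the question mentions. It parses with namespace awareness and matches elements by local-name(), so no NamespaceContext is needed; the env namespace URI urn:example:env is a hypothetical placeholder, because the question's snippet does not show the real declaration. Unprefixed attributes such as contentSet and action are in no namespace, so a plain @name step matches them. AttributeXPath and eval are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttributeXPath {

    // Trimmed sample; "urn:example:env" stands in for the real env namespace URI
    static final String SAMPLE = "<env:ContentEnvelope xmlns:env=\"urn:example:env\">"
        + "<env:Body minVers=\"0.0\" majVers=\"1\" contentSet=\"Fundamental\">"
        + "<env:ContentItem action=\"Insert\"/>"
        + "</env:Body></env:ContentEnvelope>";

    // Evaluates an XPath expression and returns its string value ("" if nothing matches).
    static String eval(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // local-name() matches the element whatever its namespace; @contentSet selects the attribute
        System.out.println(eval(SAMPLE,
            "/*[local-name()='ContentEnvelope']/*[local-name()='Body'][@minVers='0.0']/@contentSet")); // Fundamental
        System.out.println(eval(SAMPLE, "//*[local-name()='ContentItem']/@action")); // Insert
    }
}
```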

Why am I getting a null node value while parsing XML?

While parsing the XML below, I first got a MalformedURLException (DocumentBuilder.parse(String) treats its argument as a URI, not as XML content), so instead of passing the XML string directly I used this code:
Document doc = dBuilder.parse(new InputSource(new ByteArrayInputStream(xmlResponse.getBytes("utf-8"))));
following this question:
java.net.MalformedURLException: no protocol
Now the node value I get is null. How can I fix this? I have marked the place in the for loop where the null value appears.
I am using the following code:
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new InputSource(new ByteArrayInputStream(xmlResponse.getBytes("utf-8"))));
//read this - https://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/GetMatchingProductForIdResponse/GetMatchingProductForIdResult/Products/Product";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
System.out.println("the size of the node list will be " + nodeList.getLength());
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getNodeValue()+"the value coming will be "); // here I get null for each node
}
} catch (Exception e) {
e.printStackTrace(System.out);
}
to parse this XML:
<?xml version="1.0"?>
<GetMatchingProductForIdResponse xmlns="http://mws.amazonservices.com/schema/Products/2011-10-01">
<GetMatchingProductForIdResult Id="H5-9OSH-9NZ7" IdType="SellerSKU" status="Success">
<Products xmlns="http://mws.amazonservices.com/schema/Products/2011-10-01" xmlns:ns2="http://mws.amazonservices.com/schema/Products/2011-10-01/default.xsd">
<Product>
<Identifiers>
<MarketplaceASIN>
<MarketplaceId>ATVPDKIKX0DER</MarketplaceId>
<ASIN>B004FQLAH2</ASIN>
</MarketplaceASIN>
</Identifiers>
<AttributeSets>
<ns2:ItemAttributes xml:lang="en-US">
<ns2:Binding>Office Product</ns2:Binding>
<ns2:Brand>Konica-Minolta</ns2:Brand>
<ns2:Color>Y</ns2:Color>
<ns2:CPUSpeed Units="MHz">200</ns2:CPUSpeed>
<ns2:Department>Printers</ns2:Department>
<ns2:Feature>Amp Up your Output - The magicolor 3730DN business color laser printer outputs at speeds up to 25 ppm in both color and B&amp;W which means you can keep up in just about any business environment.</ns2:Feature>
<ns2:Feature>Unparalleled Image Quality - High resolution 2400 (equivalent) x 600 dpi printing for great color and clarity in both images and text.</ns2:Feature>
<ns2:Feature>Happy Planet, Outstanding Printing - Simitri HD Toner with Biomass allows for outstanding printing with the environment in mind.</ns2:Feature>
<ns2:Feature>Connect quicker - Why wait? Standard Ethernet and high-speed USB 2.0 gets you connected faster than ever before.Specifications</ns2:Feature>
<ns2:Feature>Type - Full-Color Laser Printer</ns2:Feature>
<ns2:ItemDimensions>
<ns2:Height Units="inches">13.62</ns2:Height>
<ns2:Length Units="inches">20.47</ns2:Length>
<ns2:Width Units="inches">16.50</ns2:Width>
<ns2:Weight Units="pounds">56.22</ns2:Weight>
</ns2:ItemDimensions>
<ns2:IsAutographed>false</ns2:IsAutographed>
<ns2:IsMemorabilia>false</ns2:IsMemorabilia>
<ns2:Label>Konica</ns2:Label>
<ns2:ListPrice>
<ns2:Amount>449.00</ns2:Amount>
<ns2:CurrencyCode>USD</ns2:CurrencyCode>
</ns2:ListPrice>
<ns2:Manufacturer>Konica</ns2:Manufacturer>
<ns2:Model>A0VD017</ns2:Model>
<ns2:NumberOfItems>1</ns2:NumberOfItems>
<ns2:OperatingSystem>Windows XP, Vista, 7</ns2:OperatingSystem>
<ns2:OperatingSystem>Mac X 10.2.8, 10.6+</ns2:OperatingSystem>
<ns2:PackageDimensions>
<ns2:Height Units="inches">19.00</ns2:Height>
<ns2:Length Units="inches">24.20</ns2:Length>
<ns2:Width Units="inches">22.00</ns2:Width>
<ns2:Weight Units="pounds">65.30</ns2:Weight>
</ns2:PackageDimensions>
<ns2:PackageQuantity>1</ns2:PackageQuantity>
<ns2:PartNumber>A0VD017</ns2:PartNumber>
<ns2:ProductGroup>CE</ns2:ProductGroup>
<ns2:ProductTypeName>PRINTER</ns2:ProductTypeName>
<ns2:Publisher>Konica</ns2:Publisher>
<ns2:SmallImage>
<ns2:URL>http://ecx.images-amazon.com/images/I/21qN3BU-BHL._SL75_.jpg</ns2:URL>
<ns2:Height Units="pixels">75</ns2:Height>
<ns2:Width Units="pixels">75</ns2:Width>
</ns2:SmallImage>
<ns2:Studio>Konica</ns2:Studio>
<ns2:Title>Konica Minolta Magicolor 3730DN Color Laser Printer 24PPM 2400X600DPI ENET USB 2.0</ns2:Title>
</ns2:ItemAttributes>
</AttributeSets>
<Relationships/>
<SalesRankings/>
</Product>
</Products>
</GetMatchingProductForIdResult>
<ResponseMetadata>
<RequestId>0b508338-3afe-4178-adc4-60c9c8448987</RequestId>
</ResponseMetadata>
</GetMatchingProductForIdResponse>
The getNodeValue method in the DOM is defined to always return null for element nodes (see the table at the top of the JavaDoc page for org.w3c.dom.Node for details). If you want the text inside the element then you should use getTextContent() instead.
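A small illustration of the difference (class and method names are illustrative): the element's own node value is null, while getTextContent() and the value of its child Text node both carry the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NodeValueVsTextContent {

    // Parses a small XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element asin = parseRoot("<ASIN>B004FQLAH2</ASIN>");
        System.out.println(asin.getNodeValue());                 // null: element nodes have no node value
        System.out.println(asin.getTextContent());               // B004FQLAH2: concatenated descendant text
        System.out.println(asin.getFirstChild().getNodeValue()); // B004FQLAH2: the child Text node's value
    }
}
```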
You've added a second question in a comment to this answer asking how you can use an XPath to search for nodes that have a namespace prefix such as ns2:. The way XPath 1.0 handles namespaces is that unprefixed names always refer to nodes that are not in a namespace, and if you want to reference namespaced nodes then you have to provide a binding of namespace URIs to prefixes (which in javax.xml.xpath is the job of a NamespaceContext) and then use those prefixes in the expressions. The prefixes you use in the expression need not be the same ones as the original document used, as long as they bind to the right URIs.
Thus the original XPath you were using:
/GetMatchingProductForIdResponse/GetMatchingProductForIdResult/Products/Product
should not actually have matched anything, because the GetMatchingProductForIdResponse etc. elements in your document are in a namespace, but you got away with it because DocumentBuilderFactory is by default not namespace aware. The correct thing to do here is to use a namespace-aware parser, and provide a suitable namespace context to the XPath engine. There's no default implementation of NamespaceContext available in the core Java library, unfortunately, but Spring provides a convenient SimpleNamespaceContext implementation you can use if you don't want to roll your own.
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true); // parse with namespaces
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new InputSource(new ByteArrayInputStream(xmlResponse.getBytes("utf-8"))));
doc.getDocumentElement().normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
xPath.setNamespaceContext(nsCtx);
nsCtx.bindNamespaceUri("prod", "http://mws.amazonservices.com/schema/Products/2011-10-01");
nsCtx.bindNamespaceUri("ns2", "http://mws.amazonservices.com/schema/Products/2011-10-01/default.xsd");
String expression = "/prod:GetMatchingProductForIdResponse/prod:GetMatchingProductForIdResult/prod:Products/prod:Product/prod:AttributeSets/ns2:ItemAttributes/ns2:Binding";
// ...
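Without Spring, a hand-written NamespaceContext works the same way. The sketch below is a minimal version under that assumption: MapNamespaceContext is a hypothetical helper, not a JDK class, and SAMPLE is a trimmed copy of the response above. XPath evaluation only needs getNamespaceURI, so the other two methods are kept simple.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ProductBindingLookup {

    // Minimal prefix -> URI map (hypothetical helper, similar in spirit to Spring's SimpleNamespaceContext).
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> uris = new HashMap<>();

        MapNamespaceContext bind(String prefix, String uri) {
            uris.put(prefix, uri);
            return this;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            return uris.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String namespaceURI) {
            for (Map.Entry<String, String> e : uris.entrySet()) {
                if (e.getValue().equals(namespaceURI)) {
                    return e.getKey();
                }
            }
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String p = getPrefix(namespaceURI);
            return p == null ? Collections.<String>emptyIterator() : Collections.singletonList(p).iterator();
        }
    }

    static final String SAMPLE =
        "<GetMatchingProductForIdResponse xmlns=\"http://mws.amazonservices.com/schema/Products/2011-10-01\">"
        + "<GetMatchingProductForIdResult Id=\"H5-9OSH-9NZ7\" IdType=\"SellerSKU\" status=\"Success\">"
        + "<Products xmlns:ns2=\"http://mws.amazonservices.com/schema/Products/2011-10-01/default.xsd\">"
        + "<Product><AttributeSets><ns2:ItemAttributes xml:lang=\"en-US\">"
        + "<ns2:Binding>Office Product</ns2:Binding>"
        + "</ns2:ItemAttributes></AttributeSets></Product>"
        + "</Products></GetMatchingProductForIdResult></GetMatchingProductForIdResponse>";

    // Returns the text of ns2:Binding, or "" if the path does not match.
    static String binding(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // parse with namespaces
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new MapNamespaceContext()
                .bind("prod", "http://mws.amazonservices.com/schema/Products/2011-10-01")
                .bind("ns2", "http://mws.amazonservices.com/schema/Products/2011-10-01/default.xsd"));
            String expression = "/prod:GetMatchingProductForIdResponse/prod:GetMatchingProductForIdResult"
                + "/prod:Products/prod:Product/prod:AttributeSets/ns2:ItemAttributes/ns2:Binding";
            return xPath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(binding(SAMPLE)); // prints Office Product
    }
}
```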
