Parsing HTML content from XML file

<xbrli:xbrl xmlns:aoi="http://www.aointl.com/20160331" xmlns:country="http://xbrl.sec.gov/country/2016-01-31" xmlns:currency="http://xbrl.sec.gov/currency/2016-01-31" xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31" xmlns:exch="http://xbrl.sec.gov/exch/2016-01-31" xmlns:invest="http://xbrl.sec.gov/invest/2013-01-31" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:naics="http://xbrl.sec.gov/naics/2011-01-31" xmlns:nonnum="http://www.xbrl.org/dtr/type/non-numeric" xmlns:num="http://www.xbrl.org/dtr/type/numeric" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:sic="http://xbrl.sec.gov/sic/2011-01-31" xmlns:stpr="http://xbrl.sec.gov/stpr/2011-01-31" xmlns:us-gaap="http://fasb.org/us-gaap/2016-01-31" xmlns:us-roles="http://fasb.org/us-roles/2016-01-31" xmlns:us-types="http://fasb.org/us-types/2016-01-31" xmlns:utreg="http://www.xbrl.org/2009/utr" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:xbrldt="http://xbrl.org/2005/xbrldt" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<link:schemaRef xlink:href="aoi-20160331.xsd" xlink:type="simple"/>
<xbrli:context id="FD2016Q4YTD">
<xbrli:entity>
<xbrli:identifier scheme="http://www.sec.gov/CIK">0000939930</xbrli:identifier>
</xbrli:entity>
<xbrli:period>
<xbrli:startDate>2015-04-01</xbrli:startDate>
<xbrli:endDate>2016-03-31</xbrli:endDate>
</xbrli:period>
</xbrli:context>
<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef="FD2016Q4YTD" id="Fact-F51C7616E17E5B8B0B770D410BBF5A3E">
<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
</aoi:OtherIncomeAndExpensePolicyTextBlock>
</xbrli:xbrl>
This is my XML (XBRL) input, which I need to parse. I don't know whether it is valid, but I need output like this:
<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
Could someone please help? I have been stuck on this problem for the last two weeks.
This is the code I am using:
File fXmlFile = new File("/home/devteam-user1/Desktop/ky/UnitTesting.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
XPath xPath = XPathFactory.newInstance().newXPath();
final String DIV_UNDER_ROOT = "/*/aoi";
NodeList divList = (NodeList)xPath.compile(DIV_UNDER_ROOT)
.evaluate(doc, XPathConstants.NODESET);
System.out.println(divList.getLength());
for (int i = 0; i < divList.getLength(); i++) { // just in case there is more than one
Node divNode = divList.item(i);
System.out.println(nodeToString(divNode));
}
// nodeToString method below
private static String nodeToString(Node node) throws Exception
{
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
transformer.transform(new DOMSource(node), result);
return result.getWriter().toString();
}

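This works well for me, using jsoup (Utils.streamToString is presumably a helper, not shown here, that reads the stream into a String):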
public static void main(String[] args) throws IOException {
FileInputStream fis = new FileInputStream("yourfile.xml");
Document doc = Jsoup.parse(Utils.streamToString(fis));
System.out.println(doc.select("aoi|OtherIncomeAndExpensePolicyTextBlock").html().toString());
}

Your main issue lies with
final String DIV_UNDER_ROOT = "/*/aoi";
This XPath expression matches "any element two levels below the document node (that is, a child of the root element) whose name is aoi and which has no namespace". This is not what you want.
You want to match the contents of an element that is two levels deep, whose namespace is aliased by "aoi" (which means it belongs to the "http://www.aointl.com/20160331" namespace), and whose local name is "OtherIncomeAndExpensePolicyTextBlock".
Matching namespaces in XPath in Java is quite cumbersome (see XPath with namespace in Java and How to query XML using namespaces in Java with XPath?), but long story short, you could try this instead:
final String DIV_UNDER_ROOT = "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock' and namespace-uri()='http://www.aointl.com/20160331']/*";
This only works if your DocumentBuilderFactory is namespace aware, so make sure to configure it like this before creating the DocumentBuilder:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
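
Putting the two pieces together, a minimal sketch might look like the following. The class and method names (TextBlockExtractor, extractInnerMarkup) are made up for illustration, the sample document is a shortened version of the one in the question, and checked exceptions are wrapped only to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
public class TextBlockExtractor {

    // Selects every element child of aoi:OtherIncomeAndExpensePolicyTextBlock.
    static final String CHILDREN_OF_TEXT_BLOCK =
        "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock'"
        + " and namespace-uri()='http://www.aointl.com/20160331']/*";

    // Returns the serialized child elements of the text block, or "" if none match.
    static String extractInnerMarkup(String xml) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            dbFactory.setNamespaceAware(true); // required for the namespace-uri() test to match
            Document doc = dbFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList children = (NodeList) xPath.evaluate(
                    CHILDREN_OF_TEXT_BLOCK, doc, XPathConstants.NODESET);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Serialize each matched child element back to markup.
            StringWriter out = new StringWriter();
            for (int i = 0; i < children.getLength(); i++) {
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract the text block", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<xbrli:xbrl xmlns:xbrli=\"http://www.xbrl.org/2003/instance\""
            + " xmlns:aoi=\"http://www.aointl.com/20160331\">"
            + "<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef=\"FD2016Q4YTD\">"
            + "<div style=\"font-size:10pt;\"><font>Other Income (Expense)</font></div>"
            + "</aoi:OtherIncomeAndExpensePolicyTextBlock>"
            + "</xbrli:xbrl>";
        System.out.println(extractInnerMarkup(xml));
    }
}
```

Because the expression ends in /*, only the div inside the text block is serialized, not the aoi wrapper element itself.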

Related

Modifying an XML in Java

I have an XML document which has null values in its Value tag (something similar to below)
<ns2:Attribute Name="Store" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri">
<ns2:AttributeValue/>
</ns2:Attribute>
Now, I have to write Java code to modify the values as below.
<ns2:Attribute Name="Store" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri">
<ns2:AttributeValue>ABCDEF</ns2:AttributeValue>
</ns2:Attribute>
I am using a DOM parser to parse the XML. I am able to delete the empty tag, but I am not able to add new values. I am not even sure whether there is a direct way to replace or add the values.
Below is the code I am using to remove the child:
Node aNode = nodeList.item(i);
eElement = (Element) aNode;
eElement.removeChild(eElement.getFirstChild().getNextSibling());
Thanks in advance

Just add data through setTextContent to the element. Sample code is as below:
public static void main(String[] args) throws IOException, SAXException, ParserConfigurationException, TransformerException {
String xml = "<ns2:Attribute Name=\"Store\" NameFormat=\"urn:oasis:names:tc:SAML:2.0:attrname-format:uri\"><ns2:AttributeValue/></ns2:Attribute>";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("ns2:AttributeValue");
for (int i=0;i<nList.getLength();i++) {
Element elem = (Element)nList.item(i);
elem.setTextContent("Content"+i);
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
System.out.println(nList.getLength());
}

How to retrieve XML including tags using the DOM parser

I am using org.w3c.dom to parse an XML file. Then I need to return the ENTIRE XML for a specific node including the tags, not just the values of the tags. I'm using the NodeList because I need to count how many records are in the file. But I also need to read the file wholesale from the beginning and then write it out to a new XML file. But my current code only prints the value of the node, but not the node itself. I'm stumped.
public static void main(String[] args) {
try {
File fXmlFile = new File (args[0]);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList listOfRecords = doc.getElementsByTagName("record");
int totalRecords = listOfRecords.getLength();
System.out.println("Total number of records : " + totalRecords);
int amountToSplice = queryUser();
for (int i = 0; i < amountToSplice; i++) {
String stringNode = listOfRecords.item(i).getTextContent();
System.out.println(stringNode);
}
} catch (Exception e) {
e.printStackTrace();
}
}

getTextContent() will only "return the text content of this node and its descendants" i.e. you only get the content of the 'text' type nodes. When parsing XML it's good to remember there are several different types of node, see XML DOM Node Types.
To do what you want, you could create a utility method like this...
public static String nodeToString(Node node) throws TransformerException
{
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter sw = new StringWriter();
t.transform(new DOMSource(node), new StreamResult(sw));
return sw.toString();
}
Then loop and print like this...
for (int i = 0; i < amountToSplice; i++)
System.out.println(nodeToString(listOfRecords.item(i)));

Getting value of child node from XML in java

My xml file looks like this
<InNetworkCostSharing>
<FamilyAnnualDeductibleAmount>
<Amount>6000</Amount>
</FamilyAnnualDeductibleAmount>
<IndividualAnnualDeductibleAmount>
<NotApplicable>Not Applicable</NotApplicable>
</IndividualAnnualDeductibleAmount>
<PCPCopayAmount>
<CoveredAmount>0</CoveredAmount>
</PCPCopayAmount>
<CoinsuranceRate>
<CoveredPercent>0</CoveredPercent>
</CoinsuranceRate>
<FamilyAnnualOOPLimitAmount>
<Amount>6000</Amount>
</FamilyAnnualOOPLimitAmount>
<IndividualAnnualOOPLimitAmount>
<NotApplicable>Not Applicable</NotApplicable>
</IndividualAnnualOOPLimitAmount>
</InNetworkCostSharing>
I am trying to get the Amount value from <FamilyAnnualDeductibleAmount> and also from <FamilyAnnualOOPLimitAmount>. How do I get those values in Java?

You can use two XPath queries, /InNetworkCostSharing/FamilyAnnualDeductibleAmount/Amount and /InNetworkCostSharing/FamilyAnnualOOPLimitAmount/Amount, or just get the InNetworkCostSharing node and read the values from its two relevant children.
Solution using XPath:
// load the XML as String into a DOM Document object
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream("YOUR XML".getBytes());
Document doc = docBuilder.parse(bis);
// XPath to retrieve the text of the <Amount> child of <FamilyAnnualDeductibleAmount>
// (text() directly on the parent would only return the whitespace around <Amount>)
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/InNetworkCostSharing/FamilyAnnualDeductibleAmount/Amount/text()");
String familyAnnualDeductibleAmount = (String)expr.evaluate(doc, XPathConstants.STRING);
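
As a rough, self-contained sketch of the XPath approach for both amounts: the class and method names (FamilyAmounts, valueOf) are invented for this example, and the sample XML is a trimmed copy of the file from the question. Checked exceptions are wrapped only for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class used only for illustration.
public class FamilyAmounts {

    // Returns the trimmed string value of the first node matched by the expression.
    static String valueOf(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<InNetworkCostSharing>\n"
            + "  <FamilyAnnualDeductibleAmount>\n"
            + "    <Amount>6000</Amount>\n"
            + "  </FamilyAnnualDeductibleAmount>\n"
            + "  <FamilyAnnualOOPLimitAmount>\n"
            + "    <Amount>6000</Amount>\n"
            + "  </FamilyAnnualOOPLimitAmount>\n"
            + "</InNetworkCostSharing>";

        // Each call prints 6000 for this sample.
        System.out.println(valueOf(xml, "/InNetworkCostSharing/FamilyAnnualDeductibleAmount/Amount"));
        System.out.println(valueOf(xml, "/InNetworkCostSharing/FamilyAnnualOOPLimitAmount/Amount"));
    }
}
```

Selecting the Amount element itself and taking its string value avoids having to deal with the whitespace text nodes that surround it in an indented file.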

StAX based solution:
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader rdr = f.createXMLStreamReader(new FileReader("test.xml"));
while (rdr.hasNext()) {
if (rdr.next() == XMLStreamConstants.START_ELEMENT) {
if (rdr.getLocalName().equals("FamilyAnnualDeductibleAmount")) {
rdr.nextTag();
int familyAnnualDeductibleAmount = Integer.parseInt(rdr.getElementText());
System.out.println("familyAnnualDeductibleAmount = " + familyAnnualDeductibleAmount);
} else if (rdr.getLocalName().equals("FamilyAnnualOOPLimitAmount")) {
rdr.nextTag();
int familyAnnualOOPLimitAmount = Integer.parseInt(rdr.getElementText());
System.out.println("FamilyAnnualOOPLimitAmount = " + familyAnnualOOPLimitAmount);
}
}
}
rdr.close();
Note that StAX is especially good for cases like yours: it streams through the document without building a tree in memory, so you only handle the elements you need.

Try something like this (use getElementsByTagName to get the parent nodes, then read the value from each parent's content):
File xmlFile = new File("NetworkCost.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("FamilyAnnualDeductibleAmount");
// In an indented file the first child is a whitespace text node, so read the
// parent's whole text content and trim it instead of using getChildNodes().item(0)
String familyDedAmount = nList.item(0).getTextContent().trim();
nList = doc.getElementsByTagName("FamilyAnnualOOPLimitAmount");
String familyAnnualAmount = nList.item(0).getTextContent().trim();

I think I found the solution in this Stack Overflow question:
Getting XML Node text value with Java DOM

Get full xml text from Node instance

I read an XML file in Java with this code:
File file = new File("file.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
NodeList nodeLst = doc.getElementsByTagName("record");
for (int i = 0; i < nodeLst.getLength(); i++) {
Node node = nodeLst.item(i);
...
}
So, how can I get the full XML content from a node instance (including all tags, attributes, etc.)?
Thanks.

Check out this other answer from stackoverflow.
You would use a DOMSource (instead of the StreamSource), and pass your node in the constructor.
Then you can transform the node into a String.
Quick sample:
public class NodeToString {
public static void main(String[] args) throws TransformerException, ParserConfigurationException, SAXException, IOException {
// just to get access to a Node
String fakeXml = "<!-- Document comment -->\n <aaa>\n\n<bbb/> \n<ccc/></aaa>";
DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(fakeXml)));
Node node = doc.getDocumentElement();
// test the method
System.out.println(node2String(node));
}
static String node2String(Node node) throws TransformerFactoryConfigurationError, TransformerException {
// you may prefer to use single instances of Transformer, and
// StringWriter rather than create each time. That would be up to your
// judgement and whether your app is single threaded etc
StreamResult xmlOutput = new StreamResult(new StringWriter());
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(node), xmlOutput);
return xmlOutput.getWriter().toString();
}
}

getElementsByTagName doesn't work

I have the following simple piece of code:
String test = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><TT_NET_Result><GUID>9145b1d3-4aa3-4797-b65f-9f5e00be1a30</GUID></TT_NET_Result>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(test)));
NodeList nl = doc.getDocumentElement().getElementsByTagName("TT_NET_Result");
The problem is that I don't get any result - nodelist variable "nl" is empty.
What could be wrong?

You're asking for elements under the document element, but TT_NET_Result is the document element. If you just call
NodeList nl = doc.getElementsByTagName("TT_NET_Result");
then I suspect you'll get the result you want.
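
To illustrate the difference, here is a small sketch that counts matches both ways. Element.getElementsByTagName only searches the descendants of the element it is called on, while Document.getElementsByTagName searches the whole document, root element included. The class and method names (RootLookup, countMatches) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class used only for illustration.
public class RootLookup {

    // Returns {matches below the root element, matches in the whole document}.
    static int[] countMatches(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            int belowRoot = doc.getDocumentElement().getElementsByTagName(tagName).getLength();
            int inDocument = doc.getElementsByTagName(tagName).getLength();
            return new int[] {belowRoot, inDocument};
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TT_NET_Result><GUID>9145b1d3-4aa3-4797-b65f-9f5e00be1a30</GUID></TT_NET_Result>";
        int[] counts = countMatches(xml, "TT_NET_Result");
        System.out.println(counts[0] + " " + counts[1]); // prints "0 1"
    }
}
```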

Here's another response to this old question. I hit a similar issue in my code today and I actually read/write XML all the time. For some reason I overlooked one major fact. If you want to use
NodeList elements = doc.getElementsByTagNameNS(namespace,elementName);
You need to parse your document with a factory that is namespace-aware.
private static DocumentBuilderFactory factory; // lazily created, shared instance
private static DocumentBuilderFactory getFactory() {
if (factory == null){
factory = DocumentBuilderFactory
.newInstance();
factory.setNamespaceAware(true);
}
return factory;
}
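
A short sketch of the effect, under the assumption that the JDK's default parser is in use: the same getElementsByTagNameNS call is run against a namespace-aware and a non-aware parse of the same document. The class and method names (NamespaceLookup, countTextBlocks) are invented here, and the namespace and element name are borrowed from the first question on this page purely as sample data.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class used only for illustration.
public class NamespaceLookup {

    static final String NS = "http://www.aointl.com/20160331";

    // Counts elements named OtherIncomeAndExpensePolicyTextBlock in namespace NS.
    static int countTextBlocks(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(NS, "OtherIncomeAndExpensePolicyTextBlock").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<root xmlns:aoi=\"" + NS + "\">"
            + "<aoi:OtherIncomeAndExpensePolicyTextBlock/>"
            + "</root>";
        System.out.println("namespace aware:     " + countTextBlocks(xml, true));
        System.out.println("not namespace aware: " + countTextBlocks(xml, false));
    }
}
```

With the namespace-aware factory the element is found; without it the parsed elements carry no namespace information, so the lookup should come back empty.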
