Get full xml text from Node instance

I read an XML file in Java with the following code:
File file = new File("file.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
NodeList nodeLst = doc.getElementsByTagName("record");
for (int i = 0; i < nodeLst.getLength(); i++) {
Node node = nodeLst.item(i);
...
}
How can I get the full XML content of a Node instance (including all tags, attributes, etc.)?
Thanks.

Check out this other answer on Stack Overflow.
You would use a DOMSource (instead of the StreamSource), and pass your node in the constructor.
Then you can transform the node into a String.
Quick sample:
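import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NodeToString {
    public static void main(String[] args) throws TransformerException, ParserConfigurationException, SAXException, IOException {
        // just to get access to a Node
        String fakeXml = "<!-- Document comment -->\n <aaa>\n\n<bbb/> \n<ccc/></aaa>";
        DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = docBuilder.parse(new InputSource(new StringReader(fakeXml)));
        Node node = doc.getDocumentElement();
        // test the method
        System.out.println(node2String(node));
    }

    static String node2String(Node node) throws TransformerFactoryConfigurationError, TransformerException {
        // you may prefer to use single instances of Transformer and
        // StringWriter rather than creating them each time. That is up to your
        // judgement and depends on whether your app is single-threaded etc.
        StreamResult xmlOutput = new StreamResult(new StringWriter());
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.transform(new DOMSource(node), xmlOutput);
        return xmlOutput.getWriter().toString();
    }
}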
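Applied to the loop from the question, the same idea might look roughly like the sketch below. The class name RecordSerializer, the method recordsAsXml, and the sample XML are hypothetical (the original file.xml is not shown); only the "record" tag name comes from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RecordSerializer {

    // Returns the markup (tags, attributes, children) of every <record> element.
    static List<String> recordsAsXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // One Transformer reused for all nodes; fine for single-threaded use.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        NodeList records = doc.getElementsByTagName("record");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(records.item(i)), new StreamResult(out));
            result.add(out.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input standing in for file.xml.
        String sample = "<records><record id=\"1\"><name>a</name></record>"
                + "<record id=\"2\"/></records>";
        for (String record : recordsAsXml(sample)) {
            System.out.println(record);
        }
    }
}
```

Each string in the returned list holds one record element with its tags and attributes, which is what getTextContent() alone would not give you.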

Related

Parsing HTML content from XML file

<xbrli:xbrl xmlns:aoi="http://www.aointl.com/20160331" xmlns:country="http://xbrl.sec.gov/country/2016-01-31" xmlns:currency="http://xbrl.sec.gov/currency/2016-01-31" xmlns:dei="http://xbrl.sec.gov/dei/2014-01-31" xmlns:exch="http://xbrl.sec.gov/exch/2016-01-31" xmlns:invest="http://xbrl.sec.gov/invest/2013-01-31" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:naics="http://xbrl.sec.gov/naics/2011-01-31" xmlns:nonnum="http://www.xbrl.org/dtr/type/non-numeric" xmlns:num="http://www.xbrl.org/dtr/type/numeric" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:sic="http://xbrl.sec.gov/sic/2011-01-31" xmlns:stpr="http://xbrl.sec.gov/stpr/2011-01-31" xmlns:us-gaap="http://fasb.org/us-gaap/2016-01-31" xmlns:us-roles="http://fasb.org/us-roles/2016-01-31" xmlns:us-types="http://fasb.org/us-types/2016-01-31" xmlns:utreg="http://www.xbrl.org/2009/utr" xmlns:xbrldi="http://xbrl.org/2006/xbrldi" xmlns:xbrldt="http://xbrl.org/2005/xbrldt" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<link:schemaRef xlink:href="aoi-20160331.xsd" xlink:type="simple"/>
<xbrli:context id="FD2016Q4YTD">
<xbrli:entity>
<xbrli:identifier scheme="http://www.sec.gov/CIK">0000939930</xbrli:identifier>
</xbrli:entity>
<xbrli:period>
<xbrli:startDate>2015-04-01</xbrli:startDate>
<xbrli:endDate>2016-03-31</xbrli:endDate>
</xbrli:period>
</xbrli:context>
<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef="FD2016Q4YTD" id="Fact-F51C7616E17E5B8B0B770D410BBF5A3E">
<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
</aoi:OtherIncomeAndExpensePolicyTextBlock>
</xbrli:xbrl>
This is my XML (XBRL) input, which I need to parse. I don't know whether it is valid, but I need output like this:
<div style="font-family:Times New Roman;font-size:10pt;"><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;font-weight:bold;">Other Income (Expense)</font></div><div style="line-height:120%;text-align:justify;font-size:10pt;"><font style="font-family:inherit;font-size:10pt;"></font></div></div>
Could someone please share how to solve this? I have been stuck on this problem for the last two weeks.
This is the code I am using:
File fXmlFile = new File("/home/devteam-user1/Desktop/ky/UnitTesting.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
XPath xPath = XPathFactory.newInstance().newXPath();
final String DIV_UNDER_ROOT = "/*/aoi";
NodeList divList = (NodeList)xPath.compile(DIV_UNDER_ROOT)
.evaluate(doc, XPathConstants.NODESET);
System.out.println(divList.getLength());
for (int i = 0; i < divList.getLength(); i++) { // just in case there is more than one
    Node divNode = divList.item(i);
    System.out.println(nodeToString(divNode));
}
// nodeToString method below
private static String nodeToString(Node node) throws Exception
{
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
transformer.transform(new DOMSource(node), result);
return result.getWriter().toString();
}

This works well for me, using jsoup:
public static void main(String[] args) throws IOException {
FileInputStream fis = new FileInputStream("yourfile.xml");
Document doc = Jsoup.parse(Utils.streamToString(fis));
System.out.println(doc.select("aoi|OtherIncomeAndExpensePolicyTextBlock").html().toString());
}

Your main issue lies with
final String DIV_UNDER_ROOT = "/*/aoi";
This XPath expression matches "any element directly under the root element that has a local name of aoi and no namespace". This is not what you want.
You want to match the contents of the element under the root whose namespace is bound to the prefix "aoi" (which means it belongs to the "http://www.aointl.com/20160331" namespace) and whose local name is "OtherIncomeAndExpensePolicyTextBlock".
Matching namespaces in XPath in Java is quite cumbersome (see "XPath with namespace in Java" and "How to query XML using namespaces in Java with XPath?"), but long story short, you could try this instead:
final String DIV_UNDER_ROOT = "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock' and namespace-uri()='http://www.aointl.com/20160331']/*";
This will only work if your DocumentBuilderFactory is namespace aware, so make sure to configure it like this before creating the DocumentBuilder:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
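Putting the two pieces together, a minimal sketch could look like this. The class name TextBlockExtractor, the method extractInnerXml, and the shortened sample document are hypothetical; the XPath expression and the namespace URI are the ones from the answer above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextBlockExtractor {

    static final String AOI_NS = "http://www.aointl.com/20160331";

    // Serializes the child elements of aoi:OtherIncomeAndExpensePolicyTextBlock.
    static String extractInnerXml(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setNamespaceAware(true); // without this, namespace-uri() finds nothing
        Document doc = dbFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        String expr = "//*[local-name()='OtherIncomeAndExpensePolicyTextBlock'"
                + " and namespace-uri()='" + AOI_NS + "']/*";
        NodeList children = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        for (int i = 0; i < children.getLength(); i++) {
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, shortened stand-in for the XBRL file in the question.
        String sample = "<xbrli:xbrl xmlns:aoi=\"" + AOI_NS + "\""
                + " xmlns:xbrli=\"http://www.xbrl.org/2003/instance\">"
                + "<aoi:OtherIncomeAndExpensePolicyTextBlock contextRef=\"FD2016Q4YTD\">"
                + "<div style=\"font-size:10pt;\"><font>Other Income (Expense)</font></div>"
                + "</aoi:OtherIncomeAndExpensePolicyTextBlock></xbrli:xbrl>";
        System.out.println(extractInnerXml(sample));
    }
}
```

The trailing /* in the expression selects only the child elements of the text block, so the aoi wrapper itself is left out of the result.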

Modifying an XML in Java

I have an XML document which has null values in its Value tag (something similar to below)
<ns2:Attribute Name="Store" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri">
<ns2:AttributeValue/>
</ns2:Attribute>
Now, I have to write Java code to modify the values as below.
<ns2:Attribute Name="Store" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri">
<ns2:AttributeValue>ABCDEF</ns2:AttributeValue>
</ns2:Attribute>
I am using the DOM parser to parse the XML. I am able to delete the empty <ns2:AttributeValue/> tag, but I am not able to add new values. I am not even sure whether there is a direct way to replace or add the values.
Below is the code I am using to remove the child (<ns2:AttributeValue/>):
Node aNode = nodeList.item(i);
eElement = (Element) aNode;
eElement.removeChild(eElement.getFirstChild().getNextSibling());
Thanks in advance

Just add data through setTextContent to the element. Sample code is as below:
public static void main(String[] args) throws IOException, SAXException, ParserConfigurationException, TransformerException {
String xml = "<ns2:Attribute Name=\"Store\" NameFormat=\"urn:oasis:names:tc:SAML:2.0:attrname-format:uri\"><ns2:AttributeValue/></ns2:Attribute>";
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("ns2:AttributeValue");
for (int i=0;i<nList.getLength();i++) {
Element elem = (Element)nList.item(i);
elem.setTextContent("Content"+i);
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
System.out.println(nList.getLength());
}
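If the real document declares its namespaces and you only want to fill the empty values, a namespace-aware variant might look like the sketch below. It assumes that ns2 is bound to the SAML 2.0 assertion namespace (the fragment in the question does not show the declaration); AttributeValueFiller and fillEmptyValues are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeValueFiller {

    // Assumption: the ns2 prefix maps to the SAML 2.0 assertion namespace.
    static final String SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion";

    // Sets the given text on every AttributeValue element that has no children.
    static String fillEmptyValues(String xml, String value) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList values = doc.getElementsByTagNameNS(SAML_NS, "AttributeValue");
        for (int i = 0; i < values.getLength(); i++) {
            Element elem = (Element) values.item(i);
            if (!elem.hasChildNodes()) { // leave already populated values alone
                elem.setTextContent(value);
            }
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ns2:Attribute xmlns:ns2=\"" + SAML_NS + "\" Name=\"Store\">"
                + "<ns2:AttributeValue/></ns2:Attribute>";
        System.out.println(fillEmptyValues(xml, "ABCDEF"));
    }
}
```

There is no need to remove the empty element first: setTextContent replaces whatever children the element has with a single text node.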

updated xml data not updated in the xml file

I wrote a method that updates my XML file from a GUI.
When I run an update, everything seems to work fine and the console prints the correct values.
But when I open the XML file and press refresh, nothing has been updated.
What am I doing wrong?
public void updateObjType(String newTxt, int x) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
System.out.println("String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse("xmlFiles/CoreDatamodel.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// Go thru the Object_types in the XML file and get item x.
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
String value = nodeList.item(x).getTextContent();
System.out.println(value);
}
This is the output from the console:
Original data : IF150Data
Incoming String value : Data
Index value : 4
updated data : Data

I solved it by using a transformer.
Full solution :
// Update the object type name from the object type list.
public void updateObjType(String newTxt, int x)
throws ParserConfigurationException, SAXException, IOException,
XPathExpressionException {
File file = new File("xmlFiles/CoreDatamodel.xml");
System.out.println("Incoming String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
// Save the new updates
try {
save(file, xmlDocument);
} catch (Exception e) {
e.printStackTrace();
}
}
And then the method I added :
public void save(File file, Document doc) throws Exception {
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String s = writer.toString();
System.out.println(s);
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write(s);
bufferedWriter.flush();
bufferedWriter.close();
}
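A shorter variant of the save step is sketched below, assuming you do not need the intermediate String. It lets the Transformer write straight to the file through an OutputStream, so the bytes on disk use the encoding the serializer declares instead of the platform default that FileWriter would apply. The class name XmlFileSaver, the extra File parameter on updateObjType, and the sample document are hypothetical.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XmlFileSaver {

    // Serializes the in-memory document back to the file it was loaded from.
    static void save(File file, Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        try (OutputStream out = Files.newOutputStream(file.toPath())) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    // Same update as above, but with the file passed in (an assumption for testability).
    static void updateObjType(File file, String newTxt, int x) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//OBJECT_TYPE/text()", doc, XPathConstants.NODESET);
        nodes.item(x).setNodeValue(newTxt); // changes the DOM in memory only
        save(file, doc);                    // this is the step that touches the file
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for xmlFiles/CoreDatamodel.xml.
        File file = File.createTempFile("CoreDatamodel", ".xml");
        Files.write(file.toPath(),
                "<MODEL><OBJECT_TYPE>IF150Data</OBJECT_TYPE></MODEL>".getBytes(StandardCharsets.UTF_8));
        updateObjType(file, "Data", 0);
        System.out.println(new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8));
        file.delete();
    }
}
```

The key point is unchanged: setNodeValue modifies only the DOM tree in memory, and nothing reaches the disk until the document is serialized.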

How to retrieve XML including tags using the DOM parser

I am using org.w3c.dom to parse an XML file. I need to return the entire XML for a specific node, including the tags, not just the values inside the tags. I'm using a NodeList because I need to count how many records are in the file, but I also need to read the file wholesale from the beginning and then write it out to a new XML file. My current code prints only the value of the node, not the node itself. I'm stumped.
public static void main(String[] args) {
try {
File fXmlFile = new File (args[0]);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList listOfRecords = doc.getElementsByTagName("record");
int totalRecords = listOfRecords.getLength();
System.out.println("Total number of records : " + totalRecords);
int amountToSplice = queryUser();
for (int i = 0; i < amountToSplice; i++) {
String stringNode = listOfRecords.item(i).getTextContent();
System.out.println(stringNode);
}
} catch (Exception e) {
e.printStackTrace();
}
}

getTextContent() will only "return the text content of this node and its descendants" i.e. you only get the content of the 'text' type nodes. When parsing XML it's good to remember there are several different types of node, see XML DOM Node Types.
To do what you want, you could create a utility method like this...
public static String nodeToString(Node node) throws TransformerException
{
    // newTransformer() and transform() throw checked exceptions, hence the throws clause
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    t.setOutputProperty(OutputKeys.INDENT, "yes");
    StringWriter sw = new StringWriter();
    t.transform(new DOMSource(node), new StreamResult(sw));
    return sw.toString();
}
Then loop and print like this...
for (int i = 0; i < amountToSplice; i++)
System.out.println(nodeToString(listOfRecords.item(i)));
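For the second half of the question (writing the selected records to a new XML file), one possible approach is to copy the nodes into a fresh Document with importNode and then serialize that document. The sketch below shows only the copy step; RecordSplicer and firstRecords are hypothetical names, and reusing the source root element's name for the new root is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RecordSplicer {

    // Builds a new document containing deep copies of the first `count` records.
    static Document firstRecords(Document source, int count) throws ParserConfigurationException {
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // Assumption: the new file should reuse the name of the source root element.
        Element root = target.createElement(source.getDocumentElement().getNodeName());
        target.appendChild(root);

        NodeList records = source.getElementsByTagName("record");
        int limit = Math.min(count, records.getLength());
        for (int i = 0; i < limit; i++) {
            // importNode(…, true) copies the node with all attributes and descendants
            root.appendChild(target.importNode(records.item(i), true));
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; the real file comes from args[0] in the question.
        String sample = "<records><record>1</record><record>2</record><record>3</record></records>";
        Document source = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(sample)));
        Document spliced = firstRecords(source, 2);
        System.out.println(spliced.getElementsByTagName("record").getLength());
    }
}
```

The resulting Document can be passed to a Transformer in the same way as a single node, with a StreamResult that points at the new file.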

delete the unwanted strings before and after the string in xml file

XML file content:
<distributionChannels><distributionChannel type="Wap" id="1"><contentChannelRefs>
<contentChannelRef id="2"><categories><category
link="http://images/11.gif" id="1"><names><name lang="de">Top Downloads</name><name
lang="ww">Tops</name></names></category></categories></contentChannelRef>
</contentChannelRefs></distributionChannel>
</distributionChannels>
How do I delete the unwanted content that I am reading from the XML file, so that the output looks like this:
<category link="http://images/11.gif" id="1"><names><name lang="de">Top Downloads</name><name lang="ww">Tops</name></names></category>

The reliable solution is to use an XML parser. A simple solution is:
s = s.substring(s.indexOf("<categories>"), s.indexOf("</categories>") + 13);
If you want to read the categories one by one, use a regex:
// DOTALL lets '.' match the line breaks inside the sample XML
Matcher m = Pattern.compile("<category.*?>.*?</category>", Pattern.DOTALL).matcher(xml);
while (m.find()) {
    System.out.println(m.group());
}

Pattern matching with XML is not recommended. Use a parser to get your nodes and then manage them accordingly. If you are interested in printing them, I have included code to print the nodes.
public static void main(String[] args)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(s)));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr
= xpath.compile("//categories//category");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
//This is where you are printing things. You can handle differently if
//you would like.
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodeToString(nodes.item(i)));
}
}
private static String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
te.printStackTrace();
}
return sw.toString();
}
