Convert an org.w3c.dom.Node into a String

Convert an org.w3c.dom.Node into a String - java

Sorry I'm a Java/XML newbie - and can't seem to figure this one out. It seems it's possible to convert a Document object to a string. However, I want to convert a Node object into a string. I am using org.ccil.cowan.tagsoup Parser for my purpose.
I'm retrieving the Node by something like...
parser = new org.ccil.cowan.tagsoup.Parser()
parser.setFeature(namespaceaware, false)
Transformer transformer = TransformerFactory.newInstance().newTransformer();
DOMResult domResult = new DOMResult();
transformer.transform(new SAXSource(parser, new InputSource(in)), domResult);
Node n = domResult.getNode();
// I'm interested in the first child, so...
Node myNode = n.getChildNodes().item(0);
// convert myNode to string..
// what to do here?
The answer may be obvious, but I can't seem to figure out from the core Java libraries how to achieve this. Any help is much appreciated!

You can use a Transformer (error handling and optional factory configuration omitted for clarity):
Node node = ...;
StringWriter writer = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), new StreamResult(writer));
String xml = writer.toString();
// Use xml ...

String getNodeString(Node node) {
try {
StringWriter writer = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(node), new StreamResult(writer));
String output = writer.toString();
return output.substring(output.indexOf("?>") + 2);//remove <?xml version="1.0" encoding="UTF-8"?>
} catch (TransformerException e) {
e.printStackTrace();
}
return node.getTextContent();
}

This is way to convert Node to html
public static String getInnerHTML(Node node) throws TransformerConfigurationException, TransformerException
{
StringWriter sw = new StringWriter();
Result result = new StreamResult(sw);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer proc = factory.newTransformer();
proc.setOutputProperty(OutputKeys.METHOD, "html");
for (int i = 0; i < node.getChildNodes().getLength(); i++)
{
proc.transform(new DOMSource(node.getChildNodes().item(i)), result);
}
return sw.toString();
}

Related

Indent xml read from httpservlet request body?

I am trying to read xml string from HttpServletRequest like this:
private String getPayload(HttpServletRequest httRequest, String contentType){
StringBuilder buffer = new StringBuilder();
String data = null, payload = null;
BufferedReader bufferedReader = null;
try {
bufferedReader = httRequest.getReader();
String line;
while ((line = bufferedReader.readLine()) != null) {
buffer.append(line);
}
data = buffer.toString();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("Payload data: " + data);
}
This gives me a string like:
<?xml version="1.0" encoding="UTF-8"?><sgn> <nev> <rep> <cin rn="cin227"> <ty>4</ty> <ri>R241</ri> <pi>R7</pi> <ct>2016-10-27T18:23:49</ct> <lt>2016-10-27T18:23:49</lt> <et>2016-11-27T18:23:49</et> <at>erer</at> <aa>name</aa> <st>17</st> <cs>88</cs> <con>Sid-Container-Element</con> </cin> </rep> <net>3</net> </nev> <sur>R7/R8</sur> <cr>C1</cr></sgn>
So it's not indented. I tried a couple of things to indent it, like:
XPathFactory xpathFactory = XPathFactory.newInstance();
// XPath to find empty text nodes.
XPathExpression xpathExp = xpathFactory.newXPath().compile(
"//text()[normalize-space(.) = '']");
NodeList emptyTextNodes = (NodeList)
xpathExp.evaluate(doc, XPathConstants.NODESET);
// Remove each empty text node from document.
for (int i = 0; i < emptyTextNodes.getLength(); i++) {
Node emptyTextNode = emptyTextNodes.item(i);
emptyTextNode.getParentNode().removeChild(emptyTextNode);
}
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
//initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);
So first removing empty spaces using xpath and then applying transformer. So this still not giving me proper output:
<?xml version="1.0" encoding="UTF-8"?><sgn >
<nev>
<rep>
<cin rn="cin227">
<ty>4</ty>
<ri>R241</ri>
<pi>R7</pi>
<ct>2016-10-27T18:23:49</ct>
<lt>2016-10-27T18:23:49</lt>
<et>2016-11-27T18:23:49</et>
<at>erer</at>
<aa>name</aa>
<st>17</st>
<cs>88</cs>
<con>Sid-Container-Element</con>
</cin>
</rep>
<net>3</net>
</nev>
<sur>R7/R8</sur>
<cr>C1</cr>
</sgn>
So it's not properly indenting it. I want proper indentation.
Anything I am doing wrong here ??

dom4j - xpath - get xml as string for Document.valueOf

Is there a way to write XPath expression that will be returning xml tree as string when called with Document.valueOf method from dom4j library? I tried /node(), but it returns just texts in all nodes, not including tags.

Method 1 :
String result = document.toXML().toString();
Method 2 : (a more elegant aproach)
DOMSource domSource = new DOMSource(document);
StringWriter writer = new StringWriter();
StreamResult streamResult = new StreamResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory .newTransformer();
transformer.transform(domSource, streamResult);
String result = writer.toString();

Pretty print XML in java 8

I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site, however none of the previous answers have worked for me. I am using java 8, so perhaps this is where my code differs from previous questions? I have also tried to set the transformer manually using code found from the web, however this just caused a not found error.
Here is my code which currently just outputs each xml element on a new line to the left of the console.
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Test {
public Test(){
try {
//java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
DocumentBuilderFactory dbFactory;
DocumentBuilder dBuilder;
Document original = null;
try {
dbFactory = DocumentBuilderFactory.newInstance();
dBuilder = dbFactory.newDocumentBuilder();
original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
} catch (SAXException | IOException | ParserConfigurationException e) {
e.printStackTrace();
}
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory tf = TransformerFactory.newInstance();
//tf.setAttribute("indent-number", 2);
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(original), xmlOutput);
java.lang.System.out.println(xmlOutput.getWriter().toString());
} catch (Exception ex) {
throw new RuntimeException("Error converting to String", ex);
}
}
public static void main(String[] args){
new Test();
}
}

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".
Background
Excerpt from the article (see References below) inspiring this solution:
Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.
Java Code
public static String toPrettyString(String xml, int indent) {
try {
// Turn xml string into a document
Document document = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
// Remove whitespaces outside tags
document.normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
document,
XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); ++i) {
Node node = nodeList.item(i);
node.getParentNode().removeChild(node);
}
// Setup pretty print options
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", indent);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// Return pretty print xml string
StringWriter stringWriter = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
return stringWriter.toString();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Sample usage
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(toPrettyString(xml, 4));
Output
<root>
<name>Coco Puff</name>
<total>10</total>
</root>
References
Java: Properly Indenting XML String published on MyShittyCode
Save new XML node to file

I guess that the problem is related to blank text nodes (i.e. text nodes with only whitespaces) in the original file. You should try to programmatically remove them just after the parsing, using the following code. If you don't remove them, the Transformer is going to preserve them.
original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);
for (int i = 0; i < blankTextNodes.getLength(); i++) {
blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}

This works on Java 8:
public static void main (String[] args) throws Exception {
String xmlString = "<hello><from>ME</from></hello>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
pretty(document, System.out, 2);
}
private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
if (indent > 0) {
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
}
Result result = new StreamResult(outputStream);
Source source = new DOMSource(document);
transformer.transform(source, result);
}

I've written a simple class for for removing whitespace in documents - supports command-line and does not use DOM / XPath.
Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

Underscore-java has static method U.formatXml(string). I am the maintainer of the project. Live example
import com.github.underscore.U;
public class MyClass {
public static void main(String args[]) {
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(U.formatXml(xml));
}
}
Output:
<root>
<name>Coco Puff</name>
<total>10</total>
</root>

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:
public String GenerateTabs(int tabLevel) {
char[] tabs = new char[tabLevel * 2];
Arrays.fill(tabs, ' ');
//Or:
//char[] tabs = new char[tabLevel];
//Arrays.fill(tabs, '\t');
return new String(tabs);
}
public String FormatXHTMLCode(String code) {
// Split on new lines.
String[] splitLines = code.split("\\n", 0);
int tabLevel = 0;
// Go through each line.
for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
String currentLine = splitLines[lineNum];
if (currentLine.trim().isEmpty()) {
splitLines[lineNum] = "";
} else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
++tabLevel;
} else if (currentLine.matches(".*</[^<>]+?>")) {
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
} else if (currentLine.matches("[^<>]*?/>")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
} else {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
}
}
return String.join("\n", splitLines);
}
It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

Create xml file :
new FileInputStream("xml Store - Copy.xml") ;// result xml file format incorrect !
so that, when parse the content of the given input source as an XML document
and return a new DOM object.
Document original = null;
...
original.parse("data.xml");//input source as an XML document

Retain escape character [" < etc..] while copying XML node - Java

I am creating target XML by copying source XML content. I am doing copy at node level.
Source XML has content with escape character which gets converted [$quot; to " etc...] while I create my target XML
Is there any way to retain original XML content.
Appreciate any help on this.
copyXmlFile("Workflow", "./Source.xml", "./Destination.xml");
private static void copyXmlFile(String xmlType, String objectSourceFile, String outfile) throws TransformerException {
//Get the DOM Builder Factory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
//Get the DOM Builder
DocumentBuilder builder = null;
try {
builder = factory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
//document contains the complete XML as a Tree.
try {
File xmlFileContent = new File(objectSourceFile);
Document document = builder.parse(new FileInputStream(xmlFileContent));
// root elements
Document documentOut = builder.newDocument();
Element rootElementOut = documentOut.createElement(xmlType);
rootElementOut.setAttribute("xmlns", "http://soap.sforce.com/2006/04/metadata");
documentOut.appendChild(rootElementOut);
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
//Node copiedNode = documentOut.importNode(node, true);
//rootElementOut.appendChild(copiedNode);
rootElementOut.appendChild(documentOut.adoptNode(node.cloneNode(true)));
}
}
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(documentOut);
//StreamResult result = new StreamResult(new File(outfile));
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
//transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
//transformer.setOutputProperty(OutputKeys.METHOD, "xml");
//transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(source, result);
System.out.println("Escaped XML String in Java: " + writer.toString());
} catch (SAXException | IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
} }

Need of transformer function

I am linking to a webservice all works fine. Below is my codes.
private static void printSOAPResponse(SOAPMessage soapResponse) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
Source sourceContent = soapResponse.getSOAPPart().getContent();
System.out.print("\nResponse SOAP Message = \n");
StreamResult result = new StreamResult(System.out);
transformer.transform(sourceContent, result);
}
I am not too sure of the transformerfactory. What I need to do now is to traverse through the results and look for below tag.
<Table diffgr:id="Table1" > and there after there will be few tags in it for e.g.
<rID>1212</rID>
<sNo>15677</sNo>
So what is the best way as some require to covert it into string is that necessary?

Transform to document (unchecked):
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
DOMResult result = new DOMResult();
transformer.transform(sourceContent, result);
Document doc = (Document) result.getNode();
Find in document:
String tag = "Table";
String attr = "diffgr:id";
String attrValue = "Table1";
NodeList list = doc.getElementsByTagName("Table");
Element tableNode = null;
for (int i = 0; i < list.getLength(); i++) {
tableNode = ((Element) list.item(i));
String currentAttrValue = tableNode.getAttribute(attr);
if (attrValue.equals(currentAttrValue)) {
break;
}
}
String childTag1 = "rID";
String childTag2 = "sNo";
Node child1 = (Node) tableNode.getElementsByTagName(childTag1).item(0);
Node child2 = (Node) tableNode.getElementsByTagName(childTag2).item(0);
String rIDValue = child1.getTextContent();
String sNoValue = child1.getTextContent();

You code Transformer transformer = transformerFactory.newTransformer(); is creating an "identity transformer" which copies the input unchanged, so it's not really doing anything useful. What you want here is a real (XSLT) transformer which actually extracts the information you need: something like
<xsl:template match="/">
<xsl:copy-of select="//Table[#diffgr:id='Table1']"/>
</xsl:template>
which you can compile using transformerFactory.newTemplates().

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Convert an org.w3c.dom.Node into a String - java

Related

Indent xml read from httpservlet request body?

dom4j - xpath - get xml as string for Document.valueOf

Pretty print XML in java 8

Retain escape character [" < etc..] while copying XML node - Java

Need of transformer function

Categories

Resources