Pretty print XML in Java 8

I have an XML file stored as a DOM Document and I would like to pretty print it to the console, preferably without using an external library. I am aware that this question has been asked multiple times on this site, but none of the previous answers have worked for me. I am using Java 8, so perhaps that is where my code differs from previous questions? I have also tried to set the transformer manually using code found on the web, but that just caused a "not found" error.
Here is my code, which currently just prints each XML element on a new line, flush against the left edge of the console.
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Test {
public Test(){
try {
//java.lang.System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.xsltc.trax.TransformerFactoryImpl");
DocumentBuilderFactory dbFactory;
DocumentBuilder dBuilder;
Document original = null;
try {
dbFactory = DocumentBuilderFactory.newInstance();
dBuilder = dbFactory.newDocumentBuilder();
original = dBuilder.parse(new InputSource(new InputStreamReader(new FileInputStream("xml Store - Copy.xml"))));
} catch (SAXException | IOException | ParserConfigurationException e) {
e.printStackTrace();
}
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory tf = TransformerFactory.newInstance();
//tf.setAttribute("indent-number", 2);
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(original), xmlOutput);
java.lang.System.out.println(xmlOutput.getWriter().toString());
} catch (Exception ex) {
throw new RuntimeException("Error converting to String", ex);
}
}
public static void main(String[] args){
new Test();
}
}

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".
Background
Excerpt from the article (see References below) inspiring this solution:
Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.
Java Code
public static String toPrettyString(String xml, int indent) {
try {
// Turn xml string into a document
Document document = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
// Remove whitespaces outside tags
document.normalize();
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
document,
XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); ++i) {
Node node = nodeList.item(i);
node.getParentNode().removeChild(node);
}
// Setup pretty print options
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", indent);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
// Return pretty print xml string
StringWriter stringWriter = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
return stringWriter.toString();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Sample usage
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(toPrettyString(xml, 4));
Output
<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>
References
Java: Properly Indenting XML String published on MyShittyCode
Save new XML node to file

I guess that the problem is related to blank text nodes (i.e. text nodes with only whitespaces) in the original file. You should try to programmatically remove them just after the parsing, using the following code. If you don't remove them, the Transformer is going to preserve them.
original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);
for (int i = 0; i < blankTextNodes.getLength(); i++) {
blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}

This works on Java 8:
public static void main (String[] args) throws Exception {
String xmlString = "<hello><from>ME</from></hello>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
pretty(document, System.out, 2);
}
private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
if (indent > 0) {
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
}
Result result = new StreamResult(outputStream);
Source source = new DOMSource(document);
transformer.transform(source, result);
}

I've written a simple class for removing whitespace in documents; it supports the command line and does not use DOM / XPath.
Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:
PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

Underscore-java has a static method U.formatXml(string). I am the maintainer of the project.
import com.github.underscore.U;
public class MyClass {
public static void main(String args[]) {
String xml = "<root>" + //
"\n " + //
"\n<name>Coco Puff</name>" + //
"\n <total>10</total> </root>";
System.out.println(U.formatXml(xml));
}
}
Output:
<root>
<name>Coco Puff</name>
<total>10</total>
</root>

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:
public String GenerateTabs(int tabLevel) {
char[] tabs = new char[tabLevel * 2];
Arrays.fill(tabs, ' ');
//Or:
//char[] tabs = new char[tabLevel];
//Arrays.fill(tabs, '\t');
return new String(tabs);
}
public String FormatXHTMLCode(String code) {
// Split on new lines.
String[] splitLines = code.split("\\n", 0);
int tabLevel = 0;
// Go through each line.
for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
String currentLine = splitLines[lineNum];
if (currentLine.trim().isEmpty()) {
splitLines[lineNum] = "";
} else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
++tabLevel;
} else if (currentLine.matches(".*</[^<>]+?>")) {
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
} else if (currentLine.matches("[^<>]*?/>")) {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
--tabLevel;
if (tabLevel < 0) {
tabLevel = 0;
}
} else {
splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
}
}
return String.join("\n", splitLines);
}
It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

Check how the XML file is read:
new FileInputStream("xml Store - Copy.xml"); // the resulting XML file format may be incorrect
Instead, parse the content of the given input source as an XML document and get back a new DOM object:
Document original = null;
...
original = dBuilder.parse("data.xml"); // input source as an XML document

Related

Include delimiter when performing substring operation

How do I include the delimiter when performing a substring operation?
i.e. given the string message which looks like this:
<nutrition>
<daily-values>
<total-fat units="g">65</total-fat>
<saturated-fat units="g">20</saturated-fat>
<cholesterol units="mg">300</cholesterol>
<sodium units="mg">2400</sodium>
<carb units="g">300</carb>
<fiber units="g">25</fiber>
<protein units="g">50</protein>
</daily-values>
</nutrition>
<food>
<name>Avocado Dip</name>
<mfr>Sunnydale</mfr>
<serving units="g">29</serving>
<calories total="110" fat="100"/>
<total-fat>11</total-fat>
<saturated-fat>3</saturated-fat>
<cholesterol>5</cholesterol>
<sodium>210</sodium>
<carb>2</carb>
<fiber>0</fiber>
<protein>1</protein>
<vitamins>
<a>0</a>
<c>0</c>
</vitamins>
<minerals>
<ca>0</ca>
<fe>0</fe>
</minerals>
</food>
and then
message = message.substring(message.indexOf("<food>"), message.indexOf("</food>"));
returns
<food>
<name>Avocado Dip</name>
<mfr>Sunnydale</mfr>
<serving units="g">29</serving>
<calories total="110" fat="100"/>
<total-fat>11</total-fat>
<saturated-fat>3</saturated-fat>
<cholesterol>5</cholesterol>
<sodium>210</sodium>
<carb>2</carb>
<fiber>0</fiber>
<protein>1</protein>
<vitamins>
<a>0</a>
<c>0</c>
</vitamins>
<minerals>
<ca>0</ca>
<fe>0</fe>
</minerals>
How do I get it to keep the last </food> tag given I don't know the surrounding content of the XML file?

Here's a solution using javax.xml. It aims to solve the case when multiple <food> elements are present in the document. In order to handle this case correctly, you need to
deserialize your XML into org.w3c.dom.Document
extract the list of <food> nodes as org.w3c.dom.NodeList
serialize back to String at the end
Here's a simplified example:
private static final String XML =
"<?xml version = \"1.0\" encoding = \"UTF-8\"?>\n"
+ "<message>\n"
+ " <food>\n"
+ " <name>A</name>\n"
+ " </food>\n"
+ " <food>\n"
+ " <name>B</name>\n"
+ " </food>\n"
+ "</message>\n";
@Test
public void xpath() throws Exception {
// Deserialize
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document document;
try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
document = factory.newDocumentBuilder().parse(in);
}
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xPath.compile("//food");
NodeList nodeList = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
System.out.println(node.getNodeName() + ": " + node.getTextContent().trim());
}
// Serialize
Document exportDoc = factory.newDocumentBuilder().newDocument();
Node exportNode = exportDoc.importNode(nodeList.item(0), true);
exportDoc.appendChild(exportNode);
String content = serialize(exportDoc);
System.out.println(content);
}
private static String serialize(Document doc) throws TransformerException {
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
// set indent
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(domSource, result);
return writer.toString();
}
The 1st output shows all <food> elements are deserialized correctly:
food: A
food: B
The 2nd output shows the 1st element serialized back to a string:
<food>
<name>A</name>
</food>
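For the single-element case in the question, the closing tag can also be kept with plain string operations: indexOf returns the position where the delimiter starts, so the end index has to be extended by the delimiter's length. A minimal sketch, assuming the tags appear literally in the text; the class and method names are hypothetical:

```java
public class SubstringWithDelimiter {

    // Returns the text from the first occurrence of `open` through the end of the
    // first following occurrence of `close`, or null if either one is missing.
    public static String between(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        int end = text.indexOf(close, start + open.length());
        if (end < 0) {
            return null;
        }
        // indexOf points at the first character of the delimiter, so add its length
        return text.substring(start, end + close.length());
    }

    public static void main(String[] args) {
        String message = "<nutrition/>\n<food>\n<name>Avocado Dip</name>\n</food>\n<trailer/>";
        System.out.println(between(message, "<food>", "</food>"));
    }
}
```

This only handles the first <food> element and knows nothing about XML structure, which is why the parser-based approach above is the safer choice when several elements may be present.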

Retain escape character [&quot; &lt; etc.] while copying XML node - Java

I am creating a target XML file by copying the content of a source XML file, node by node.
The source XML contains escaped characters, which get converted (&quot; becomes ", etc.) when I create the target XML.
Is there any way to retain the original XML content?
I'd appreciate any help with this.
copyXmlFile("Workflow", "./Source.xml", "./Destination.xml");
private static void copyXmlFile(String xmlType, String objectSourceFile, String outfile) throws TransformerException {
//Get the DOM Builder Factory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
//Get the DOM Builder
DocumentBuilder builder = null;
try {
builder = factory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
//document contains the complete XML as a Tree.
try {
File xmlFileContent = new File(objectSourceFile);
Document document = builder.parse(new FileInputStream(xmlFileContent));
// root elements
Document documentOut = builder.newDocument();
Element rootElementOut = documentOut.createElement(xmlType);
rootElementOut.setAttribute("xmlns", "http://soap.sforce.com/2006/04/metadata");
documentOut.appendChild(rootElementOut);
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
//Node copiedNode = documentOut.importNode(node, true);
//rootElementOut.appendChild(copiedNode);
rootElementOut.appendChild(documentOut.adoptNode(node.cloneNode(true)));
}
}
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(documentOut);
//StreamResult result = new StreamResult(new File(outfile));
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
//transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
//transformer.setOutputProperty(OutputKeys.METHOD, "xml");
//transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(source, result);
System.out.println("Escaped XML String in Java: " + writer.toString());
} catch (SAXException | IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
} }
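
The behaviour described in this question can be reproduced in a few lines. The parser resolves entity references into plain characters when it builds the DOM, so the tree no longer records how a character was spelled in the source file; on output, the serializer decides what to escape. The sketch below only demonstrates that round trip and is not a way to preserve the original spelling; the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapeRoundTrip {

    // Parse an XML string into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize a DOM document back to a string, without the XML declaration.
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String source = "<note>&quot;quoted&quot; &amp; &lt;tagged&gt;</note>";
        Document doc = parse(source);
        // In the DOM the entities are already plain characters
        System.out.println(doc.getDocumentElement().getTextContent());
        // The serializer re-escapes what must be escaped to keep the output well-formed
        System.out.println(serialize(doc));
    }
}
```

Characters such as & and < have to be escaped again in text content, while a double quote does not need to be, which is consistent with &quot; showing up as " in the copied file.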

updated xml data not updated in the xml file

I have written a method that updates my XML file from a GUI.
When I run an update, everything seems to work and the console prints the correct values.
But when I open the XML file and press refresh, nothing has changed.
What is the problem?
public void updateObjType(String newTxt, int x) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
System.out.println("String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse("xmlFiles/CoreDatamodel.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// Go thru the Object_types in the XML file and get item x.
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
String value = nodeList.item(x).getTextContent();
System.out.println(value);
}
this is the output from the console :
Original data : IF150Data
Incoming String value : Data
Index value : 4
updated data : Data

I solved it by using a transformer.
Full solution :
// Update the object type name from the object type list.
public void updateObjType(String newTxt, int x)
throws ParserConfigurationException, SAXException, IOException,
XPathExpressionException {
File file = new File("xmlFiles/CoreDatamodel.xml");
System.out.println("Incoming String value : " + newTxt);
System.out.println("Index value : " + x);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile("//OBJECT_TYPE/text()")
.evaluate(xmlDocument, XPathConstants.NODESET);
// Set new NodeValue
nodeList.item(x).setNodeValue(newTxt);
// Save the new updates
try {
save(file, xmlDocument);
} catch (Exception e) {
e.printStackTrace();
}
}
And then the method I added :
public void save(File file, Document doc) throws Exception {
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String s = writer.toString();
System.out.println(s);
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write(s);
bufferedWriter.flush();
bufferedWriter.close();
}
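As a variation on the save method above, the transformer can write straight to the file through an OutputStream instead of going through a StringWriter and a FileWriter. On Java versions before 18, FileWriter uses the platform default charset, which may not match the encoding named in the XML declaration; letting the transformer write the bytes keeps the two consistent. A minimal sketch, with a hypothetical class name and a Path parameter in place of File:

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SaveDocument {

    // Serialize the document to the given file as UTF-8, replacing existing content.
    public static void save(Path file, Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // try-with-resources closes the stream even if the transform fails
        try (OutputStream out = Files.newOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document in memory (illustrative data only)
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("OBJECT_TYPE");
        root.setTextContent("Data");
        doc.appendChild(root);

        Path file = Files.createTempFile("datamodel", ".xml");
        save(file, doc);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```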

How to retrieve XML including tags using the DOM parser

I am using org.w3c.dom to parse an XML file. Then I need to return the ENTIRE XML for a specific node including the tags, not just the values of the tags. I'm using the NodeList because I need to count how many records are in the file. But I also need to read the file wholesale from the beginning and then write it out to a new XML file. But my current code only prints the value of the node, but not the node itself. I'm stumped.
public static void main(String[] args) {
try {
File fXmlFile = new File (args[0]);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList listOfRecords = doc.getElementsByTagName("record");
int totalRecords = listOfRecords.getLength();
System.out.println("Total number of records : " + totalRecords);
int amountToSplice = queryUser();
for (int i = 0; i < amountToSplice; i++) {
String stringNode = listOfRecords.item(i).getTextContent();
System.out.println(stringNode);
}
} catch (Exception e) {
e.printStackTrace();
}
}

getTextContent() will only "return the text content of this node and its descendants" i.e. you only get the content of the 'text' type nodes. When parsing XML it's good to remember there are several different types of node, see XML DOM Node Types.
To do what you want, you could create a utility method like this...
public static String nodeToString(Node node) throws TransformerException
{
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter sw = new StringWriter();
t.transform(new DOMSource(node), new StreamResult(sw));
return sw.toString();
}
Then loop and print like this...
for (int i = 0; i < amountToSplice; i++)
System.out.println(nodeToString(listOfRecords.item(i)));
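The difference between the two calls is easiest to see side by side. The following self-contained sketch uses a made-up two-field record (the sample XML and the class name are assumptions for illustration) and prints the same node once with getTextContent() and once serialized through a Transformer:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextVsMarkup {

    // Serialize a single node, tags included, without the XML declaration.
    public static String nodeToString(Node node) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record><id>1</id><name>Ann</name></record></records>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node record = doc.getElementsByTagName("record").item(0);

        // Only the concatenated text of the descendants: prints 1Ann
        System.out.println(record.getTextContent());
        // The element itself, with its tags and children
        System.out.println(nodeToString(record));
    }
}
```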

delete the unwanted strings before and after the string in xml file

xml file content
<distributionChannels><distributionChannel type="Wap" id="1"><contentChannelRefs>
<contentChannelRef id="2"><categories><category
link="http://images/11.gif" id="1"><names><name lang="de">Top Downloads</name><name
lang="ww">Tops</name></names></category></categories></contentChannelRef>
</contentChannelRefs></distributionChannel>
</distributionChannels>
how do i delete the unwanted content which i am reading from an xml file and the output should look as shown below:
<category link="http://images/11.gif" id="1"><names><name lang="de">Top Downloads</name><name lang="ww">Tops</name></names></category>

The reliable solution is to use an XML parser. A simple solution is
s = s.substring(s.indexOf("<categories>"), s.indexOf("</categories>") + 13);
If you want to read the categories one by one, use a regex (DOTALL lets . match the line breaks inside an element):
Matcher m = Pattern.compile("<category.*?>.*?</category>", Pattern.DOTALL).matcher(xml);
while (m.find()) {
System.out.println(m.group());
}

Pattern matching with XML is not recommended. Use a parser to get your nodes and then manage them accordingly. If you are interested in printing them, I have included code to print the nodes.
public static void main(String[] args)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(s)));
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr
= xpath.compile("//categories//category");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
//This is where you are printing things. You can handle differently if
//you would like.
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodeToString(nodes.item(i)));
}
}
private static String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
te.printStackTrace();
}
return sw.toString();
}
