Java - correctly indenting an XML made from multiple sources - java

I'm trying to correctly indent (indentation = 2) an XML file written by a Java Spring Boot application. The problem is that I'm not making up the XML myself, I'm creating the XML by joining parts of various source XML with different schemas.
My code is:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
DOMSource source = new DOMSource(root.getOwnerDocument());
StreamResult file = new StreamResult(outputFile);
transformer.transform(source, file);
This seems to ignore completely the "indentation" parameter: it just copies whatever indentation was present in the original XML Files.
To copy the nodes, I tried both:
root.appendChild(document.adoptNode(extractedNodeToCopy.cloneNode(true)));
and
root.appendChild(document.importNode(extractedNodeToCopy, true));
But this doesn't change anything.
I don't get error messages, the result is simply indented as the original documents were (so every tag has a different style).

You are using XSLT already. The code
transformerFactory.newTransformer()
instantiates an XSLT transformer with a default XSL template that performs the 'identity' transformation (see also https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/transform/TransformerFactory.html#newTransformer() )
So in your case to have indentation modified you could either
use java to prepare the data in your DOM (stored in root). You would have
to add TextNodes that contain whitespace to your taste, or
do a similar job within a stylesheet (see also Problems Trying to Pretty Print XSLT Output) and make use of that by calling
transformerFactory.newTransformer(your stylesheet)

Related

Multiple xslt tranformations

In input i have xml file(it can be 1000 or 100000 files) and i have to convert it to 6 csv files for later saving to the database. My question is how to do this in java more efficient, now i create 6 transformers with different xslt stylesheets and manually transform xml 6 times. I tried do this in one xslt transformation with function: result-document, it works, but in inputmay be more than one xml file and after each transformation data in result files rewrites. My idea collect all data from xml files in csv and then copy it to db tables.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTemplates(stylesource).newTransformer();
Transformer transformer2 = tf.newTemplates(stylesource2).newTransformer();
Transformer transformer3 = tf.newTemplates(stylesource3).newTransformer();
Transformer transformer4 = tf.newTemplates(stylesource4).newTransformer();
Transformer transformer5 = tf.newTemplates(stylesource5).newTransformer();
Transformer transformer6 = tf.newTemplates(stylesource6).newTransformer();
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
public void transformXmlToCsv(String content) throws TransformerException, IOException, SAXException {
Document doc = db.parse(new InputSource(new StringReader(content)));
Source source = new DOMSource(doc);
transformer.transform(source, outputTarget);
transformer2.transform(source, outputTarget2);
transformer3.transform(source, outputTarget3);
transformer4.transform(source, outputTarget4);
transformer5.transform(source, outputTarget5);
transformer6.transform(source, outputTarget6);
}
One improvement you could make would be to avoid repeated parsing of the source document by building the input tree once. For example, by building a DOM tree and using a DOMSource, or (better if you're using Saxon) by using Saxon interfaces to build the tree once in Saxon's internal format.
Another improvement would be to only create one TransformerFactory for everything. Creating a TransformerFactory is typically expensive (it involves a search of the classpath) and there's no need to ever create more than one.
It should be easy to fix your problem with xsl:result-document. There are many ways of doing it, e.g. by directing the output of each transformation to a different directory, but I can't tell what the best way is without more information.

Since moving to java 1.7 Xml Document Element does not indent

I'm trying to indent XML which generated by Transformer.
All the DOM Node are being Indent as expected except for the First Node - The Document Element.
document element does not start in a new line , just concat right after the XML Declaration.
This bug arise when I moved to java 1.7 , when using java 1.6 or 1.5 it does not happen.
My code :
ByteArrayOutputStream s = new OutputStreamWriter(out, "utf-8");
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount","4");
transformer.transform(new DOMSource(doc), new StreamResult(s));
The output:
<?xml version="1.0" encoding="UTF-8"?><a>
<b>bbbbb</b>
</a>
Anyone knows why ?
btw,
when I add the property
transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
It work as expected , but the xml declaration is changed,
it now have the standalone property as well, and i don't want to change the xml declaration..
Ok ,
I found in Java API this :
If the doctype-system property is specified, the xml output method should output a document type declaration immediately before the first element.
SO I used this property
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");
and it solve my problem with out changed my xml declartion.
Thanks.
Xalan has at some point changed the behavior regarding the newline character after the XML declaration.
OpenJDK (and thus Oracle JDK, too) has implemented a workaround for this problem. This workaround can be enabled by setting a special property on the Transformer object:
try {
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
} catch (IllegalArgumentException e) {
// Might be thrown by JDK versions not implementing the workaround.
}
This way, the old behavior (printing a newline character after the XML declaration) is restored without adding the standalone attribute to the XML declaration.
For me writing the XML declaration to the Writer or OutputStream before writing the XML and telling the transformer to omit declaration was the only thing that worked. The only other solution to preserve the spacing appears to be VTD-XML library.
StringBuilder sb = new StringBuilder();
sb.append("<?xml version=\"").append(doc.getXmlVersion()).append("\"");
sb.append(" encoding=\"").append(doc.getXmlEncoding()).append("\"");
if (doc.getXmlStandalone()) {
sb.append(" standalone=\"").append("" + doc.getXmlStandalone()).append("\"");
}
sb.append("?>").append("\n");
writer.write(sb.toString());
TransformerFactory tf = TransformerFactory.newInstance();
try {
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
}
catch (Exception e) {
//snipped out for brevity
}
This seems to be a problem (bug) of the XML implementaion in Java. The only way to get a linebreak after the XML declaration is to explicitly specify the standalone attribute. You may set it to no, which is the implicit default, even if it is completely irrelevant when not using a DTD.

XSL validation while transformation

I'm using the following piece of code to do XSL transformation :
Source source = new StreamSource(new StringReader(request.toString()));
Source xsl = new StreamSource(XSLPath);
StringWriter destination = new StringWriter();
Result result = new StreamResult(destination);
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer;
transformer = transFactory.newTransformer(xsl);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
The XSLPath variable passes the file location to the .xsl file.
I need to know whether the transFactory.newTransformer(xsl) does any kind of internal validation first of the xsl file itself. If not, then is there a way we can do validation of the xsl file, before performing the transformation?
I have a code to validate an xsd file, but, I believe the same code wouldn't work for an xsl. I still tried that as well, but it always throw some or the other SAXException about Non-white spaces not being allowed on most of the lines.
Yes, the first thing the XSLT processor does is to validate and compile the stylesheet. (Why did you have to ask? Just introduce an error, and see what happens!)
You might find it useful to set an ErrorListener to make sure that your application can capture the error messages.
If you are using the same stylesheet repeatedly for many transformations, it is much more efficient to use newTemplates() to create a Templates object so you only do the validation/compilation once. Think of the Templates object as the compiled stylesheet.

How to write to an existing XML file using java

I've got an xml file looked like this. employees.xml
<Employees>
<Employee>
<FirstName>myFirstName</FirstName>
<LastName>myLastName</LastName>
<Salary>10000</Salary>
<Employee>
</Employees>
Now how do I add new Employee elements to the existing XML file?.. An example code is highly appreciated.
You can't 'write nodes to an existing XML file.' You can read an existing XML file into memory, add to the data model, and then write a new file. You can rename the old file and write the new file under the old name. But there is no commonly-used Java utility that will modify an XML file in place.
To add to an existing XML file, you generally need to read it in to an internal data structure, add the needed data in the internal form and then write it all out again, overwriting the original file.
The internal structure can be DOM or one of your own making, and there are multiple ways of both reading it in and writing it out.
If the data is reasonably small, DOM is probably easiest, and there is some sample code in the answers to this related question.
If your data is large, DOM will not do. Possible approaches are to use SAX to read and write (though SAX is traditionally only a reading mechanism) as described in an answer to another related question.
You might also want to consider JAXB or (maybe even best) StAX.
Please use xstream to parse your file as an object, or create a list with employees and then you can directly convert that to xml.
I think this link can be useful for you.
Here you have samples how to read / parse, modify (add elements) and save (write to xml file again).
The following samples you can find at: http://www.petefreitag.com/item/445.cfm
Read:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("/path/to/file.xml");
Modify:
// attributes
Node earth = doc.getFirstChild();
NamedNodeMap earthAttributes = earth.getAttributes();
Attr galaxy = doc.createAttribute("galaxy");
galaxy.setValue("milky way");
earthAttributes.setNamedItem(galaxy);
// nodes
Node canada = doc.createElement("country");
canada.setTextContent("ca");
earth.appendChild(canada);
Write XML file:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
//initialize StreamResult with File object to save to file
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);
You need to use DOM to write/edit your xml file.
It's very easy:
You just need to create nodes and add attributes to it.
You can also write/edit XSLT files by using DOM.
just search google for DOM java

Control order of XML attributes in outputed file in Java

How do I control the order that the XML attributes are listed within the output file?
It seems by default they are getting alphabetized, which the program I'm sending this XML to apparently isn't handling.
e.g. I want zzzz to show first, then bbbbb in the following code.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("requests");
doc.appendChild(root);
root.appendChild(request);
root.setAttribute("zzzzzz", "My z value");
root.setAttribute("bbbbbbb", "My b value");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);
The order of attributes is defined to be insignificant in XML: no conformant XML application should produce results that depend on the order in which attributes appear. Therefore, serializers (code that produces lexical XML as output) will usually give you no control over the order.
Now, it would sometimes be nice to have that control for cosmetic reasons, because XML is designed to be human-readable. So there's a valid reason for wanting the feature. But the fact is, I know of no serializer that offers it.
I had the same issue when I used XML DOM API for writing file. To resolve the problem I had to use XMLStreamWriter. Attributes appear in a xml file in the order you write it using XMLStreamWriter.
XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.
If you don't want to use another framework just for a custom attribute order you can simply add an order identifier to the attributes.
<someElement a__price="32" b__amount="3"/>
After the XML serializer is done post process the raw XML like so:
public static String removeAttributeOrderIdentifiers(String xml) {
return xml.replaceAll(
" [a-z]__(.+?=\")",
"$1"
);
}
And you will get:
<someElement amount="3" price="32"/>

Categories

Resources