Since moving to Java 1.7, the XML document element does not indent

I'm trying to indent XML that is generated by a Transformer.
All the DOM nodes are indented as expected except for the first node, the document element.
The document element does not start on a new line; it is concatenated directly after the XML declaration.
This bug appeared when I moved to Java 1.7; it does not happen with Java 1.6 or 1.5.
My code:
Writer s = new OutputStreamWriter(out, "utf-8");
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount","4");
transformer.transform(new DOMSource(doc), new StreamResult(s));
The output:
<?xml version="1.0" encoding="UTF-8"?><a>
<b>bbbbb</b>
</a>
Does anyone know why?
By the way, when I add the property
transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
it works as expected, but the XML declaration changes:
it now has the standalone attribute as well, and I don't want to change the XML declaration.

OK,
I found this in the Java API documentation:
If the doctype-system property is specified, the xml output method should output a document type declaration immediately before the first element.
So I used this property:
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");
and it solved my problem without changing my XML declaration.
Thanks.

Xalan has at some point changed the behavior regarding the newline character after the XML declaration.
OpenJDK (and thus Oracle JDK, too) has implemented a workaround for this problem. This workaround can be enabled by setting a special property on the Transformer object:
try {
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
} catch (IllegalArgumentException e) {
// Might be thrown by JDK versions not implementing the workaround.
}
This way, the old behavior (printing a newline character after the XML declaration) is restored without adding the standalone attribute to the XML declaration.
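For anyone who wants to try the workaround in isolation, here is a minimal, self-contained sketch. The class name IsStandaloneDemo and the sample document are assumptions made for illustration. What the declaration line looks like still depends on the JDK version in use, so no expected output is shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IsStandaloneDemo {

    /** Builds <a><b>bbbbb</b></a> and serializes it with indentation enabled. */
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            Element b = doc.createElement("b");
            b.setTextContent("bbbbb");
            a.appendChild(b);
            doc.appendChild(a);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            try {
                // Implementation-specific property; not part of the JAXP standard.
                transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
            } catch (IllegalArgumentException e) {
                // The transformer in use does not recognize the property; carry on without it.
            }

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```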

For me, writing the XML declaration to the Writer or OutputStream before writing the XML, and telling the transformer to omit the declaration, was the only thing that worked. The only other solution that preserves the spacing appears to be the VTD-XML library.
StringBuilder sb = new StringBuilder();
sb.append("<?xml version=\"").append(doc.getXmlVersion()).append("\"");
sb.append(" encoding=\"").append(doc.getXmlEncoding()).append("\"");
if (doc.getXmlStandalone()) {
sb.append(" standalone=\"yes\""); // the XML declaration accepts only "yes" or "no", not "true"
}
sb.append("?>").append("\n");
writer.write(sb.toString());
TransformerFactory tf = TransformerFactory.newInstance();
try {
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
}
catch (Exception e) {
//snipped out for brevity
}
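The snippet above stops before the actual transform call. A complete sketch of the same idea might look like the following. The class name ManualDeclarationDemo, the sample document, and the fallback to "UTF-8" when the document reports no encoding are assumptions, not part of the original answer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ManualDeclarationDemo {

    /** Writes the XML declaration by hand, then lets the transformer write the rest. */
    static String serialize(Document doc) {
        try {
            StringWriter writer = new StringWriter();

            // Assumption: fall back to UTF-8 when the document carries no encoding info.
            String encoding = doc.getXmlEncoding() != null ? doc.getXmlEncoding() : "UTF-8";
            writer.write("<?xml version=\"" + doc.getXmlVersion() + "\" encoding=\"" + encoding + "\"");
            if (doc.getXmlStandalone()) {
                writer.write(" standalone=\"yes\"");
            }
            writer.write("?>\n");

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // The declaration is already written, so the transformer must not emit its own.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    /** Builds the <a><b>bbbbb</b></a> document used in the question. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            Element b = doc.createElement("b");
            b.setTextContent("bbbbb");
            a.appendChild(b);
            doc.appendChild(a);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

If the Writer is later encoded to bytes, the charset used there has to match the encoding named in the hand-written declaration.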

This seems to be a problem (bug) in Java's XML implementation. The only way to get a line break after the XML declaration is to explicitly specify the standalone attribute. You may set it to no, which is the implicit default, even though it is completely irrelevant when no DTD is used.

Related

Java - correctly indenting an XML made from multiple sources

I'm trying to correctly indent (indentation = 2) an XML file written by a Java Spring Boot application. The problem is that I'm not building the XML from scratch; I'm creating it by joining parts of various source XML documents with different schemas.
My code is:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
DOMSource source = new DOMSource(root.getOwnerDocument());
StreamResult file = new StreamResult(outputFile);
transformer.transform(source, file);
This seems to completely ignore the indentation setting: it just copies whatever indentation was present in the original XML files.
To copy the nodes, I tried both:
root.appendChild(document.adoptNode(extractedNodeToCopy.cloneNode(true)));
and
root.appendChild(document.importNode(extractedNodeToCopy, true));
But this doesn't change anything.
I don't get error messages, the result is simply indented as the original documents were (so every tag has a different style).

You are using XSLT already. The code
transformerFactory.newTransformer()
instantiates an XSLT transformer with a default XSL template that performs the 'identity' transformation (see also https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/transform/TransformerFactory.html#newTransformer() ).
So in your case, to modify the indentation you could either
use Java to prepare the data in your DOM (stored in root), which means adding text nodes that contain whitespace to your taste, or
do a similar job within a stylesheet (see also Problems Trying to Pretty Print XSLT Output) and make use of that by calling
transformerFactory.newTransformer(your stylesheet)
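As a variation on the first option, one commonly used way to prepare the DOM is the reverse of adding whitespace: remove the whitespace-only text nodes inherited from the source files, so that the serializer's own indentation has a chance to apply. This is a different technique from the one described above, and the class and method names below are hypothetical. Note that the sketch does not honor xml:space="preserve".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceStripper {

    /** Removes every whitespace-only text node and returns how many were removed. */
    static int stripWhitespaceNodes(Document doc) {
        try {
            // The XPath result is a snapshot, so removing nodes while iterating is safe.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            int count = blanks.getLength();
            for (int i = 0; i < count; i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not strip whitespace nodes", e);
        }
    }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n   <b>x</b>\n\t<c> y </c>\n</a>");
        System.out.println("Removed " + stripWhitespaceNodes(doc) + " whitespace-only text nodes");
    }
}
```

After this step the document can be passed to the transformer with OutputKeys.INDENT set to "yes" as in the question.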

Java transformer, xml:space and indentation difference between Java 8 and Java 11

In our application we're using the (partial) code below to format XML.
This works fine on Java 8 but no longer on Java 11.
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
In our XML there is an element named 'condition' with the attribute xml:space="preserve".
All XML elements before this tag are well formatted; all the elements (which have no children) after the condition tag stay on the same line (no indentation).
We're using the transformer factory 'com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl'; with 'org.apache.xalan.processor.TransformerFactoryImpl' it works better.
This was probably introduced in Java 9, because that release had a lot of changes in the transformer code. I could not find any issue or solution addressing this.
Does anyone have a solution for this problem (or is this the new way that preserve is supposed to work)?

XSL validation while transformation

I'm using the following piece of code to do XSL transformation :
Source source = new StreamSource(new StringReader(request.toString()));
Source xsl = new StreamSource(XSLPath);
StringWriter destination = new StringWriter();
Result result = new StreamResult(destination);
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer;
transformer = transFactory.newTransformer(xsl);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
The XSLPath variable holds the location of the .xsl file.
I need to know whether transFactory.newTransformer(xsl) first does any kind of internal validation of the XSL file itself. If not, is there a way to validate the XSL file before performing the transformation?
I have code to validate an XSD file, but I believe the same code wouldn't work for an XSL file. I tried it anyway, but it always throws one SAXException or another about non-whitespace characters not being allowed on most of the lines.

Yes, the first thing the XSLT processor does is to validate and compile the stylesheet. (Why did you have to ask? Just introduce an error, and see what happens!)
You might find it useful to set an ErrorListener to make sure that your application can capture the error messages.
If you are using the same stylesheet repeatedly for many transformations, it is much more efficient to use newTemplates() to create a Templates object so you only do the validation/compilation once. Think of the Templates object as the compiled stylesheet.
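A rough sketch of both suggestions, assuming the stylesheet is available as a string: it compiles the stylesheet once with newTemplates() and collects whatever the ErrorListener reports. The class name StylesheetCheck, the method compileErrors, and the two sample stylesheets are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class StylesheetCheck {

    static final String VALID =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/\"><out/></xsl:template></xsl:stylesheet>";

    // Same stylesheet, but with an XPath expression that cannot be parsed.
    static final String INVALID =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/\"><xsl:value-of select=\"1 +\"/></xsl:template></xsl:stylesheet>";

    /** Compiles the stylesheet and returns the reported problems (empty if it compiled). */
    static List<String> compileErrors(String xsl) {
        final List<String> problems = new ArrayList<>();
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(new ErrorListener() {
            @Override public void warning(TransformerException e) { /* warnings are not collected */ }
            @Override public void error(TransformerException e) { problems.add(e.getMessageAndLocation()); }
            @Override public void fatalError(TransformerException e) { problems.add(e.getMessageAndLocation()); }
        });
        try {
            // Compilation happens here; the Templates object can be reused for many transformations.
            Templates templates = factory.newTemplates(new StreamSource(new StringReader(xsl)));
            templates.newTransformer(); // cheap: the stylesheet is not compiled again
        } catch (TransformerConfigurationException e) {
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("valid stylesheet problems:   " + compileErrors(VALID));
        System.out.println("invalid stylesheet problems: " + compileErrors(INVALID));
    }
}
```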

Control order of XML attributes in output file in Java

How do I control the order that the XML attributes are listed within the output file?
It seems by default they are getting alphabetized, which the program I'm sending this XML to apparently isn't handling.
e.g. I want zzzz to show first, then bbbbb in the following code.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("requests");
doc.appendChild(root);
root.appendChild(request);
root.setAttribute("zzzzzz", "My z value");
root.setAttribute("bbbbbbb", "My b value");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);

The order of attributes is defined to be insignificant in XML: no conformant XML application should produce results that depend on the order in which attributes appear. Therefore, serializers (code that produces lexical XML as output) will usually give you no control over the order.
Now, it would sometimes be nice to have that control for cosmetic reasons, because XML is designed to be human-readable. So there's a valid reason for wanting the feature. But the fact is, I know of no serializer that offers it.

I had the same issue when I used the XML DOM API to write a file. To resolve the problem I had to use XMLStreamWriter. Attributes appear in the XML file in the order you write them with XMLStreamWriter.
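A minimal sketch of that approach, reusing the element and attribute names from the question (the class name AttributeOrderDemo is an assumption):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class AttributeOrderDemo {

    /** Writes a <requests> element whose attributes are emitted in call order. */
    static String write() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("requests");
            // Attributes are written in the order of these calls, not alphabetically.
            writer.writeAttribute("zzzzzz", "My z value");
            writer.writeAttribute("bbbbbbb", "My b value");
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```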

XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.

If you don't want to use another framework just for a custom attribute order you can simply add an order identifier to the attributes.
<someElement a__price="32" b__amount="3"/>
After the XML serializer is done, post-process the raw XML like so:
public static String removeAttributeOrderIdentifiers(String xml) {
// Strip the "x__" ordering prefix but keep the space that separates attributes.
return xml.replaceAll(
" [a-z]__(.+?=\")",
" $1"
);
}
And you will get:
<someElement price="32" amount="3"/>

Xml transformation encoding problem

Hi, I have some simple code:
InputSource is = new InputSource(new StringReader(xml))
Document d = documentBuilder.parse(is)
StringWriter result = new StringWriter()
DOMSource ds = new DOMSource(d)
Transformer t = TransformerFactory.newInstance().newTransformer()
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
t.setOutputProperty(OutputKeys.STANDALONE, "yes");
t.setOutputProperty(OutputKeys.ENCODING,"UTF-16")
t.transform(ds,new StreamResult(result))
return result.toString()
that should transform an XML document to UTF-16 encoding. As far as I know, the internal representation of String in the JVM already uses UTF-16 chars, but I expect the result String to contain a header with the encoding set to "UTF-16" (the original XML was UTF-8). Instead I get:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
(the standalone attribute also seems to be wrong)
The transformer instance is: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl
(which I think is the default)
So what am I missing here?

Use a writer for which you explicitly declare UTF-16 as the output encoding. Try OutputStreamWriter(OutputStream out, String charsetName) wrapping a ByteArrayOutputStream and see if this works.
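The suggestion could be tried with something like the sketch below. The class name Utf16OutputDemo and the sample input are assumptions. The sketch only shows that the bytes come out UTF-16 encoded; what the XML declaration says may still vary by transformer implementation, so it is not asserted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf16OutputDemo {

    /** Parses the XML string and serializes it as UTF-16 encoded bytes. */
    static byte[] serializeUtf16(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // The writer, not the transformer, decides how characters become bytes here.
            try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_16)) {
                transformer.transform(new DOMSource(doc), new StreamResult(writer));
            }
            return bytes.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serializeUtf16("<a><b>bbbbb</b></a>");
        System.out.println(new String(data, StandardCharsets.UTF_16));
    }
}
```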

I have now written a test of my own, with one minor change:
t.transform(ds,new StreamResult(new File("dest.xml")));
I get the same results, but the file is indeed UTF-16 encoded (checked with a hex editor). For some strange reason the XML declaration is not changed. So your code works.
