StAX XML formatting in Java

Is it possible, using StAX (specifically Woodstox), to format the output XML with newlines and tabs, i.e. in the form:
<element1>
	<element2>
		someData
	</element2>
</element1>
instead of:
<element1><element2>someData</element2></element1>
If this is not possible in Woodstox, are there any other lightweight libraries that can do this?

There is com.sun.xml.txw2.output.IndentingXMLStreamWriter
XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
XMLStreamWriter writer = new IndentingXMLStreamWriter(xmlof.createXMLStreamWriter(out));

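Using the JDK Transformer:
public String transform(String xml) throws TransformerException
{
    // Identity transform: copies the input unchanged, but re-serializes it with indentation.
    // Imports: java.io.{StringReader, StringWriter, Writer}, javax.xml.transform.*, javax.xml.transform.stream.*
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.setOutputProperty(OutputKeys.INDENT, "yes");
    // Indent width; a vendor property understood by the JDK's built-in (Xalan-derived) transformer
    t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    Writer out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
}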

Via the JDK: transformer.setOutputProperty(OutputKeys.INDENT, "yes");

If you're using the StAX cursor API, you can indent the output by wrapping the XMLStreamWriter in an indenting proxy. I tried this in my own project and it worked nicely.
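A minimal sketch of such an indenting proxy, using only java.lang.reflect.Proxy and the standard javax.xml.stream API. The class IndentingProxy and its wrap method are hypothetical names, not part of any library. The sketch assumes element-only content (no mixed text and child elements), keeps text such as someData on the same line as its element, and does not add a newline after the XML declaration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: wraps any XMLStreamWriter and adds newlines/indentation around element tags. */
public final class IndentingProxy implements InvocationHandler {
    private final XMLStreamWriter target;
    private final String indent;
    private int depth = 0;
    // True when the element about to be closed contained child elements,
    // meaning its end tag belongs on a line of its own.
    private boolean hadChildElement = false;

    private IndentingProxy(XMLStreamWriter target, String indent) {
        this.target = target;
        this.indent = indent;
    }

    public static XMLStreamWriter wrap(XMLStreamWriter target, String indent) {
        return (XMLStreamWriter) Proxy.newProxyInstance(
                IndentingProxy.class.getClassLoader(),
                new Class<?>[] {XMLStreamWriter.class},
                new IndentingProxy(target, indent));
    }

    // Writes "\n" followed by `level` copies of the indent string directly to the real writer.
    private void newLine(int level) throws XMLStreamException {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < level; i++) {
            sb.append(indent);
        }
        target.writeCharacters(sb.toString());
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        switch (method.getName()) {
            case "writeStartElement": // covers all three overloads
                if (depth > 0) newLine(depth);
                depth++;
                hadChildElement = false;
                break;
            case "writeEmptyElement":
                if (depth > 0) newLine(depth);
                hadChildElement = true;
                break;
            case "writeEndElement":
                depth--;
                if (hadChildElement) newLine(depth);
                hadChildElement = true; // the parent now has at least one child element
                break;
            default:
                break; // every other call is passed through unchanged
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the writer's own exception, e.g. XMLStreamException
        }
    }
}
```

Because only writeStartElement, writeEmptyElement and writeEndElement are intercepted, the wrapper should work over any XMLStreamWriter implementation (Woodstox or the JDK's); all other calls go straight to the wrapped writer.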

Rather than relying on a com.sun... class that might go away (or get renamed to a com.oracle... class), I recommend downloading the StAX utility classes from java.net. This package contains an IndentingXMLStreamWriter class that works nicely. (Source and javadoc are included in the download.)

How about StaxMate:
http://www.cowtowncoder.com/blog/archives/2006/09/entry_21.html
Works well with Woodstox, fast, low-memory usage (no in-memory tree built), and indents like so:
SMOutputFactory sf = new SMOutputFactory(XMLOutputFactory.newInstance());
SMOutputDocument doc = sf.createOutputDocument(new FileOutputStream("output.xml"));
doc.setIndentation("\n ", 1, 2); // for unix linefeed, 2 spaces per level
// write doc like:
SMOutputElement root = doc.addElement("element1");
root.addElement("element2").addCharacters("someData");
doc.closeRoot(); // important, flushes, closes output

If you're using the iterating method (XMLEventReader), can't you just attach a new line '\n' character to the relevant XMLEvents when writing to your XML file?

Not sure about StAX, but there was a recent discussion about pretty-printing XML here: "pretty print xml from java". This was my attempt at a solution, in "How to pretty print XML from Java?", using the org.dom4j.io.OutputFormat.createPrettyPrint() method.

If you are using XMLEventWriter, then an easier way is to add newline Characters events yourself:
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLEventWriter writer = outputFactory.createXMLEventWriter(w); // w: the java.io.Writer you are writing to
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
Characters newLine = eventFactory.createCharacters("\n");
StartElement startRoot = eventFactory.createStartElement("", "", "root"); // example root element
writer.add(startRoot);
writer.add(newLine);
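Taking the same idea further, here is a hedged sketch that copies an existing document event by event and inserts "\n" plus indentation as Characters events before start tags, and before the end tags of elements that contain child elements. EventIndenter and indent(...) are illustrative names. The sketch drops whitespace-only text, skips the StartDocument/EndDocument events (so no XML declaration is written), and assumes element-only content.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical helper: re-writes XML through an XMLEventWriter, adding newline/indent Characters events. */
public final class EventIndenter {
    public static String indent(String xml, String indent) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        int depth = 0;
        boolean lastWasEnd = false; // previous tag was an end tag, so the next end tag gets its own line
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartDocument() || e.isEndDocument()) {
                continue; // emit a bare element tree, without an XML declaration
            }
            if (e.isCharacters() && e.asCharacters().isWhiteSpace()) {
                continue; // drop existing formatting whitespace so it is not doubled
            }
            if (e.isStartElement()) {
                if (depth > 0) writer.add(events.createCharacters(newLine(indent, depth)));
                depth++;
                lastWasEnd = false;
            } else if (e.isEndElement()) {
                depth--;
                if (lastWasEnd) writer.add(events.createCharacters(newLine(indent, depth)));
                lastWasEnd = true;
            }
            writer.add(e);
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    // "\n" followed by `depth` copies of the indent string.
    private static String newLine(String indent, int depth) {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < depth; i++) {
            sb.append(indent);
        }
        return sb.toString();
    }
}
```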

With Spring Batch, this requires a subclass since JIRA issue BATCH-1867:
// IndentingXMLEventWriter is javanet.staxutils.IndentingXMLEventWriter (stax-utils);
// @Setter/@Getter are Lombok annotations generating setIndenting()/isIndenting()
public class IndentingStaxEventItemWriter<T> extends StaxEventItemWriter<T> {
    @Setter
    @Getter
    private boolean indenting = true;
    @Override
    protected XMLEventWriter createXmlEventWriter(XMLOutputFactory outputFactory, Writer writer) throws XMLStreamException {
        if (isIndenting()) {
            return new IndentingXMLEventWriter(super.createXmlEventWriter(outputFactory, writer));
        }
        else {
            return super.createXmlEventWriter(outputFactory, writer);
        }
    }
}
But this requires an additional dependency, because Spring Batch does not include the code to indent the StAX output:
<dependency>
    <groupId>net.java.dev.stax-utils</groupId>
    <artifactId>stax-utils</artifactId>
    <version>20070216</version>
</dependency>

Related

How to append XML (String) to XmlEventWriter (StAX) which already created the start of document

In order to create big XML files, we decided to use the StAX API. The basic structure is built using the low-level APIs: createStartDocument(), createStartElement(). This works as expected.
However, in some cases we would like to append existing XML data that resides in a String (retrieved from a database). The following snippet illustrates this:
import java.lang.*;
import java.io.*;
import javax.xml.stream.*;
public class Example {
    public static void main(String... args) throws XMLStreamException {
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        StringWriter writer = new StringWriter();
        XMLEventWriter eventWriter = outputFactory.createXMLEventWriter(writer);
        eventWriter.add(eventFactory.createStartDocument("UTF-8", "1.0"));
        eventWriter.add(eventFactory.createStartElement("ns0", "http://example.org", "root"));
        eventWriter.add(eventFactory.createNamespace("ns0", "http://example.org"));
        //In here, we want to append a piece of XML which is stored in a string variable.
        String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
        eventWriter.add(inputFactory.createXMLEventReader(new StringReader(xml)));
        eventWriter.add(eventFactory.createEndDocument());
        System.out.println(writer.toString());
    }
}
With the above code, depending on the implementation, we are not getting the expected result:
Woodstox: The following exception is thrown: 'Can not output XML declaration, after output has already been done'. It seems that the XMLEventReader starts off with a startDocument event, but since a startDocument event was already written programmatically, it throws the error.
JDK: It appends <?xml version="1.0" ... <fragments><fragment>..., which leads to invalid XML.
I have also tried to append the XML by using:
eventFactory.createCharacters(xml);
The problem here is that, even though the XML is appended, the < and > are escaped as &lt; and &gt;, so the fragment ends up as text content instead of elements.
Am I missing an API that allows me to simply append a String as XML?
You can first consume any StartDocument if necessary:
String xml = "<fragments><fragment><data>This is pure text.</data></fragment></fragments>";
XMLEventReader xer = inputFactory.createXMLEventReader(new StringReader(xml));
if (xer.peek().isStartDocument())
{
    xer.nextEvent();
}
eventWriter.add(xer);
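A self-contained version of this idea, assuming the JDK's built-in StAX implementation (Woodstox is assumed, not confirmed, to behave the same way). Besides the StartDocument, it also skips the fragment's EndDocument: with the JDK writer, writing that event closes every open element, including the outer root. The names appendFragment and buildDocument are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public final class AppendFragment {
    /** Copies every event of xmlFragment into eventWriter, except the fragment's document-level events. */
    static void appendFragment(XMLEventWriter eventWriter, XMLInputFactory inputFactory, String xmlFragment)
            throws XMLStreamException {
        XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xmlFragment));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // The outer document already has its own StartDocument/EndDocument.
            if (event.isStartDocument() || event.isEndDocument()) {
                continue;
            }
            eventWriter.add(event);
        }
        reader.close();
    }

    public static String buildDocument(String xmlFragment) throws XMLStreamException {
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A byte stream with an explicit encoding keeps the declared and actual encodings in agreement.
        XMLEventWriter eventWriter = outputFactory.createXMLEventWriter(out, "UTF-8");
        eventWriter.add(eventFactory.createStartDocument("UTF-8", "1.0"));
        eventWriter.add(eventFactory.createStartElement("ns0", "http://example.org", "root"));
        eventWriter.add(eventFactory.createNamespace("ns0", "http://example.org"));
        appendFragment(eventWriter, inputFactory, xmlFragment);
        eventWriter.add(eventFactory.createEndElement("ns0", "http://example.org", "root"));
        eventWriter.add(eventFactory.createEndDocument());
        eventWriter.flush();
        eventWriter.close();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A fragment that carries its own <?xml ...?> declaration is handled the same way, since the declaration only surfaces as the skipped StartDocument event.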

Marshalling CDATA elements with CDATA_SECTION_ELEMENTS adds carriage return characters

I'm working on an application that exports and imports data from / to a DB. The format of the data extract is XML and I'm using JAXB for the serialization / (un)marshalling. I want some elements to be marshalled as CDATA elements and am using this solution which sets OutputKeys.CDATA_SECTION_ELEMENTS to the Transformer properties.
So far this was working quite well, but now I have come to a field in the DB that itself contains an XML string, which I would also like to place inside a CDATA element. For some reason, the Transformer now adds unnecessary carriage return characters (\r) to each line end, so that the output looks like this:
This is my code:
private static final String IDENT_LENGTH = "3";
private static final String CDATA_XML_ELEMENTS = "text definition note expression mandatoryExpression optionalExpression settingsXml";
public static void marshall(final Object rootObject, final Schema schema, final Writer writer) throws Exception {
    final JAXBContext ctx = JAXBContext.newInstance(rootObject.getClass());
    final Document document = createDocument();
    final Marshaller marshaller = ctx.createMarshaller();
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    marshaller.setSchema(schema);
    marshaller.marshal(rootObject, document);
    createTransformer().transform(new DOMSource(document), new StreamResult(writer));
}
private static Document createDocument() throws ParserConfigurationException {
    // the DocumentBuilderFactory is actually held in a singleton
    final DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
    return builderFactory.newDocumentBuilder().newDocument();
}
private static Transformer createTransformer() throws TransformerConfigurationException, TransformerFactoryConfigurationError {
    // the TransformerFactory is actually held in a singleton
    final TransformerFactory transformerFactory = TransformerFactory.newInstance();
    final Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
    transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, CDATA_XML_ELEMENTS);
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", IDENT_LENGTH);
    return transformer;
}
I'm passing a FileWriter to the marshall method.
My annotated model class looks like this:
@XmlType
@XmlRootElement
public class DashboardSettings {
    @XmlElement
    private String settingsXml;
    public String getSettingsXml() {
        return settingsXml;
    }
    public void setSettingsXml(final String settingsXml) {
        this.settingsXml = settingsXml;
    }
}
NOTE:
The XML string coming from the DB has Windows-style line endings, i.e. \r\n. I have the feeling that JAXB currently expects Linux-style input (i.e. only \n) and is therefore adding a \r character because I'm running on a Windows machine. So the question is: what's the best way to solve this? Is there a parameter I can pass to control the line-ending characters when marshalling? Or should I convert the line endings to Linux style prior to marshalling? How will my program behave on different platforms (Windows, Linux, Mac OS)?
I don't necessarily need a platform independent solution, it's OK if the output is in Windows, Linux or whatever style. What I want to avoid is the combination \r\r\n as shown in the above screenshot.
I realise this question is pretty old, but I ran into a similar problem, so maybe an answer can help someone else.
It seems to be an issue with CDATA sections. In my case, I was using the createCDATASection method to create them. When the code was running on a Windows machine, an additional CR was added, as in your example.
I've tried a bunch of things to solve this "cleanly", to no avail.
In my project, the XML document was then exported to a string to POST to a Linux server. So once the string was generated, I just removed the CR characters, leaving only the LF:
myXmlString = myXmlString.replaceAll("\\r", ""); // Strings are immutable: keep the returned copy
It might not be an appropriate solution for the specific question, but once again, it may help you (or someone else) find a solution.
Note: I'm stuck with Java 7 for this specific project, so it may have been fixed in a more recent version.
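If you can change the value before marshalling instead, normalizing its line endings up front should avoid the doubled \r. This sketch assumes the \r\r\n arises because the serializer writes each \n in the text as the platform line separator while passing an existing \r through unchanged; that is a reading of the symptom, not something confirmed here. XML 1.0 (section 2.11, End-of-Line Handling) requires parsers to turn \r\n and a lone \r into \n when the document is read back, so the normalization loses nothing an XML consumer would see. LineEndings.toUnix is a hypothetical helper name.

```java
public final class LineEndings {
    /**
     * Converts Windows (\r\n) and old Mac (\r) line endings to \n.
     * Intended use (an assumption): normalize the DB value before handing it to JAXB,
     * so the serializer only ever sees \n and emits the line separator once.
     */
    public static String toUnix(String s) {
        if (s == null) {
            return null;
        }
        // Replace the two-character sequence first, then any remaining lone \r.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

Usage, with valueFromDb standing in for the string read from the database: settings.setSettingsXml(LineEndings.toUnix(valueFromDb));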

Creating XML from XPATH

I am a newbie with XML and XSL and work with legacy platforms.
I am looking for a solution to create XML from XPath. I happened to see this post, How to Generate an XML File from a set of XPath Expressions?,
which helped me a lot.
Similar to the request discussed in the "Comments" section, I am trying to pass the whole XSLT as a string and receive the result back as a string, using Saxon. Receiving the result as a string is no problem. But when passing the XSL as a string, it complains about "document()", which is part of <xsl:variable name="vStylesheet" select="document('')" />. The error is "SXXP0003: Error reported by XML parser: Content is not allowed in prolog."
My basic requirement is that I should be able to pass the XSL (the whole file or the "vPop" portion) as a string and receive the result in another string without involving any files. That way I can improve performance and make it generic, so that anyone in our shop who does not know how to deal with XML can still generate one.
My Java code looks like this:
public static String simpleTransform(final String xsltStr) {
    String strResult = "";
    TransformerFactory tFactory = TransformerFactory.newInstance();
    String tempXMLStr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<OpnXMLTAG>Dummy</OpnXMLTAG>";
    try {
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        StreamSource XSLSource = new StreamSource(new StringReader(xsltStr));
        Transformer transformer = tFactory.newTransformer(XSLSource);
        transformer.transform(new StreamSource(new StringReader(tempXMLStr)), result);
        strResult = writer.toString();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return strResult;
}
And the XSLT string I am passing is the same as in the earlier post.
When you call document(''), it treats '' as a relative URI reference to be resolved against the base URI of the stylesheet. This won't work if the base URI of the stylesheet is unknown, which is the case when you supply it as a StreamSource wrapping a StringReader with no systemId.
In the case of Saxon, document('') actually needs to re-read the stylesheet. It doesn't keep the source file around at run-time just in case it's needed. So you'll need to (a) supply a URI as the systemId property on the StreamSource (any URI will do, it won't actually be read), and (b) supply a URIResolver to resolve the call on document('') and supply the original string.
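A sketch of steps (a) and (b) with the standard javax.xml.transform API. The placeholder URI and the class and method names are assumptions for illustration. Whether document('') is routed through the URIResolver is processor-specific; the answer above describes that behavior for Saxon, and the resolver is registered on both the factory (stylesheet compile time) and the Transformer (run time) to cover either case.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class StringStylesheet {
    // Hypothetical placeholder: only gives the stylesheet a base URI, it is never fetched.
    public static final String STYLESHEET_URI = "http://example.com/inline-stylesheet.xsl";

    /** Resolves document('') (an empty href) or the placeholder URI back to the in-memory stylesheet. */
    public static URIResolver stringResolver(final String xsltStr) {
        return (href, base) -> {
            if (href == null || href.isEmpty() || STYLESHEET_URI.equals(href)) {
                return new StreamSource(new StringReader(xsltStr), STYLESHEET_URI);
            }
            return null; // null tells the processor to use its default resolution
        };
    }

    public static String transform(String xsltStr, String xmlStr) throws TransformerException {
        URIResolver resolver = stringResolver(xsltStr);
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver);
        // (a) the systemId gives the string stylesheet a known base URI
        StreamSource xslSource = new StreamSource(new StringReader(xsltStr), STYLESHEET_URI);
        Transformer transformer = factory.newTransformer(xslSource);
        // (b) the resolver hands back the original string when document('') is evaluated
        transformer.setURIResolver(resolver);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xmlStr)), new StreamResult(out));
        return out.toString();
    }
}
```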

Alternative to IndentingXMLStreamWriter.java

Is there an alternative to IndentingXMLStreamWriter.java? I've always had some sort of issue with it at some point where I'm unable to compile, though it goes away after a while. So I was wondering if there was an alternate way to indent manually parsed XML files.
The error message is slightly different when it is compiled as part of a NetBeans module. (The paths are altered with ~, for anyone who's wondering.)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\MasterDeckXMLImporterExporter.java:5: package com.sun.xml.internal.txw2.output does not exist
import com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter;
Note: Attempting to workaround 6512707
warning: No processor claimed any of these annotations: [javax.xml.bind.annotation.XmlValue, javax.xml.bind.annotation.XmlSeeAlso, javax.xml.bind.annotation.XmlAccessorType, javax.xml.bind.annotation.XmlRootElement, javax.xml.bind.annotation.XmlAttribute]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\MasterDeckXMLImporterExporter.java:5: package com.sun.xml.internal.txw2.output does not exist
import com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\MasterDeckXMLImporterExporter.java:68: cannot find symbol
symbol : class IndentingXMLStreamWriter
location: class com.spectre.util.MasterDeckXMLImporterExporter
xsw = new IndentingXMLStreamWriter(xsw);
2 errors
3 warnings
C:\Program Files\jmonkeyplatform\harness\suite.xml:182: The following error occurred while executing this line:
C:\Program Files\jmonkeyplatform\harness\common.xml:206: Compile failed; see the compiler error output for details.
To be clear, this is how I use StAX:
import com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter;
XMLStreamWriter xsw = XMLOutputFactory.newInstance().createXMLStreamWriter(new FileOutputStream(new File("Blah")));
xsw = new IndentingXMLStreamWriter(xsw);
xsw.writeStartDocument();
xsw.writeStartElement("map");
for (Map.Entry<String, Date> entry : map.entrySet()) {
    xsw.writeEmptyElement("entry1");
    xsw.writeAttribute("Name", entry.getKey());
    xsw.writeAttribute("date", sdf.format(entry.getValue()));
}
xsw.writeEndElement();
xsw.writeEndDocument();
xsw.close();
You could use Saxon. In the s9api interface, you can do something like
Processor p = new Processor();
Serializer s = p.newSerializer(System.out);
s.setOutputProperty(Property.INDENT, "yes");
XMLStreamWriter w = s.getXMLStreamWriter();
and then you have an indenting serializer that implements the XMLStreamWriter interface, with many more formatting options available if you want to play with them.
If you parse your XML into an instance of org.w3c.dom.Document (e.g. using DocumentBuilderFactory), you could try the following.
Using Apache Xerces:
Document doc = ...;
OutputFormat format = new OutputFormat(doc);
format.setIndenting(true);
format.setIndent(2);
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(doc);
Or using the standard TransformerFactory:
Document doc = ...;
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
t.transform(new DOMSource(doc), new StreamResult(out));

XMLStreamWriter writeCharacters without escaping

How do I use XMLStreamWriter to write exactly what I put in? For instance, if I create a script tag and fill it with JavaScript, I don't want all my single quotes coming out as &apos;.
Here's a small test I wrote that doesn't use any of the abstractions I've got in place, just calls to writeCharacters.
public void testWriteCharacters() {
    StringWriter sw = new StringWriter();
    XMLOutputFactory factory = XMLOutputFactory.newInstance();
    StringBuffer out = new StringBuffer();
    try {
        XMLStreamWriter writer = factory.createXMLStreamWriter(sw);
        writer.writeStartElement("script");
        writer.writeAttribute("type","text/javascript");
        writer.writeCharacters("function hw(){ \n"+
                "\t alert('hello world');\n" +
                "}\n");
        writer.writeEndElement();
        writer.flush(); // push any buffered output into sw before reading it
        out.append(sw);
    } catch (XMLStreamException e) {
    } finally {
        try {
            sw.close();
        } catch(IOException e) {
            e.printStackTrace();
        }
    }
    System.out.println(out.toString());
}
This produces an apos entity for both the single quotes surrounding hello world.
You could use a property on the factory:
final XMLOutputFactory streamWriterFactory = XMLOutputFactory.newFactory();
streamWriterFactory.setProperty("escapeCharacters", false);
The writer created by this factory will then write element text without escaping it, provided the factory supports this property. The JDK's XMLOutputFactoryImpl does.
XMLStreamWriter.writeCharacters() doesn't escape '. It only escapes <, > and &; writeAttribute additionally escapes " (see the javadoc).
However, if you want to write text without escaping at all, you have to write it as a CDATA section using writeCData().
The typical approach for writing scripts in CDATA sections is:
<script>//<![CDATA[
...
//]]></script>
That is:
out.writeCharacters("//");
out.writeCData("\n ... \n//");
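A runnable sketch of that pattern with the standard XMLStreamWriter; ScriptCData and writeScript are illustrative names. The script text itself must not contain "]]>", since that sequence ends a CDATA section.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public final class ScriptCData {
    /** Writes <script> with its body in a "//"-commented CDATA section; js must not contain "]]>". */
    public static String writeScript(String js) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        out.writeStartElement("script");
        out.writeAttribute("type", "text/javascript");
        // "//" hides the CDATA markers from a browser that parses the page as HTML.
        out.writeCharacters("//");
        // Inside CDATA, quotes, < and & are written as-is.
        out.writeCData("\n" + js + "\n//");
        out.writeEndElement();
        out.flush();
        out.close();
        return sw.toString();
    }
}
```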
Alternative method, with custom escape handler:
XMLOutputFactory xmlFactory = XMLOutputFactory.newInstance();
xmlFactory.setProperty(XMLOutputFactory2.P_TEXT_ESCAPER, new MyEscapingWriterFactory());
'MyEscapingWriterFactory' is your implementation of the 'EscapingWriterFactory' interface (XMLOutputFactory2 and EscapingWriterFactory belong to the Stax2 API implemented by Woodstox). It allows fine-grained control over text escaping. This is useful when you use text elements to deal with arbitrary input (say, invalid XML with multiple processing instructions or incorrectly written CDATA sections).
You can also use Woodstox's StAX implementation. Its XMLStreamWriter2 interface has a writeRaw() method. We're using it for this specific reason and it works great.
Write directly to the underlying Writer or OutputStream:
Writer out = new StringWriter();
XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
... //write your XML
writer.flush();
//write extra characters directly to the underlying writer
out.write("<yourstuff>Test characters</yourstuff>");
out.flush();
... //continue with normal XML
writer.writeEndElement();
writer.flush();
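One caveat with writing to the underlying Writer: a StAX writer keeps a start tag open (e.g. "<root" without the closing ">") until it knows no more attributes follow, so raw text written right after writeStartElement can end up inside the tag. A sketch, assuming the JDK's default writer, where an empty writeCharacters("") call closes the pending start tag before the raw write; RawWrite and build are illustrative names.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public final class RawWrite {
    public static String build() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("root");
        // Signal that content follows, so the writer emits the ">" of <root> now.
        writer.writeCharacters("");
        // Push everything buffered so far into `out` before bypassing the writer.
        writer.flush();
        out.write("<yourstuff>Test characters</yourstuff>");
        writer.writeEndElement();
        writer.flush();
        return out.toString();
    }
}
```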
