Transforming Streaming XSLT Without a Custom Content Handler - java

Take a look at this website:
http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain
It constantly streams data. You can transform this content using the newest version of XSLT (3.0), with an instruction like this:
<xsl:stream href="http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain">
If I want to write some Java code to initiate the transformation (using Saxon, which has implemented xsl:stream), I can do this:
// XSL
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new FileInputStream(xslFile)));
// XML
StreamSource xmlSource = new StreamSource(new FileInputStream(xmlFile));
// Output
MyCustomContentHandler handler = new MyCustomContentHandler();
PrintStream outputPrintStream = new PrintStream(new BufferedOutputStream(new FileOutputStream(outputFile)), true);
handler.setPrintStream(outputPrintStream);
Result result = new SAXResult(handler);
// Transform
transformer.transform(xmlSource, result);
This works. If you let it run for a bit, then open the output file, you’ll see data in it. If you re-open it a bit later, you’ll see even more data. The key to this is the custom content handler that processes the various SAX events.
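For readers who want something concrete, a handler of this kind might look roughly like the sketch below. MyCustomContentHandler itself is not shown here, so the class name FlushingHandler, its minimal escaping, and the choice to flush after every end tag are all assumptions made for illustration. The demo drives it with an ordinary identity transformation rather than a streaming stylesheet.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FlushingHandlerDemo {

    /** Hypothetical stand-in for MyCustomContentHandler (not the original class). */
    static class FlushingHandler extends DefaultHandler {
        private final PrintStream out;

        FlushingHandler(PrintStream out) {
            this.out = out;
        }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }

        private static String name(String localName, String qName) {
            // Fall back to the local name if the producer did not supply a qName.
            return (qName == null || qName.isEmpty()) ? localName : qName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.print("<" + name(localName, qName));
            for (int i = 0; i < atts.getLength(); i++) {
                String value = escape(atts.getValue(i)).replace("\"", "&quot;");
                out.print(" " + name(atts.getLocalName(i), atts.getQName(i)) + "=\"" + value + "\"");
            }
            out.print(">");
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.print(escape(new String(ch, start, length)));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.print("</" + name(localName, qName) + ">");
            // Push what we have to the underlying stream as soon as an element ends.
            out.flush();
        }
    }

    /** Runs an identity transformation into the handler and returns what it wrote. */
    static String run(String xml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(bytes, false, "UTF-8");
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new StringReader(xml)),
                new SAXResult(new FlushingHandler(ps)));
        ps.flush();
        return bytes.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a id=\"1\"><b>x &amp; y</b></a>"));
    }
}
```

The sketch ignores namespace declarations, comments, and processing instructions, which is part of why hand-written serialization rarely matches the XSLT output exactly.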
But suppose that I don’t really want a custom content handler. Suppose I just want to keep the output of the XSLT as is. I can modify my code like this:
// XSL
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new FileInputStream(xslFile)));
// XML
StreamSource xmlSource = new StreamSource(new FileInputStream(xmlFile));
// Output
TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
transformerHandler.setResult(new StreamResult(new PrintWriter(new FileOutputStream(outputFile, true), true)));
// or this…
//transformerHandler.setResult(new StreamResult(new FileOutputStream(outputFile)));
// or this…
//transformerHandler.setResult(new StreamResult(new FileWriter(outputFile)));
ContentHandler contentHandler = (ContentHandler) transformerHandler;
SAXResult result = new SAXResult(transformerHandler);
// Transform
transformer.transform(xmlSource, result);
The good news is that I no longer need a custom content handler, and my output now matches the output of the XSLT exactly. The bad news is that although this code works with non-streaming XSLT, it does not work with streaming XSLT. Despite my various attempts at setting the result (see the “or this…” statements above), nothing is written to the file. I suspect there’s a buffering problem of some sort.
Question: How can I combine the best of these two together? How can I transform a streaming XSLT without having to use a custom content handler?

This seems to be a rerun of a thread on the saxon-help list in June:
http://sourceforge.net/p/saxon/mailman/message/32472658/
The conclusion there was that the output was somehow being buffered in the output stream pipeline. Saxon is emitting events representing the transformation result, as you see by supplying a ContentHandler, but the serialization of these events is being buffered in the I/O system.

At this time, it does not appear to be possible to do what I want to do. My current solution is to use a custom content handler (per my question above) and run its results through a standard XSLT identity transformation. A bit ugly and not very efficient, but it works.

Related

Java - correctly indenting an XML made from multiple sources

I'm trying to correctly indent (indentation = 2) an XML file written by a Java Spring Boot application. The problem is that I'm not making up the XML myself, I'm creating the XML by joining parts of various source XML with different schemas.
My code is:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
DOMSource source = new DOMSource(root.getOwnerDocument());
StreamResult file = new StreamResult(outputFile);
transformer.transform(source, file);
This seems to ignore completely the "indentation" parameter: it just copies whatever indentation was present in the original XML Files.
To copy the nodes, I tried both:
root.appendChild(document.adoptNode(extractedNodeToCopy.cloneNode(true)));
and
root.appendChild(document.importNode(extractedNodeToCopy, true));
But this doesn't change anything.
I don't get error messages, the result is simply indented as the original documents were (so every tag has a different style).
You are using XSLT already. The code
transformerFactory.newTransformer()
instantiates an XSLT transformer with a default XSL template that performs the 'identity' transformation (see also https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/transform/TransformerFactory.html#newTransformer() )
So in your case, to change the indentation you could either
use Java to prepare the data in your DOM (stored in root), adding text nodes that contain whitespace to your taste, or
do a similar job within a stylesheet (see also Problems Trying to Pretty Print XSLT Output) and make use of that by calling
transformerFactory.newTransformer(your stylesheet)
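A sketch of the first option, preparing the DOM in Java, is shown below. It rests on an assumption the answer does not state: that the inconsistent indentation lives in whitespace-only text nodes copied from the source documents, so removing them first gives the serializer a clean tree to indent. The class and method names are made up for the example, and the indent-amount property is the one from the question (it is specific to the Xalan-derived transformer bundled with the JDK).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ReindentDemo {

    /** Recursively removes text nodes that contain only whitespace. */
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    /** Parses the XML, drops inherited indentation, and serializes with indent = 2. */
    static String reindent(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceNodes(doc.getDocumentElement());

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Mixed tab and no indentation, as if joined from different sources.
        System.out.println(reindent("<a>\n\t\t\t<b>x</b>\n<c><d/></c></a>"));
    }
}
```

Note that this also removes whitespace-only text inside mixed content, so it is only appropriate for data-oriented XML where such whitespace carries no meaning.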

How to use Java API to parse xml string with XSLT and generate output in memory only?

I need to transform internal XML (from a response) with a predefined XSLT and send the result back to the client as HTML. The example below reads and writes local files. How can I avoid creating files with the Java API? I want to replace source.xml with a String and generate the HTML output on the fly.
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer (new javax.xml.transform.stream.StreamSource("searchresult.xslt"));
transformer.transform(new javax.xml.transform.stream.StreamSource("source.xml"),
new javax.xml.transform.stream.StreamResult( new FileOutputStream("result.html")));
StreamSource has a constructor taking a Reader as argument. You can thus pass a StringReader, which will read the XML from a String, as argument.
Similarly, the StreamResult constructor the example uses takes an OutputStream as argument. You can thus pass any kind of OutputStream (like the HTTP response output stream, or a ByteArrayOutputStream, or a socket output stream) to send the result to wherever you like.
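Put together, an in-memory version might look like the sketch below. The stylesheet and input strings are invented for the example, and the output goes to a StringWriter (StreamResult also has a constructor taking a Writer) rather than to one of the output streams mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InMemoryTransformDemo {

    /** Applies the stylesheet text to the XML text and returns the result as a String. */
    static String transform(String xslt, String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Example inputs, made up for this sketch.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\"/>"
          + "<xsl:template match=\"/results\">"
          + "<ul><xsl:for-each select=\"item\"><li><xsl:value-of select=\".\"/></li></xsl:for-each></ul>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static final String XML = "<results><item>one</item><item>two</item></results>";

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XSLT, XML));
    }
}
```

In a servlet, the StringWriter could be swapped for the response's own writer or output stream so the HTML is never held in memory as a whole.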

XSL validation while transformation

I'm using the following piece of code to do XSL transformation :
Source source = new StreamSource(new StringReader(request.toString()));
Source xsl = new StreamSource(XSLPath);
StringWriter destination = new StringWriter();
Result result = new StreamResult(destination);
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer;
transformer = transFactory.newTransformer(xsl);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
The XSLPath variable passes the file location to the .xsl file.
I need to know whether the transFactory.newTransformer(xsl) does any kind of internal validation first of the xsl file itself. If not, then is there a way we can do validation of the xsl file, before performing the transformation?
I have code that validates an xsd file, but I believe the same code wouldn't work for an xsl file. I tried it anyway, but it always throws one SAXException or another about non-whitespace characters not being allowed on most of the lines.
Yes, the first thing the XSLT processor does is to validate and compile the stylesheet. (Why did you have to ask? Just introduce an error, and see what happens!)
You might find it useful to set an ErrorListener to make sure that your application can capture the error messages.
If you are using the same stylesheet repeatedly for many transformations, it is much more efficient to use newTemplates() to create a Templates object so you only do the validation/compilation once. Think of the Templates object as the compiled stylesheet.
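The two suggestions can be combined into a small pre-flight check, sketched below: compile the stylesheet with newTemplates() and collect whatever the ErrorListener reports. The class and method names are hypothetical, and exactly which problems arrive through the listener versus the thrown exception depends on the XSLT processor, so the sketch handles both.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

class StylesheetCheckDemo {

    /** Returns the compile-time problems found in the stylesheet; empty means it compiled. */
    static List<String> compileProblems(String xslt) {
        final List<String> problems = new ArrayList<>();
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(new ErrorListener() {
            @Override
            public void warning(TransformerException e) {
                // Warnings do not prevent compilation, so they are not recorded here.
            }

            @Override
            public void error(TransformerException e) {
                problems.add(e.getMessageAndLocation());
            }

            @Override
            public void fatalError(TransformerException e) {
                problems.add(e.getMessageAndLocation());
            }
        });
        try {
            // The Templates object is the compiled stylesheet; a real application
            // would keep it and call newTransformer() on it for each transformation.
            Templates templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
            if (templates == null && problems.isEmpty()) {
                problems.add("stylesheet could not be compiled");
            }
        } catch (TransformerConfigurationException e) {
            if (problems.isEmpty()) {
                problems.add(e.getMessageAndLocation());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String broken =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:template match=\"/\"><xsl:value-of select=\"1 +\"/></xsl:template>"
              + "</xsl:stylesheet>";
        for (String problem : compileProblems(broken)) {
            System.out.println(problem);
        }
    }
}
```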

How to transform XMLStreamReader to XMLStreamWriter

This should be easy and obvious, but I can't find a way: the XMLOutputFactory accepts only an OutputStream, a Result, or another Writer to generate a new XMLStreamWriter. What I have at hand is an XMLStreamReader, which has no methods for extracting a Result or an OutputStream.
If the solution would be easier using the Event API, that would be OK too.
Thank you
You could use a javax.xml.transform.Transformer to convert a StAXSource wrapping the reader to a StAXResult wrapping the writer.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
StAXSource source = new StAXSource(xmlStreamReader);
StAXResult result = new StAXResult(xmlStreamWriter);
t.transform(source, result);
Using the Event API you could also use the following:
http://download.oracle.com/javase/6/docs/api/javax/xml/stream/XMLEventWriter.html#add(javax.xml.stream.XMLEventReader)
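As a sketch of the Event API route, the method linked above copies every event from an XMLEventReader to an XMLEventWriter in one call. The wrapper class and the String-based input and output are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class EventCopyDemo {

    /** Pipes all events from a reader over the input into a writer and returns the text. */
    static String copy(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        writer.add(reader); // copies the whole event stream
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(copy("<a><b>text</b></a>"));
    }
}
```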

Efficient XSLT pipeline in Java (or redirecting Results to Sources)

I have a series of XSL 2.0 stylesheets that feed into each other, i.e. the output of stylesheet A feeds B feeds C.
What is the most efficient way of doing this? The question rephrased is: how can one efficiently route the output of one transformation into another.
Here's my first attempt:
@Override
public void transform(Source data, Result out) throws TransformerException{
for(Transformer autobot : autobots){
if(autobots.indexOf(autobot) != (autobots.size()-1)){
log.debug("Transforming prelim stylesheet...");
data = transform(autobot,data);
}else{
log.debug("Transforming final stylesheet...");
autobot.transform(data, out);
}
}
}
private Source transform(Transformer autobot, Source data) throws TransformerException{
DOMResult result = new DOMResult();
autobot.transform(data, result);
Node node = result.getNode();
return new DOMSource(node);
}
As you can see, I'm using a DOM to sit in between transformations, and although it is convenient, it's non-optimal performance wise.
Is there an easy way to route, say, a SAXResult to a SAXSource? A StAX solution would be another option.
I'm aware of projects like XProc, which is very cool if you haven't taken a look at it yet, but I didn't want to invest in a whole framework.
I found this: #3. Chaining Transformations that shows two ways to use the TransformerFactory to chain transformations, having the results of one transform feed the next transform and then finally output to system out. This avoids the need for an intermediate serialization to String, file, etc. between transforms.
When multiple, successive transformations are required to the same XML document, be sure to avoid unnecessary parsing operations. I frequently run into code that transforms a String to another String, then transforms that String to yet another String. Not only is this slow, but it can consume a significant amount of memory as well, especially if the intermediate Strings aren't allowed to be garbage collected.
Most transformations are based on a series of SAX events. A SAX parser will typically parse an InputStream or another InputSource into SAX events, which can then be fed to a Transformer. Rather than having the Transformer output to a File, String, or another such Result, a SAXResult can be used instead. A SAXResult accepts a ContentHandler, which can pass these SAX events directly to another Transformer, etc.
Here is one approach, and the one I usually prefer as it provides more flexibility for various input and output sources. It also makes it fairly easy to create a transformation chain dynamically and with a variable number of transformations.
SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();
// These templates objects could be reused and obtained from elsewhere.
Templates templates1 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet2.xslt")));
TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);
th1.setResult(new SAXResult(th2));
th2.setResult(new StreamResult(System.out));
Transformer t = stf.newTransformer();
t.transform(new StreamSource(System.in), new SAXResult(th1));
// th1 feeds th2, which in turn feeds System.out.
The related question Efficient XSLT pipeline, with params, in Java clarifies how to pass parameters correctly to such a transformer chain.
It also hints at a slightly shorter solution that does not need the third transformer:
SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();
Templates templates1 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet2.xslt")));
TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);
th2.setResult(new StreamResult(System.out));
// Note that indent, etc should be applied to the last transformer in chain:
th2.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
th1.getTransformer().transform(new StreamSource(System.in), new SAXResult(th2));
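The same wiring generalizes to a variable number of stylesheets, as the quoted text suggests. The sketch below builds the chain from the last stage backwards; the class, its helper methods, and the two tiny renaming stylesheets are all invented for illustration and are not part of the quoted article.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ChainDemo {
    // One shared factory, so every Templates and handler comes from the same processor.
    private static final SAXTransformerFactory STF =
            (SAXTransformerFactory) TransformerFactory.newInstance();

    static Templates compile(String xslt) throws TransformerException {
        return STF.newTemplates(new StreamSource(new StringReader(xslt)));
    }

    /** Runs the input through every stage in order, passing SAX events between stages. */
    static void runChain(List<Templates> stages, Source in, Result out)
            throws TransformerException {
        if (stages.isEmpty()) {
            throw new IllegalArgumentException("at least one stylesheet is required");
        }
        // Wire the handlers from the last stage backwards: each one writes into 'next'.
        Result next = out;
        for (int i = stages.size() - 1; i >= 1; i--) {
            TransformerHandler handler = STF.newTransformerHandler(stages.get(i));
            handler.setResult(next);
            next = new SAXResult(handler);
        }
        // The first stage reads the real input and feeds the rest of the chain.
        stages.get(0).newTransformer().transform(in, next);
    }

    /** Hypothetical stylesheet that replaces element 'from' with element 'to'. */
    static String rename(String from, String to) {
        return "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:template match=\"" + from + "\">"
             + "<" + to + "><xsl:value-of select=\".\"/></" + to + ">"
             + "</xsl:template></xsl:stylesheet>";
    }

    static String demo() throws TransformerException {
        List<Templates> stages = Arrays.asList(compile(rename("a", "b")), compile(rename("b", "c")));
        StringWriter out = new StringWriter();
        runChain(stages, new StreamSource(new StringReader("<a>hi</a>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(demo());
    }
}
```

As in the shorter variant above, output properties such as indent belong on the last stage, since that is the only one that serializes.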
Your best bet is to stick to DOM as you're doing, because an XSLT processor would have to build a tree anyway. Streaming is only an option for a very limited category of transforms, and few if any processors can figure it out automatically and switch to a streaming-only implementation; otherwise they just read the input and build the tree.
