XSL validation while transformation

XSL validation while transformation - java

I'm using the following piece of code to do XSL transformation :
Source source = new StreamSource(new StringReader(request.toString()));
Source xsl = new StreamSource(XSLPath);
StringWriter destination = new StringWriter();
Result result = new StreamResult(destination);
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer;
transformer = transFactory.newTransformer(xsl);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
The XSLPath variable passes the file location to the .xsl file.
I need to know whether the transFactory.newTransformer(xsl) does any kind of internal validation first of the xsl file itself. If not, then is there a way we can do validation of the xsl file, before performing the transformation?
I have a code to validate an xsd file, but, I believe the same code wouldn't work for an xsl. I still tried that as well, but it always throw some or the other SAXException about Non-white spaces not being allowed on most of the lines.

Yes, the first thing the XSLT processor does is to validate and compile the stylesheet. (Why did you have to ask? Just introduce an error, and see what happens!)
You might find it useful to set an ErrorListener to make sure that your application can capture the error messages.
If you are using the same stylesheet repeatedly for many transformations, it is much more efficient to use newTemplates() to create a Templates object so you only do the validation/compilation once. Think of the Templates object as the compiled stylesheet.

Related

Java - correctly indenting an XML made from multiple sources

I'm trying to correctly indent (indentation = 2) an XML file written by a Java Spring Boot application. The problem is that I'm not making up the XML myself, I'm creating the XML by joining parts of various source XML with different schemas.
My code is:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
DOMSource source = new DOMSource(root.getOwnerDocument());
StreamResult file = new StreamResult(outputFile);
transformer.transform(source, file);
This seems to ignore completely the "indentation" parameter: it just copies whatever indentation was present in the original XML Files.
To copy the nodes, I tried both:
root.appendChild(document.adoptNode(extractedNodeToCopy.cloneNode(true)));
and
root.appendChild(document.importNode(extractedNodeToCopy, true));
But this doesn't change anything.
I don't get error messages, the result is simply indented as the original documents were (so every tag has a different style).

You are using XSLT already. The code
transformerFactory.newTransformer()
instantiates an XSLT transformer with a default XSL template that performs the 'identity' transformation (see also https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/transform/TransformerFactory.html#newTransformer() )
So in your case to have indentation modified you could either
use java to prepare the data in your DOM (stored in root). You would have
to add TextNodes that contain whitespace to your taste, or
do a similar job within a stylesheet (see also Problems Trying to Pretty Print XSLT Output) and make use of that by calling
transformerFactory.newTransformer(your stylesheet)

Multiple xslt tranformations

In input i have xml file(it can be 1000 or 100000 files) and i have to convert it to 6 csv files for later saving to the database. My question is how to do this in java more efficient, now i create 6 transformers with different xslt stylesheets and manually transform xml 6 times. I tried do this in one xslt transformation with function: result-document, it works, but in inputmay be more than one xml file and after each transformation data in result files rewrites. My idea collect all data from xml files in csv and then copy it to db tables.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTemplates(stylesource).newTransformer();
Transformer transformer2 = tf.newTemplates(stylesource2).newTransformer();
Transformer transformer3 = tf.newTemplates(stylesource3).newTransformer();
Transformer transformer4 = tf.newTemplates(stylesource4).newTransformer();
Transformer transformer5 = tf.newTemplates(stylesource5).newTransformer();
Transformer transformer6 = tf.newTemplates(stylesource6).newTransformer();
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
public void transformXmlToCsv(String content) throws TransformerException, IOException, SAXException {
Document doc = db.parse(new InputSource(new StringReader(content)));
Source source = new DOMSource(doc);
transformer.transform(source, outputTarget);
transformer2.transform(source, outputTarget2);
transformer3.transform(source, outputTarget3);
transformer4.transform(source, outputTarget4);
transformer5.transform(source, outputTarget5);
transformer6.transform(source, outputTarget6);
}

One improvement you could make would be to avoid repeated parsing of the source document by building the input tree once. For example, by building a DOM tree and using a DOMSource, or (better if you're using Saxon) by using Saxon interfaces to build the tree once in Saxon's internal format.
Another improvement would be to only create one TransformerFactory for everything. Creating a TransformerFactory is typically expensive (it involves a search of the classpath) and there's no need to ever create more than one.
It should be easy to fix your problem with xsl:result-document. There are many ways of doing it, e.g. by directing the output of each transformation to a different directory, but I can't tell what the best way is without more information.

Errors creating html file using xml and xsl file (xslt 2.0) (Recoverable error I/O error reported by XML parser processing)

Im using the code below to generate html from a xml and xsl files. It was working fine for xslt 1.0, but now Im using xslt 2.0. And Im getting this error when execute the code below: ERROR: Unsupported XSL element 'for-each-group'.'
Do you know how to fix this?
private static void createHtml(){
try {
TransformerFactory tf =TransformerFactory.newInstance();
Source xslFile =new StreamSource("test.xsl");
Source xmlFile =new StreamSource(new StringReader(xml));
String resultFile ="test.html";
OutputStream htmlFile=new FileOutputStream(resultFile );
Transformer trasform=tFactory.newTransformer(xslFile);
trasform.transform(xmlFile, new StreamResult(htmlFile));
}
catch (Exception e)
{
System.out.println("Error.");
}
}
Im trying to do like this now:
TransformerFactory tfactory = TransformerFactory.newInstance();
Transformer transformer = tfactory.newTransformer(new StreamSource(new File("test.xsl")));
transformer.transform(new StreamSource(new File(xml)),
new StreamResult("test.html"));
And in the main method:
System.setProperty("javax.xml.transform.TransformerFactory", "net.sf.saxon.TransformerFactoryImpl")
And I get this error:
Recoverable error
I/O error reported by XML parser processing
Any idea how to fix this?

You're reporting two different errors:
(a) ERROR: Unsupported XSL element 'for-each-group'.'
(b) Recoverable error: I/O error reported by XML parser processing
It seems unlikely that you got these from the same run, so it's not clear what you were doing on each of these occasions to get two different errors.
The first error means that for some reason you have loaded an XSLT processor that doesn't support XSLT 2.0. The safest and fastest way to be sure of loading Saxon is to instantiate it directly:
TransformerFactory tfactory = new net.sf.saxon.TransformerFactoryImpl();
If you rely on the JAXP mechanisms there are all sorts of things that can go wrong, and it can also be very slow because it involves searching the classpath.
With the second error, you haven't given enough information to provide a full diagnosis. However, the fact that there's no filename in the message gives a clue: it means Saxon doesn't know the filename, and this is often because you have done something like:
Source xmlFile =new StreamSource(new StringReader(xml));
A StringReader itself will never give an I/O error. But if the XML that you are parsing contains a DTD reference, or references to other external entities, and if these are relative references, then the XML parser won't be able to resolve them because it doesn't have a base URI to work with. You can supply a base URI in the second argument of the StreamSource constructor.

Transforming Streaming XSLT Without a Custom Content Handler

Take a look at this website:
http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain
It constantly streams data. You can transform this content using the newest version of XSLT (v3), with a command like this:
<xsl:stream href="http://xmpp.wordpress.com:8008/firehose.xml?type=text/plain">
If I want to write some Java code to initiate the transformation (using Saxon, which has implemented xsl:stream), I can do this:
// XSL
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new FileInputStream(xslFile)));
// XML
StreamSource xmlSource = new StreamSource(new FileInputStream(xmlFile));
// Output
MyCustomContentHandler handler = new MyCustomContentHandler();
PrintStream outputPrintStream = new PrintStream(new BufferedOutputStream(new FileOutputStream(outputFile)), true);
handler.setPrintStream(outputPrintStream);
Result result = new SAXResult(handler);
// Transform
transformer.transform(xmlSource, result);
This works. If you let it run for a bit, then open the output file, you’ll see data in it. If you re-open it a bit later, you’ll see even more data. The key to this is the custom content handler that processes the various SAX events.
But suppose that I don’t really want a custom content handler. Suppose I just want to keep the output of the XSLT as is. I can modify my code like this:
// XSL
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new FileInputStream(xslFile)));
// XML
StreamSource xmlSource = new StreamSource(new FileInputStream(xmlFile));
// Output
TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
transformerHandler.setResult(new StreamResult(new PrintWriter(new FileOutputStream(outputFile, true), true)));
// or this…
//transformerHandler.setResult(new StreamResult(new FileOutputStream(outputFile)));
// or this…
//transformerHandler.setResult(new StreamResult(new FileWriter(outputFile)));
ContentHandler contentHandler = (ContentHandler) transformerHandler;
SAXResult result = new SAXResult(transformerHandler);
// Transform
transformer.transform(xmlSource, result);
The good news is that I no longer need a custom content handler, and my output now matches the output of the XSLT exactly. The bad news is that although this code works with non-streaming XSLT, it does not work with streaming XSLT. Despite my various attempts at setting the result (see the “or this…” statements above), nothing is written to the file. I suspect there’s a buffering problem of some sort.
Question: How can I combine the best of these two together? How can I transform a streaming XSLT without having to use a custom content handler?

This seems to be a rerun of a thread on the saxon-help list in June:
http://sourceforge.net/p/saxon/mailman/message/32472658/
The conclusion there was that the output was somehow being buffered in the output stream pipeline. Saxon is emitting events representing the transformation result, as you see by supplying a ContentHandler, but the serialization of these events is being buffered in the I/O system.

At this time, it does not appear to be possible to do what I want to do. My current solution is to use a custom content handler (per my question above) and run its results through a standard XSLT identity transformation. A bit ugly and not very efficient, but it works.

Control order of XML attributes in outputed file in Java

How do I control the order that the XML attributes are listed within the output file?
It seems by default they are getting alphabetized, which the program I'm sending this XML to apparently isn't handling.
e.g. I want zzzz to show first, then bbbbb in the following code.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("requests");
doc.appendChild(root);
root.appendChild(request);
root.setAttribute("zzzzzz", "My z value");
root.setAttribute("bbbbbbb", "My b value");
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);

The order of attributes is defined to be insignificant in XML: no conformant XML application should produce results that depend on the order in which attributes appear. Therefore, serializers (code that produces lexical XML as output) will usually give you no control over the order.
Now, it would sometimes be nice to have that control for cosmetic reasons, because XML is designed to be human-readable. So there's a valid reason for wanting the feature. But the fact is, I know of no serializer that offers it.

I had the same issue when I used XML DOM API for writing file. To resolve the problem I had to use XMLStreamWriter. Attributes appear in a xml file in the order you write it using XMLStreamWriter.

XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.

If you don't want to use another framework just for a custom attribute order you can simply add an order identifier to the attributes.
<someElement a__price="32" b__amount="3"/>
After the XML serializer is done post process the raw XML like so:
public static String removeAttributeOrderIdentifiers(String xml) {
return xml.replaceAll(
" [a-z]__(.+?=\")",
"$1"
);
}
And you will get:
<someElement amount="3" price="32"/>

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

XSL validation while transformation - java

Related

Java - correctly indenting an XML made from multiple sources

Multiple xslt tranformations

Errors creating html file using xml and xsl file (xslt 2.0) (Recoverable error I/O error reported by XML parser processing)

Transforming Streaming XSLT Without a Custom Content Handler

Control order of XML attributes in outputed file in Java

Categories

Resources