Saxon/Javax Transform from multiple XML Files/Strings - java

This code is in Java and uses Saxon.
I am implementing a transform function that transforms a primary XML document together with several secondary XML sources.
None of the inputs are files, so I cannot use document() or other methods that reference files directly.
String transform(String xml, List<String> secondaryXmls, String xslt);
It outputs the transformed xml result
I can apply the XSLT to the single XML document, but I am having difficulty applying a transformation that also uses the secondaryXmls. I have done my research and still cannot find the right way to do this.
Here is a snippet of the code:
TransformerFactory tFactory = TransformerFactory.newInstance("net.sf.saxon.TransformerFactoryImpl",null);
Document transformerDoc = loadXMLFromString(xslt);
Source transformerSource = new DOMSource(transformerDoc);
Transformer transformer = tFactory.newTransformer(transformerSource);
Document sourceDoc = loadXMLFromString(xml);
Source source = new DOMSource(sourceDoc);
DOMResult result = new DOMResult();
transformer.transform(source, result);
Document resultDoc = (Document) result.getNode();
return getStringFrom(resultDoc);
Thanks!
EDIT:
Which is the better way:
concatenating all the XMLs, transforming, and returning only the original part by filtering out the concatenated secondary XMLs, or
writing code that adds
<xsl:variable name="asd" select="document('asd')"/>
to the top of the XSLT string?

First thing - get rid of all that DOM stuff! Using the DOM with Saxon slows it down by a factor of ten. Let Saxon build the trees in its own format, by using a StreamSource or SAXSource, and a StreamResult. Or you can build a tree in Saxon format yourself, if you want, using the s9api DocumentBuilder class.
Then as to the answer to your question: here are three possible solutions:
(a) supply the documents as a stylesheet parameter of type document-node()* (that is, a sequence of document nodes). In the Java, convert your list of XML strings to a list of document nodes by calling Configuration.buildDocument() on each one.
(b) write a URIResolver whose effect is to interpret the URI doc/3 as meaning the third document in the list; then use document('doc/3') to fetch that document.
(c) write a CollectionURIResolver which makes the whole collection of documents available using the collection() function.
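Option (b) can be sketched with plain JAXP interfaces. This is a minimal sketch, not the answerer's code: the class name InMemoryTransform and the 1-based "doc/N" naming are assumptions made for illustration, and it uses whichever TransformerFactory JAXP finds (pass the Saxon factory class name as in the question to select Saxon explicitly).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class InMemoryTransform {

    // Hypothetical naming scheme: "doc/N" means the N-th secondary document (1-based).
    private static final String PREFIX = "doc/";

    static String transform(String xml, List<String> secondaryXmls, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xslt)));

            // The processor calls this for each document('...') evaluated by the stylesheet.
            URIResolver resolver = (href, base) -> {
                if (href == null || !href.startsWith(PREFIX)) {
                    return null; // let the processor resolve other URIs itself
                }
                int index = parseIndex(href);
                if (index < 1 || index > secondaryXmls.size()) {
                    throw new TransformerException("No secondary document for " + href);
                }
                return new StreamSource(new StringReader(secondaryXmls.get(index - 1)));
            };
            transformer.setURIResolver(resolver);

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    private static int parseIndex(String href) throws TransformerException {
        try {
            return Integer.parseInt(href.substring(PREFIX.length()));
        } catch (NumberFormatException e) {
            throw new TransformerException("Malformed document URI: " + href, e);
        }
    }
}
```

A stylesheet would then read the second string with document('doc/2'). Streams are used for every input, in line with the advice above to avoid DOM.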

Related

Transform multiple input XML documents with XSLT in a Java application using the Saxon9HE API

How can I transform multiple XML input document objects with a single XSL transformation script using the Saxon9HE processor in a Java application?
I found a way to transform multiple XML input files from the filesystem with an XSLT script here, but I can't figure out how to pass multiple loaded XML Document objects to a Java application utilizing the Saxon9HE API. For a single XML document my code looks like this and works:
Processor proc = new Processor(false);
XsltCompiler comp = proc.newXsltCompiler();
try {
XsltExecutable exp = comp.compile(new StreamSource(stylesheetFile));
XdmNode source = proc.newDocumentBuilder().build(new DOMSource(inputXML));
Serializer out = proc.newSerializer();
out.setOutputProperty(Serializer.Property.METHOD, "xml");
out.setOutputProperty(Serializer.Property.INDENT, "yes");
out.setOutputFile(new File(outputFilename));
XsltTransformer trans = exp.load();
trans.setInitialContextNode(source);
trans.setDestination(out);
trans.transform();
} catch (SaxonApiException e) {
e.printStackTrace();
}
First point: avoid DOM if you can. When you are using Saxon, it's best to let Saxon build the document tree; this will be far more efficient. If you really need to use an external tree model, XOM and JDOM2 are much more efficient than DOM.
If you do want to provide a DOM as input, you have two choices: you can copy it to a Saxon tree, or you can wrap it as a Saxon tree. Use DocumentBuilder.build() in the first case, DocumentBuilder.wrap() in the second. Using build() gives you a higher initial cost, but the transformation itself is then faster.
If you want to pass pre-built trees into the transformation, declare the parameter using <xsl:param name="x" as="document-node()"/>, and then invoke the transformation using transformer.setParameter(new QName("x"), doc) where doc is an instance of XdmNode. You have to construct the XdmNode yourself by using a DocumentBuilder.
(Alternatively, if you want to access the documents in the stylesheet using the doc() or document() functions, you can invent a URI naming scheme and implement this in a URIResolver. When doc('my:uri') is called, your URIResolver is notified, and it should respond with a Source object. If you already have an XdmNode handy, then you can return XdmNode.asSource() to return this document tree as the result of your URIResolver.)

Get org.w3c.dom.Document from XmlResourceParser

I'm planning on putting some XML files in my res/xml directory, which I want to load into Document objects (as I'm using these currently in my application and am passing them around a lot).
Doing:
XmlResourceParser parser = getResources().getXml(R.xml.my_xml_file);
seems to be the only starting point, and XmlResourceParser implements XmlPullParser, which seems to be aimed at iterating through the tree of nodes.
I've tried using xpath to pull out the root node of the document, as per the code in this answer: navigate through XML document with XMLpullParser, but this gives me an exception as per this question: Evaluating XPath an XmlResourceParser causes Exception
Ideally, I need to keep the xml in the res/xml folder, as I'll want to localise the content, so moving it to res/raw or res/assets isn't an option.
Before I implement something which iterates through the XmlResourceParser and builds the Document, is there a simpler way of doing this?
Writing the code to iterate through the XmlResourceParser shows that you are a nice well-mannered developer.
However, you may be tempted to use the evil genius option.
First, create an XmlPullParser implementation that takes an XmlResourceParser in the constructor, wraps it and delegates all its methods to it, except all the setInput-type methods. You want all those to be no-op stubs, because XmlResourceParser will throw exceptions (see XmlBlock.java, line 108 or so). Android Studio can automate creating the delegate methods so you don't have to hand-code all that.
PullParserWrapper wrapper = new PullParserWrapper(
getResources().getXml(R.xml.my_xml_file));
Note: Your wrapper class might have to have some method implementations to handle turning off namespace processing and other assorted things.
Next, you will use a org.xmlpull.v1.sax2.Driver to wrap your parser and convert it into an XMLReader:
XMLReader xmlReader = new Driver(wrapper);
Now set up a dummy input source (the XmlResourceParser already knows where it's getting its input):
InputSource inputSource = new InputSource(new StringReader(""));
Then you use a Transformer to convert your SAX input to a DOM output and get your result:
Transformer transformer = TransformerFactory.newInstance().newTransformer();
DOMResult domResult = new DOMResult();
transformer.transform(new SAXSource(xmlReader, inputSource), domResult);
Document document = (Document) domResult.getNode();
MWAH Ha Ha Ha Ha Ha Ha!

Resolving which version of an XML Schema to use for XML documents with a version attribute

I have to write some code to handle reading and validating XML documents that use a version attribute in their root element to declare a version number, like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Junk xmlns="urn:com:initech:tps"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:com:initech.tps:schemas/foo/Junk.xsd"
VersionAttribute="2.0">
There are a bunch of nested schemas. My code has an org.w3c.dom.ls.LSResourceResolver to figure out which schema to use, implementing this method:
LSInput resolveResource(String type,
String namespaceURI,
String publicId,
String systemId,
String baseURI)
Previous versions of the schema have embedded the schema version into the namespace, so I could use the namespaceURI and systemId to decide which schema to provide. Now the version number has been switched to an attribute in the root element, and my resolver doesn't have access to that. How am I supposed to figure out the version of the XML document in the LSResourceResolver?
I had never had to deal with schema versions before this and had no idea what was involved. When the version was part of the namespace then I could throw all the schemas in together and let them get sorted out, but with the version in the root element and namespace shared across versions there is no getting around reading the version information from the XML before starting the SAX parsing.
I'm going to do something very similar to what Pangea suggested (gets +1 from me), but I can't follow the advice exactly because the document is too big to read it all into memory, even once. By using StAX I can minimize the amount of work done to get the version from the file. See this DeveloperWorks article, "Screen XML documents efficiently with StAX":
The screening or classification of XML documents is a common problem,
especially in XML middleware. Routing XML documents to specific
processors may require analysis of both the document type and the
document content. The problem here is obtaining the required
information from the document with the least possible overhead.
Traditional parsers such as DOM or SAX are not well suited to this
task. DOM, for example, parses the whole document and constructs a
complete document tree in memory before it returns control to the
client. Even DOM parsers that employ deferred node expansion, and thus
are able to parse a document partially, have high resource demands
because the document tree must be at least partially constructed in
memory. This is simply not acceptable for screening purposes.
The code to get the version information will look like:
def map = [:]
def startElementCount = 0
def inputStream = new File(inputFile).newInputStream()
try {
XMLStreamReader reader =
XMLInputFactory.newInstance().createXMLStreamReader(inputStream)
for (int event; (event = reader.next()) != XMLStreamConstants.END_DOCUMENT;) {
if (event == XMLStreamConstants.START_ELEMENT) {
if (startElementCount > 0) return map
startElementCount += 1
map.rootElementName = reader.localName
for (int i = 0; i < reader.attributeCount; i++) {
if (reader.getAttributeName(i).toString() == 'VersionAttribute') {
map.versionIdentifier = reader.getAttributeValue(i).toString()
return map
}
}
}
}
} finally {
inputStream.close()
}
Then I can use the version information to figure out what resolver to use and what schema documents to set on the SaxFactory.
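For readers not using Groovy, the same screening step can be written in plain Java. This is a minimal sketch under assumptions: the class and method names are made up for illustration, and the input is taken as a Reader rather than the file stream used above.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RootVersionPeek {

    /**
     * Returns the value of the named attribute on the root element,
     * or null if the root element does not carry it.
     * Parsing stops at the first start tag, so the body is never read.
     */
    static String rootAttribute(Reader xml, String attributeName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // A null namespace URI means the namespace is not checked.
                        return reader.getAttributeValue(null, attributeName);
                    }
                }
                return null;
            } finally {
                reader.close(); // releases parser resources, not the underlying Reader
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the root element", e);
        }
    }
}
```

The caller still owns the Reader and should close it, as the finally block in the Groovy version does for its stream.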
My Suggestion
1. Parse the document using SAX or DOM.
2. Get the version attribute.
3. Use the Validator.validate(Source) method with the already parsed Document (from step 1), as shown below.
Building DOMSource from parsed document
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(args[0]));
domSource = new DOMSource(document);
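The three steps can be put together roughly as follows. This is only a sketch: the class name, the map from version string to schema text, and the use of in-memory strings instead of files are all assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class VersionedValidation {

    /** Validates xml against the schema registered for its root VersionAttribute. */
    static boolean isValid(String xml, Map<String, String> schemasByVersion) {
        try {
            // Step 1: parse once, namespace-aware so schema validation can use the tree.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document document =
                    dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Step 2: read the version from the root element.
            String version = document.getDocumentElement().getAttribute("VersionAttribute");
            String xsd = schemasByVersion.get(version);
            if (xsd == null) {
                throw new IllegalArgumentException("No schema for version '" + version + "'");
            }

            // Step 3: validate the already parsed tree against the chosen schema.
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new DOMSource(document));
                return true;
            } catch (SAXException invalid) {
                return false; // the default error handler throws on the first error
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document or schema", e);
        }
    }
}
```

This builds the whole tree in memory, so it fits the suggestion here but not the very large documents mentioned in the other answer.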

XSLT transform in xmlSignature java?

I have an XML document. I am signing part of the document using XML Signature. Before computing the digest, I want to apply an XSLT transform. From what I have read, XSLT converts an XML document to another format (which can also be XML). What confuses me is where the new, transformed document is available. How do I retrieve a value from this newly created document if I want to show it to the user?
My XML Document
<r1>
<user>asd</user>
<person>ghi</person>
</r1>
Code for Transformation
Transform t=fac.newTransform(Transform.XPATH,new XPathFilterParameterSpec("/r1/user"));
With this XPath transform, the signature should fail validation whenever the value of the user element changes, and should still validate if the person element's value changes. But when I change the person element's value, the signature does not validate. Why?
The xslt transform used when signing a document relates to how nodes in your source XML are selected when the signature is calculated.
This question/answer by Dave relates to signing parts of an XML document using xpath2. The link to Sean Mullans' post in this answer suggests xpath2 is more appropriate for signing parts of a document because the evaluation of an xpath expression is done per node.
So based on the sun dsig example you can replace the Reference creation using:
List<XPathType> xpaths = new ArrayList<XPathType>();
xpaths.add(new XPathType("//r1/user", XPathType.Filter.INTERSECT));
Reference ref = fac.newReference
("", fac.newDigestMethod(DigestMethod.SHA1, null),
Collections.singletonList
(fac.newTransform(Transform.XPATH2,
new XPathFilter2ParameterSpec(xpaths))),
null, null);
This allows //r1/user to be protected with a signature while the rest of the document can be altered.
The problem with the xpath/xpath2 selection is that a signature can be generated for /some/node/that/does/not/exist. You are right to modify a test document and make sure the signature is working the way you expect.
You might test the document in a test program by generating a signature then tampering with the xml node before verification:
NodeList nlt = doc.getElementsByTagName("user");
nlt.item(0).getFirstChild().setTextContent("Something else");
A more reliable alternative to an xpath selector might be to put an ID on the xml document elements you hope to sign like:
<r1>
<user id="sign1">asd</user>
<person>ghi</person>
</r1>
then reference this ID as the URI in the first parameter of an enveloped transform:
Reference ref = fac.newReference
("#sign1", fac.newDigestMethod(DigestMethod.SHA1, null),
Collections.singletonList
(fac.newTransform(Transform.ENVELOPED,(TransformParameterSpec) null)),
null, null);
For the output, a signature operation adds a new Signature element to the DOM you have loaded in memory. You can stream the output by transforming it like this:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
trans.setOutputProperty(OutputKeys.INDENT, "yes");
trans.transform(new DOMSource(doc), new StreamResult(System.out));
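The tamper test suggested above can be sketched end to end with the XPath2 reference. Treat this as an illustration under assumptions rather than a reference implementation: the class name, the throwaway RSA key pair and the switch from SHA-1 to SHA-256 (chosen because some JDK configurations may restrict SHA-1) are mine, not part of the answer.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import java.util.List;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.XPathFilter2ParameterSpec;
import javax.xml.crypto.dsig.spec.XPathType;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class PartialSignatureCheck {

    private static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    /** Signs only //r1/user, rewrites the named element, then reports whether validation passes. */
    static boolean validAfterTampering(String tamperedTag) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for XML Signature processing
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(
                    new StringReader("<r1><user>asd</user><person>ghi</person></r1>")));

            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keys = generator.generateKeyPair(); // throwaway key for the test

            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            List<XPathType> xpaths = Collections.singletonList(
                    new XPathType("//r1/user", XPathType.Filter.INTERSECT));
            Reference ref = fac.newReference("", fac.newDigestMethod(DigestMethod.SHA256, null),
                    Collections.singletonList(fac.newTransform(Transform.XPATH2,
                            new XPathFilter2ParameterSpec(xpaths))),
                    null, null);
            SignedInfo signedInfo = fac.newSignedInfo(
                    fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                            (C14NMethodParameterSpec) null),
                    fac.newSignatureMethod(RSA_SHA256, null),
                    Collections.singletonList(ref));

            // Appends a Signature element under the root element.
            fac.newXMLSignature(signedInfo, null)
                    .sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));

            // Tamper with the in-memory document after signing.
            doc.getElementsByTagName(tamperedTag).item(0).setTextContent("Something else");

            Node signatureNode =
                    doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext context = new DOMValidateContext(keys.getPublic(), signatureNode);
            return fac.unmarshalXMLSignature(context).validate(context);
        } catch (Exception e) {
            throw new IllegalStateException("Signing or validation failed", e);
        }
    }
}
```

Calling validAfterTampering("person") and validAfterTampering("user") in turn shows whether the XPath expression protects the node you intended and nothing else.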
The XSLT specification doesn't define what happens to the result document; that's defined by the API specifications of your chosen XSLT processor. For example, if you invoke XSLT from Java using the JAXP interface, you can ask for the result as a DOM tree in memory or for it to be serialized to a specified file on disk.
You have tagged your question "Java" which is the only clue you give to your processing environment. My guess is you want to transform to a DOM and then use DOM interfaces to get the value from the new document. Though if you're using XSLT 2.0 and Saxon, the s9api interface is much more usable than the native JAXP interface.
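As a rough illustration of the "transform to a DOM, then read from it" route with plain JAXP, here is a minimal sketch. The class and method names are hypothetical, and the inputs are strings only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

final class DomResultExample {

    /** Runs the stylesheet and returns the result document as an in-memory DOM tree. */
    static Document transformToDom(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            // An empty DOMResult asks the processor to create the result Document itself.
            DOMResult result = new DOMResult();
            transformer.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Values can then be read with the usual DOM calls, for example getElementsByTagName(...).item(0).getTextContent() on the returned Document.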
The xslt part defines only the transformation definition, nothing else.
Have a look at this:
java xslt tutorial
In Francois Gravel's answer, input.xml is the file that will be transformed, and transform.xslt is the XSLT definition that describes how to transform it. output.out holds the result; this may be XML, but it can also be HTML, a flat file, and so on.
This is where I started when I was using XSLT:
http://www.w3schools.com/xsl/default.asp
Have a look at this also:
http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog

Efficient XSLT pipeline in Java (or redirecting Results to Sources)

I have a series of XSL 2.0 stylesheets that feed into each other, i.e. the output of stylesheet A feeds B feeds C.
What is the most efficient way of doing this? The question rephrased is: how can one efficiently route the output of one transformation into another.
Here's my first attempt:
@Override
public void transform(Source data, Result out) throws TransformerException{
for(Transformer autobot : autobots){
if(autobots.indexOf(autobot) != (autobots.size()-1)){
log.debug("Transforming prelim stylesheet...");
data = transform(autobot,data);
}else{
log.debug("Transforming final stylesheet...");
autobot.transform(data, out);
}
}
}
private Source transform(Transformer autobot, Source data) throws TransformerException{
DOMResult result = new DOMResult();
autobot.transform(data, result);
Node node = result.getNode();
return new DOMSource(node);
}
As you can see, I'm using a DOM to sit in between transformations, and although it is convenient, it's non-optimal performance wise.
Is there an easy way to route, say, a SAXResult to a SAXSource? A StAX solution would be another option.
I'm aware of projects like XProc, which is very cool if you haven't taken a look at yet, but I didn't want to invest in a whole framework.
I found this: #3. Chaining Transformations that shows two ways to use the TransformerFactory to chain transformations, having the results of one transform feed the next transform and then finally output to system out. This avoids the need for an intermediate serialization to String, file, etc. between transforms.
When multiple, successive
transformations are required to the
same XML document, be sure to avoid
unnecessary parsing operations. I
frequently run into code that
transforms a String to another String,
then transforms that String to yet
another String. Not only is this slow,
but it can consume a significant
amount of memory as well, especially
if the intermediate Strings aren't
allowed to be garbage collected.
Most transformations are based on a
series of SAX events. A SAX parser
will typically parse an InputStream or
another InputSource into SAX events,
which can then be fed to a
Transformer. Rather than having the
Transformer output to a File, String,
or another such Result, a SAXResult
can be used instead. A SAXResult
accepts a ContentHandler, which can
pass these SAX events directly to
another Transformer, etc.
Here is one approach, and the one I
usually prefer as it provides more
flexibility for various input and
output sources. It also makes it
fairly easy to create a transformation
chain dynamically and with a variable
number of transformations.
SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();
// These templates objects could be reused and obtained from elsewhere.
Templates templates1 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet2.xslt")));
TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);
th1.setResult(new SAXResult(th2));
th2.setResult(new StreamResult(System.out));
Transformer t = stf.newTransformer();
t.transform(new StreamSource(System.in), new SAXResult(th1));
// th1 feeds th2, which in turn feeds System.out.
The related question "Efficient XSLT pipeline, with params, in Java" clarifies how to pass parameters correctly to such a transformer chain.
It also hints at a slightly shorter solution without the third transformer:
SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();
Templates templates1 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
getClass().getResourceAsStream("MyStylesheet2.xslt")));
TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);
th2.setResult(new StreamResult(System.out));
// Note that indent, etc should be applied to the last transformer in chain:
th2.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
th1.getTransformer().transform(new StreamSource(System.in), new SAXResult(th2));
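The quoted article mentions building the chain dynamically for a variable number of transformations. A minimal sketch of that idea follows; the class name and the use of stylesheet strings instead of classpath resources are assumptions for illustration, and it uses the identity-transformer form from the first snippet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltChain {

    /** Applies the stylesheets in order, passing SAX events from each stage to the next. */
    static String run(String xml, List<String> stylesheets) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            if (!factory.getFeature(SAXTransformerFactory.FEATURE)) {
                throw new IllegalStateException("Factory does not support SAX chaining");
            }
            SAXTransformerFactory stf = (SAXTransformerFactory) factory;

            List<TransformerHandler> stages = new ArrayList<>();
            for (String xslt : stylesheets) {
                stages.add(stf.newTransformerHandler(new StreamSource(new StringReader(xslt))));
            }

            StringWriter out = new StringWriter();
            // Each stage feeds the next one; the last stage serializes to the writer.
            for (int i = 0; i < stages.size(); i++) {
                Result next;
                if (i == stages.size() - 1) {
                    next = new StreamResult(out);
                } else {
                    next = new SAXResult(stages.get(i + 1));
                }
                stages.get(i).setResult(next);
            }

            // An identity transformer parses the input and pushes events into the first stage.
            Result first;
            if (stages.isEmpty()) {
                first = new StreamResult(out);
            } else {
                first = new SAXResult(stages.get(0));
            }
            stf.newTransformer().transform(new StreamSource(new StringReader(xml)), first);
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }
}
```

In real code the stylesheets would more likely be compiled once into Templates objects and reused, as the comment in the first snippet suggests.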
Your best bet is to stick to DOM as you're doing, because an XSLT processor would have to build a tree anyway. Streaming is only an option for a very limited category of transforms, and few if any processors can figure it out automatically and switch to a streaming-only implementation; otherwise they just read the input and build the tree.
