How to provide a utility on XSLT while maintaining security

I would like the ability to provide an escape utility that can be used in an XSL Stylesheet. For example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xalan="http://xml.apache.org/xalan"
xmlns:escape="xalan://com.example.myservice.MyEscapeTool">
However, my understanding is that in Java, omitting the following setting on your TransformerFactory can be insecure:
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
So I did that, but understandably this blocks your ability to use "external function calls" with the following runtime error:
FATAL: XPath syntax error at char 12 in {escape:new()}:
Cannot find a matching 0-argument function named
{java:com.example.myservice.MyEscapeTool}new(). Note: external
function calls have been disabled;
Removing the aforementioned FEATURE_SECURE_PROCESSING flag will fix the issue.
How can I include a utility function that can be called in XSLT without sacrificing security by exposing ANY arbitrary Java class?

As @MartinHonnen points out in his comment, if you switch to Saxon, you can restrict the stylesheet to "integrated extension functions", which are registered with the XSLT processor prior to execution, without allowing the stylesheet to call any class/method that happens to be on the classpath.

Related

Saxon delivering a V1 Xpath object

I have been using Saxon HE 9.5.1-5 successfully for a while.
We are doing a general upgrade of component versions in our platform, which included moving to Saxon 9.8.0-8.
The code fails with that version.
The following is in our Spring beans file:
<bean id="xpathFactory" class="net.sf.saxon.xpath.XPathFactoryImpl" factory-method="newInstance"/>
<bean id="xpath" factory-bean="xpathFactory" factory-method="newXPath"/>
<bean id="myRequestValidator" class="gov.dhs.ice.prime.query.RequestValidator">
<constructor-arg index="0" ref="xpath"/>
As you can see, the last bean is passed the result of the "newXPath" method.
For debugging, I log the class name of the passed-in object.
With 9.5.1-5, the incoming object is a net.sf.saxon.xpath.XPathEvaluator.
Now, with 9.8.0-8, I am receiving an org.apache.xpath.jaxp.XPathImpl object.
The RequestValidator then compiles some XPath expressions in its constructor. With the XPathEvaluator object, everything was fine (as it has been all along).
Now that I am getting an org.apache.xpath.jaxp.XPathImpl object, the constructor fails when it attempts to compile an XPath 2.0 expression. XPath 1.0 expressions work fine.
So, why is this newer version returning a different object than before?
I did try constructing the net.sf.saxon.xpath.XPathEvaluator directly:
<bean id="xpath" class="net.sf.saxon.xpath.XPathEvaluator"/>
<bean id="myRequestValidator" class="gov.dhs.ice.prime.query.RequestValidator">
<constructor-arg index="0" ref="xpath"/>
And that worked with the new version. But, it seems like the docs recommend the earlier approach.
Any ideas what is going on here?
Thanks

For several releases now, Saxon has no longer registered itself as a JAXP XPathFactory; see http://saxonica.com/html/documentation9.8/xpath-api/jaxp-xpath/factory.html:
The JAXP API is designed on the basis that when your application
invokes XPathFactory.newInstance(), an XPath engine is selected by
examining the values of system properties and searching the classpath.
If you rely on this mechanism, then your application may end up
running with an XPath engine on which it has never been tested. Since
different XPath engines can differ in many significant respects (most
notably, the version of XPath that they support), this can easily lead
to application failures. Saxon therefore no longer identifies itself
(in the JAR file manifest) as a JAXP XPath supplier. If you want to
load Saxon as your XPath engine, you need to select it explicitly;
it's not enough to just put it on the classpath
And the newInstance method you think you are calling on net.sf.saxon.xpath.XPathFactoryImpl is in reality the static method of the abstract base class (https://docs.oracle.com/javase/8/docs/api/javax/xml/xpath/XPathFactory.html#newInstance--). That method calls the other overload of the base class (https://docs.oracle.com/javase/8/docs/api/javax/xml/xpath/XPathFactory.html#newInstance-java.lang.String-), which is supposed to load the JAXP-registered XPathFactory.
So what you ought to do is follow the documentation: "If you want to use Saxon as your XPath implementation, you must instantiate the class net.sf.saxon.xpath.XPathFactoryImpl directly.", e.g. in Java code new net.sf.saxon.xpath.XPathFactoryImpl().
I am not sure how that is expressed declaratively in that bean syntax, but perhaps you know that or someone else can help with it.
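The static-method point above can be reproduced with the JDK alone. The following is a minimal sketch, not Saxon code: VendorXPathFactory is a hypothetical stand-in for a vendor class such as net.sf.saxon.xpath.XPathFactoryImpl, and it assumes no JAXP system property or service registration points at it. Calling newInstance() through the subclass name still runs the base-class lookup, so only the constructor yields the vendor class.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunctionResolver;
import javax.xml.xpath.XPathVariableResolver;

public class StaticFactoryPitfall {

    /** Hypothetical vendor factory; it does not override the static newInstance(). */
    public static class VendorXPathFactory extends XPathFactory {
        @Override public boolean isObjectModelSupported(String objectModel) { return false; }
        @Override public void setFeature(String name, boolean value) { }
        @Override public boolean getFeature(String name) { return false; }
        @Override public void setXPathVariableResolver(XPathVariableResolver resolver) { }
        @Override public void setXPathFunctionResolver(XPathFunctionResolver resolver) { }
        @Override public XPath newXPath() {
            // Not needed for this demonstration.
            throw new UnsupportedOperationException("demo only");
        }
    }

    /** Resolves to XPathFactory.newInstance(): the JAXP lookup decides the class. */
    public static XPathFactory viaStaticLookup() {
        return VendorXPathFactory.newInstance();
    }

    /** Direct instantiation: always the vendor class. */
    public static XPathFactory viaConstructor() {
        return new VendorXPathFactory();
    }

    public static void main(String[] args) {
        System.out.println("static lookup: " + viaStaticLookup().getClass().getName());
        System.out.println("constructor:   " + viaConstructor().getClass().getName());
    }
}
```

In a Spring bean file, the equivalent of the constructor call would presumably be a bean definition with class="net.sf.saxon.xpath.XPathFactoryImpl" and no factory-method attribute, though that is an assumption rather than something confirmed in this thread.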

Is it possible to cache XML documents in Saxon to avoid re-parsing and re-indexing?

I am currently assessing whether XSLT3 with Saxon could be useful for our purposes. Please hear me out.
We are developing a REST API which provides credentials given an input request XML. Basically, there are 3 files in play:
site.xml:
This file holds the data representing the complete organisation: users, roles, credentials, settings, ...
It could easily contain 10,000 lines.
It can be considered static/immutable.
You could think of it as an XML representation of a database, so to speak.
request.xml:
This file holds the request as provided to the REST API.
It is rather small, usually around 10 to 50 lines.
It is different for each request.
request.xslt:
This file holds the stylesheet to convert the given request.xml to an output XML.
It loads site.xml via the XSLT document() function, as it needs that data to fulfill the request.
The problem here is that loading site.xml in request.xslt takes a long time. In addition, for each request, indexes as introduced by the XSLT <xsl:key .../> directive must be rebuilt. This adds up.
So it would make sense to somehow cache site.xml, to avoid having to parse and index that file for every request.
It's important to note that multiple API requests can happen concurrently, thus it should be safe to share this cached site.xml between several ongoing XSLT transformations.
Is this possible with Saxon (Java)? How would that work?
Update 1
After some additional reflecting, I realize that maybe I should not attempt to just cache the site.xml XML file, but the request.xslt instead? This assumes that site.xml, which is loaded in request.xslt via document(), is part of that cache.

It would help if you show/tell us which API you use to run XSLT with Saxon.
As for caching the XSLT, with JAXP I think you can do that with a Templates object created with newTemplates from the TransformerFactoryImpl (http://saxonica.com/html/documentation/using-xsl/embedding/jaxp-transformation.html); each time you want to run the XSLT, you will need to create a Transformer with newTransformer().
With the s9api API you can compile once to get an XsltExecutable (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XsltExecutable.html) that "is immutable, and therefore thread-safe"; you then have to use load() or load30() to create an XsltTransformer or Xslt30Transformer each time you need to run the code.
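A minimal sketch of the JAXP variant, assuming the stylesheet can be compiled once at class-load time. The inline stylesheet and the class name CachedStylesheet are hypothetical stand-ins for request.xslt, and TransformerFactory.newInstance() picks whatever processor JAXP finds (with Saxon you would presumably instantiate its TransformerFactoryImpl instead). The Templates object is shared; a fresh Transformer is created per request because Transformer instances are not thread-safe.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CachedStylesheet {

    // Hypothetical stylesheet standing in for request.xslt.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/request'>user=<xsl:value-of select='@user'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Compiled once and shared by all requests.
    private static final Templates TEMPLATES = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Runs the cached stylesheet against one request document. */
    public static String transform(String requestXml) {
        try {
            Transformer transformer = TEMPLATES.newTransformer(); // cheap, per request
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(requestXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<request user='alice'/>"));
    }
}
```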
As for sharing a document, see http://saxonica.com/html/documentation/sourcedocs/preloading.html:
An option is available (Feature.PRE_EVALUATE_DOC_FUNCTION) to indicate
that calls to the doc() or document() functions with constant string
arguments should be evaluated when a query or stylesheet is compiled,
rather than at run-time. This option is intended for use when a
reference or lookup document is used by all queries and
transformations
The section on that configuration option, however, states:
In XSLT 3.0 a better way of having external documents pre-loaded at
stylesheet compile time is to use the new facility of static global
variables.
So in that case you could declare
<xsl:variable name="site-doc" static="yes" select="doc('site.xml')"/>
You will need to wait on Michael Kay's response as to whether that suffices to share the document.

Well, it is certainly possible, but the best way of doing it depends a little on the circumstances, e.g. what happens when site.xml changes.
I would be inclined to create a single s9api Processor at application startup, and immediately (that is, during application initialization) load site.xml into an XdmNode using a DocumentBuilder obtained from Processor.newDocumentBuilder() and calling its build() method; this can then be passed as a parameter value (an <xsl:param>) into each transformation that uses it. Or if you prefer to access it using document(), you could register a URIResolver that responds to the document() call by returning the relevant XdmNode.
As for indexing and the key() function, so long as the xsl:key definition is "shareable", then if two transformations based on the same compiled stylesheet (s9api XsltExecutable) access the same document, the index will not be rebuilt. An xsl:key definition is shareable if its match and use attributes do not depend on anything that can vary from one transformation to another, such as the content of global variables or parameters.
Saxon's native tree implementations (unlike the DOM) are thread-safe: if you build a document once, you can access it in multiple threads. The building of indexes to support the key() function is synchronized so concurrent transformations will not interfere with each other.
Martin's suggestion of allowing compile-time evaluation of the document() call would also work. You could also put the document into a global variable defined with static="yes". This doesn't play well, however, with exporting compiled stylesheets into persistent files: there are some restrictions that apply when exporting a stylesheet that contains node-valued static variables.

How can I get PMD result report as java object from code?

I'm writing a kind of PMD wrapper (the goal is to check Java code with PMD, but with certain features). pmd-core and pmd-java are included in my project as external libraries, and I'm executing PMD this way:
int violations = PMD.doPMD(configuration);
doPMD returns number of violations found. By configuring reportFormat in PMDConfiguration we can set output to System.out or to file with one of the available report formats (like xml, html, text, etc.), but...
How can I get the PMD result report (for all the source files that were processed) as a Java object? Perhaps it is possible to get a List<> of all RuleViolations, or something similar.

You can use a custom Renderer.
Create a custom implementation of net.sourceforge.pmd.renderers.Renderer.
On your configuration, call setReportFormat, passing the fully qualified name of your custom renderer (e.g. "my.organization.package.InMemoryRenderer").
Run the analysis, just as you are doing now.
During the analysis, your renderer will be instantiated exactly once, and its renderFileReport(Report) method will be called with each file's results. The Report has methods to obtain and iterate over violations, configuration errors and execution errors.
These are provided as POJOs, as you intend.
You won't have access to the Renderer instance, but you can store data in a static member (make sure to keep thread-safety as PMD runs multithreaded analysis!) and provide a static getter.
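The static-member part of that advice could look roughly like the sketch below. It deliberately contains no PMD calls: ViolationStore and Finding are hypothetical names, and a real renderer would copy whatever fields it needs out of each reported violation inside renderFileReport before calling add(). A ConcurrentLinkedQueue is one simple way to tolerate concurrent writers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ViolationStore {

    /** Hypothetical immutable value object holding what the caller wants to keep. */
    public static final class Finding {
        public final String file;
        public final int line;
        public final String message;

        public Finding(String file, int line, String message) {
            this.file = file;
            this.line = line;
            this.message = message;
        }
    }

    // Lock-free queue: safe to append to from several analysis threads at once.
    private static final Queue<Finding> FINDINGS = new ConcurrentLinkedQueue<>();

    /** Called by the custom renderer for every violation it sees. */
    public static void add(Finding finding) {
        FINDINGS.add(finding);
    }

    /** Static getter: returns a copy, so callers never see later mutations. */
    public static List<Finding> snapshot() {
        return new ArrayList<>(FINDINGS);
    }

    /** Reset between runs, since static state outlives a single analysis. */
    public static void clear() {
        FINDINGS.clear();
    }
}
```

After the analysis returns, the wrapper would read ViolationStore.snapshot() and then call clear() before the next run.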

How to allow for different XSD versions to be validated properly?

I am attempting to update some xml parsers, and have hit a small snag. We have an xsd that we need to keep compatible with older versions of the xml, and we had to make some changes to it. We made the changes in a new version of the xsd, and we would like to use the same parser (as the changes are pretty small in general, and the parser can easily handle both). We are using the XMLReader property "http://java.sun.com/xml/jaxp/properties/schemaSource" to set the schema to the previous edition, using something like the following:
xmlReader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
new InputSource(getClass().getResourceAsStream("/schema/my-xsd-1.0.xsd")));
This worked fine when we only had one version of the schema. Now we have a new version, and we want the system to use whichever version of the schema is defined in the incoming xml. Both schemas define a namespace, something like the following:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.mycompany.com/my-xsd-1.0"
xmlns="http://www.mycompany.com/my-xsd-1.0"
elementFormDefault="unqualified" attributeFormDefault="unqualified">
and, for the new one:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.mycompany.com/my-xsd-1.1"
xmlns="http://www.mycompany.com/my-xsd-1.1"
elementFormDefault="unqualified" attributeFormDefault="unqualified">
So, they have different namespaces and different schema "locations" defined. We don't want the schemas to live on the 'net; we want them to be bundled with our system. Is there a way to use the setProperty mechanism to get this behavior, or is there a different way to handle this?
I tried passing an array with one input source per resource as the parameter, but that didn't work (I remember reading somewhere that this was a possible solution, although now I can't find the source, so it might have been wishful thinking).

So, it turns out what I had tried actually worked - we were accidentally using invalid xml! What works (for anyone else who is interested) is the following:
List<InputSource> inputs = new ArrayList<InputSource>();
inputs.add(new InputSource(getClass().getResourceAsStream("/schema/my-xsd-1.0.xsd")));
inputs.add(new InputSource(getClass().getResourceAsStream("/schema/my-xsd-1.1.xsd")));
xmlReader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
inputs.toArray(new InputSource[inputs.size()]));
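For anyone who wants to try this in isolation, here is a self-contained sketch using the JDK's built-in parser. The two inline schemas are hypothetical stand-ins for my-xsd-1.0.xsd and my-xsd-1.1.xsd (they only declare a doc element, typed as a string in 1.0 and an int in 1.1), and the class name is made up. Note that the schemaLanguage property is set before schemaSource, and that a fresh parser and fresh input sources are created per document because the streams are consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MultiSchemaValidation {
    public static final String NS_10 = "http://www.mycompany.com/my-xsd-1.0";
    public static final String NS_11 = "http://www.mycompany.com/my-xsd-1.1";

    // Hypothetical minimal schema: one <doc> element of the given simple type.
    private static InputSource schema(String namespace, String type) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
                   + " targetNamespace='" + namespace + "'>"
                   + "<xs:element name='doc' type='xs:" + type + "'/></xs:schema>";
        return new InputSource(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
    }

    /** Validates xml against whichever bundled schema matches its namespace. */
    public static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                    XMLConstants.W3C_XML_SCHEMA_NS_URI);
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
                    new InputSource[] { schema(NS_10, "string"), schema(NS_11, "int") });
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validation errors are only reported, so escalate them
                }
            });
            return true;
        } catch (SAXParseException e) {
            return false; // not well-formed, or not valid against the matching schema
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```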

Personally I think it's generally a bad idea to change the namespace when you version a schema, unless the changes are radical - but views differ on that, and you seem to have made your decision, and you may as well reap the benefits.
Since you're using two different namespaces, the schemas are presumably disjoint, so you should be able to give the processor a schema that's the union of the two - I don't know if there's a better way, but one way of achieving this is to write a little stub schema that imports both, and supply this stub as your schemaSource property. The processor will use whichever schema declarations match the namespace of the elements in the source document.
(Using version-specific namespaces makes this task - validation - easier. But it makes subsequent processing of the XML, e.g. using XPath, harder, because it's hard to write code that works with both namespaces.)

Java+XSL, calling Java code from within template

I'm working with XSL templates in Java, and I'm trying to build a custom tag that will call some Java code, then put the result inside the template. I'm using XOM as my XML engine. I'm kind of new to both XOM and XSL, so I'm not even sure if this is a smart idea.
A very simple example of what I want to do is this, where my_ns is a custom namespace with a 'custom_tag' element that maps to the Java method custom_tag:
<xsl:template name="foo">
<my_ns:custom_tag />
</xsl:template>
public Node custom_tag() {
// XOM: build an empty <a/> element (nu.xom.Element is a subclass of nu.xom.Node)
return new Element("a");
}
#result of calling the template foo
<a/>
I'm open to suggestions that involve alternate ways of calling Java from an XSL template.

This is more a question of whether your XSLT processor can call Java code from within the template than a question about your XML engine/parser/API. The default XSLT processor bundled with Java is derived from Xalan-J from the Apache Software Foundation (Xalan-C is the C++ version). I believe it allows extension functions that execute Java code. I've run JDBC SQL queries inside an XSL stylesheet before using a Xalan-J extension function. I also recall reading that the Saxon XSLT processor allows this functionality. You'll have to check your XSLT processor's documentation for the specifics on how to implement this.
The question on whether this is a good idea or not really depends on the problem. Even though I used the SQL extension function mentioned above and it fit the bill in that case, I felt really dirty about it afterwards. The reason I say this is because you lose portability between XSLT processors when you add in the vendor-specific extension functions.
Your example shows you simply creating a new node in the output. If that is the case, I don't see the need to have Java do this, because creating nodes is one of the main functions of XSLT. I suspect your real problem is more complex than that, so you may want to look into doing all the work in Java to get the results you are looking for, OR doing some of the work in Java and passing a parameter (a name/value pair, using the xsl:param element) to your XSL stylesheet at runtime.
Here are some sites to get you started:
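The second option (compute in Java, pass the value in through xsl:param) can be sketched with plain JAXP, so it does not depend on vendor-specific extension functions. The stylesheet, the parameter name computed and the class name ParamFromJava are made up for illustration; the Java side computes a string and the template wraps it in an a element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamFromJava {

    // Hypothetical stylesheet: declares a global parameter and uses it in the output.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='computed'/>"
        + "<xsl:template match='/'><a><xsl:value-of select='$computed'/></a></xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet with a value that was computed on the Java side. */
    public static String render(String computedInJava) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // Name must match the xsl:param declaration in the stylesheet.
            transformer.setParameter("computed", computedInJava);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<input/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("hello"));
    }
}
```

This keeps the stylesheet portable across processors, at the cost of having to compute every value before the transformation starts.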
http://xml.apache.org/xalan-j/extensions.html
http://www.saxonica.com/documentation/extensions/intro.xml
http://www.w3schools.com/xsl/
http://www.w3schools.com/xsl/el_param.asp
