I'm working with XSL templates in Java, and I'm trying to build a custom tag that will call some Java code, then put a result inside the template. I'm using XOM as my XML engine. I'm kind of new with both XOM and XSL, so I'm not even sure if this is a smart idea.
A very simple example of what I want to do is this, where my_ns is a custom namespace and 'custom_tag' is a tag backed by the Java method custom_tag:
<xsl:template name="foo">
<my_ns:custom_tag />
</xsl:template>
public Node custom_tag() {
    // XOM's Node is abstract, so build a concrete Element that serializes as <a/>
    return new Element("a");
}
#result of calling the template foo
<a/>
I'm open to suggestions that involve alternate ways of calling Java from an XSL template.
This is more a question of whether your XSLT processor can call Java code from within the template than of your XML engine/parser/API. The default XSLT processor for Java is derived from Xalan-J from the Apache Software Foundation (Xalan-C is its C++ counterpart). I believe Xalan-J allows extension functions that execute Java code from inside the stylesheet. I've run JDBC SQL queries inside an XSL stylesheet before using a Xalan-J extension function. I also recall reading that the Saxon XSLT processor allows this functionality. You'll have to search your XSLT processor's documentation for the specifics on how to implement this.
The question on whether this is a good idea or not really depends on the problem. Even though I used the SQL extension function mentioned above and it fit the bill in that case, I felt really dirty about it afterwards. The reason I say this is because you lose portability between XSLT processors when you add in the vendor-specific extension functions.
Your example shows you are simply creating a new node in the output, and if that is the case, I don't see the need to have Java do this when that is one of the main functions of XSLT: creating nodes. I suspect your real problem is more complex than simply creating a node, so I suggest you look into doing all the work in Java to get the results you are looking for, OR doing some of the work in Java and passing a parameter (a name/value pair using the xsl:param element) to your XSL stylesheet at runtime.
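A minimal sketch of the parameter approach, assuming plain JAXP with whatever XSLT processor the JDK provides. The stylesheet, the parameter name "greeting", and the class name are all made up for illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParamDemo {
    // Hypothetical stylesheet: writes the value of the "greeting" parameter into <a>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:param name='greeting'/>"
      + "<xsl:template match='/'><a><xsl:value-of select='$greeting'/></a></xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String greeting) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        // The value is computed in Java first, then handed to the stylesheet.
        t.setParameter("greeting", greeting);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<input/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("hello"));
    }
}
```

This keeps the stylesheet portable across XSLT processors, since xsl:param and setParameter() are standard rather than vendor extensions.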
Here's some quick sites to get you started:
http://xml.apache.org/xalan-j/extensions.html
http://www.saxonica.com/documentation/extensions/intro.xml
http://www.w3schools.com/xsl/
http://www.w3schools.com/xsl/el_param.asp
I am currently assessing whether XSLT3 with Saxon could be useful for our purposes. Please hear me out.
We are developing a REST API which provides credentials given an input request XML. Basically, there are 3 files in play:
site.xml:
This file holds the data representing the complete organisation: users, roles, credentials, settings, ...
It could easily contain 10,000 lines.
It could be considered as static/immutable.
You could compare it to an XML representation of a database, so to speak.
request.xml:
This file holds the request as provided to the REST API.
It is rather small, usually around 10 to 50 lines.
It is different for each request.
request.xslt:
This file holds the stylesheet to convert the given request.xml to an output XML.
It loads site.xml via the XSLT document() function, as it needs that data to fulfill the request.
The problem here is that loading site.xml in request.xslt takes a long time. In addition, for each request, indexes as introduced by the XSLT <xsl:key .../> directive must be rebuilt. This adds up.
So it would make sense to somehow cache site.xml, to avoid having to parse and index that file for every request.
It's important to note that multiple API requests can happen concurrently, thus it should be safe to share this cached site.xml between several ongoing XSLT transformations.
Is this possible with Saxon (Java)? How would that work?
Update 1
After some additional reflecting, I realize that maybe I should not attempt to just cache the site.xml XML file, but the request.xslt instead? This assumes that site.xml, which is loaded in request.xslt via document(), is part of that cache.
It would help if you show/tell us which API you use to run XSLT with Saxon.
As for caching the XSLT, with JAXP I think you can do that with a Templates object created with newTemplates from the TransformerFactoryImpl (http://saxonica.com/html/documentation/using-xsl/embedding/jaxp-transformation.html); each time you want to run the XSLT you will need to create a Transformer with newTransformer().
With the s9api API you can compile once to get an XsltExecutable (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XsltExecutable.html) that "is immutable, and therefore thread-safe"; you then have to use load() or load30() to create an XsltTransformer or Xslt30Transformer each time you need to run the code.
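The JAXP variant could look roughly like the sketch below. It uses only the standard javax.xml.transform API with whatever TransformerFactory is on the classpath (not necessarily Saxon's); the stylesheet and the handle() method are invented for the example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplatesCache {
    // Hypothetical stand-in for request.xslt.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/request'>"
      + "<response><xsl:value-of select='@user'/></response>"
      + "</xsl:template></xsl:stylesheet>";

    // Compiled once at class load; JAXP requires Templates to be thread-safe.
    static final Templates TEMPLATES = compile();

    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (Exception e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    // Called per request: a Transformer is not thread-safe, so create a fresh one each time.
    static String handle(String requestXml) throws Exception {
        Transformer t = TEMPLATES.newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(requestXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handle("<request user='ann'/>"));
    }
}
```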
As for sharing a document, see http://saxonica.com/html/documentation/sourcedocs/preloading.html:
An option is available (Feature.PRE_EVALUATE_DOC_FUNCTION) to indicate
that calls to the doc() or document() functions with constant string
arguments should be evaluated when a query or stylesheet is compiled,
rather than at run-time. This option is intended for use when a
reference or lookup document is used by all queries and
transformations
The section on that configuration option, however, states:
In XSLT 3.0 a better way of having external documents pre-loaded at
stylesheet compile time is to use the new facility of static global
variables.
So in that case you could declare
<xsl:variable name="site-doc" static="yes" select="doc('site.xml')"/>
You will need to wait on Michael Kay's response as to whether that suffices to share the document.
Well, it is certainly possible, but the best way of doing it depends a little on the circumstances, e.g. what happens when site.xml changes.
I would be inclined to create a single s9api Processor at application startup, and immediately (that is, during application initialization) load site.xml into an XdmNode using Processor.newDocumentBuilder().build(); this can then be passed as a parameter value (an <xsl:param>) into each transformation that uses it. Or if you prefer to access it using document(), you could register a URIResolver that responds to the document() call by returning the relevant XdmNode.
As for indexing and the key() function, so long as the xsl:key definition is "sharable", then if two transformations based on the same compiled stylesheet (s9api XsltExecutable) access the same document, the index will not be rebuilt. An xsl:key definition is shareable if its match and use attributes do not depend on anything that can vary from one transformation to another, such as the content of global variables or parameters.
Saxon's native tree implementations (unlike the DOM) are thread-safe: if you build a document once, you can access it in multiple threads. The building of indexes to support the key() function is synchronized so concurrent transformations will not interfere with each other.
Martin's suggestion of allowing compile-time evaluation of the document() call would also work. You could also put the document into a global variable defined with static="yes". This doesn't play well, however, with exporting compiled stylesheets into persistent files: there are some restrictions that apply when exporting a stylesheet that contains node-valued static variables.
I am looking for an extension of doc() functionality currently available in SAXON in a way that it will read XML not from filesystem or from http network, but from memory, where I have those xmls.
The way I want to use it is like:
mydoc('id')/root/subroot/@myattr
or
doc('mydoc://id')/root/subroot/@myattr
What I have considered so far:
1. Use queryEvaluator.setContextItem(): this does not solve my use case, as I can have multiple XML sources in one query.
2. Register my own URL scheme protocol with Java: this seems like overkill to me, and I have never done it.
3. Write my own ExtensionFunction: this seems to be the right way so far, but I am confused about whether I should use ExtensionFunction or rather ExtensionFunctionDefinition. I am also a little confused by the Doc_1 and Doc Saxonica source code, as it uses Atomizer and other unknown internal stuff.
So the questions are:
Is variant 3 the best one (in terms of simplicity), or would you recommend some other approach?
Is it OK to use ExtensionFunction and return an XdmNode from my in-memory XMLs? It seems to me it should work, but I really do not want to step into edge cases or a Saxon minefield.
Any comment from an experienced Saxon user will be appreciated.
The standard way of doing this is to write a URIResolver and register it with the transformer. The URIResolver is called, supplying the requested URI, and it is expected to return a Source (which can be a StreamSource, SAXSource, or DOMSource, for example). In this scenario you would typically return a StreamSource wrapping a StringReader which wraps the String containing the XML.
You could equally well use an extension function, but it's probably a little bit more complicated.
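A rough sketch of the URIResolver approach follows. It is written against plain JAXP and run with the JDK's built-in transformer rather than Saxon, so treat it as an illustration of the idea only; the class name, the in-memory map, and the sample document are assumptions:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InMemoryResolver implements URIResolver {
    // Hypothetical in-memory store: URI -> XML text.
    private final Map<String, String> docs = new ConcurrentHashMap<>();

    void put(String uri, String xml) {
        docs.put(uri, xml);
    }

    @Override
    public Source resolve(String href, String base) {
        String xml = docs.get(href);
        if (xml == null) {
            return null; // unknown URI: let the processor use its default resolution
        }
        // StreamSource wrapping a StringReader, as described above.
        return new StreamSource(new StringReader(xml), href);
    }

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"document('mydoc://id')/root/subroot/@myattr\"/>"
      + "</xsl:template></xsl:stylesheet>";

    static String run() throws Exception {
        InMemoryResolver resolver = new InMemoryResolver();
        resolver.put("mydoc://id", "<root><subroot myattr='42'/></root>");
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setURIResolver(resolver); // consulted for document() calls at run time
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<dummy/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Returning null for unknown URIs leaves ordinary file and http lookups working alongside the in-memory documents.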
I would like the ability to provide an escape utility that can be used in an XSL Stylesheet. For example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xalan="http://xml.apache.org/xalan"
xmlns:escape="xalan://com.example.myservice.MyEscapeTool">
However, in terms of Java, my understanding is that lack of the following setting on your TransformerFactory can be insecure:
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
So I did that, but understandably this blocks your ability to use "external function calls" with the following runtime error:
FATAL: XPath syntax error at char 12 in {escape:new()}:
Cannot find a matching 0-argument function named
{java:com.example.myservice.MyEscapeTool}new(). Note: external
function calls have been disabled;
Removing the aforementioned FEATURE_SECURE_PROCESSING flag will fix the issue.
How can I include a utility function that can be called in XSLT, without causing a loss in security with the ability to expose ANY arbitrary Java class?
As @MartinHonnen points out in his comment, if you switch to using Saxon, then you can restrict the stylesheet to use only "integrated extension functions" which are registered with the XSLT processor prior to execution, without allowing the stylesheet to call any class/method that happens to be on the classpath.
We frequently use templates as a way of serializing an object. That is, given a Java POJO and a suitable XML template that includes placeholders like ${person.address.street}, we can output fully formed XML etc.
Are there any libraries where you can take that same template and a sample piece of output, and go the other way? That is, produce a populated Java bean (for instance) from a template, a sample XML document and, I guess, a Class name.
You can use Freemarker to help you; it's a great Java API.
You can have an XML template file which calls getters and then creates a dynamic XML file.
Let me know if that helps.
I am using SAX to parse some large XML files and I want to ask the following: The XML files have a complex structure. Something like the following:
<library>
<books>
<book>
<title></title>
<img>
<name></name>
<url></url>
</img>
...
...
</book>
...
...
</books>
<categories>
<category id="abcd">
<locations>
<location>...</location>
</locations>
<url>...</url>
</category>
...
...
</categories>
<name>...</name>
<url>...</url>
</library>
The fact is that these files are over 50MB each and a lot of tags are repeated under different contexts, e.g. url under /library/books/book/img but also under /library and under /library/categories/category and so on.
My SAX parser uses a subclass of DefaultHandler in which I override the startElement and the endElement methods (among others). But the problem is that these methods are huge in terms of lines of code due to the business logic of these XML files. I am using a lot of code like this:
if ("url".equalsIgnoreCase(qName)) {
// peek at stack and if book is on top
// ...
// else if category is on top
// ...
} else if (....) {
}
I was wondering whether there is a more proper / correct / elegant way to perform the xml parsing.
Thank you all
What you can do is implement a separate ContentHandler for each context. For example, write one for <books>, one for <categories> and one top-level one.
Then, as soon as the top-level startElement method is called for books, you immediately switch the ContentHandler using XMLReader.setContentHandler(). The <books>-specific ContentHandler then switches back to the top-level handler when its endElement method is called for books.
This way each ContentHandler can focus on its particular part of the XML and need not know about all the other parts.
The only ugly-ish part is that the specific handlers need to know of the top-level handler and when to switch back to it, which can be worked around by providing a simple "handler stack" that handles that for you.
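Here is a small sketch of the switching idea, assuming the JDK's SAX parser and the element names from the question. The class names and the two result lists are invented for illustration, and a real version would probably replace the explicit parent reference with the "handler stack" mentioned above:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SwitchingHandlers {
    final List<String> imageUrls = new ArrayList<>(); // <url> seen inside <books>
    final List<String> otherUrls = new ArrayList<>(); // every other <url>

    /** Top-level handler: delegates the whole <books> subtree to BooksHandler. */
    class TopHandler extends DefaultHandler {
        final XMLReader reader;
        final StringBuilder text = new StringBuilder();

        TopHandler(XMLReader reader) { this.reader = reader; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0);
            if ("books".equals(qName)) {
                reader.setContentHandler(new BooksHandler(reader, this)); // hand over
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("url".equals(qName)) otherUrls.add(text.toString().trim());
        }
    }

    /** Handles only what lives under <books>, then returns control to its parent. */
    class BooksHandler extends DefaultHandler {
        final XMLReader reader;
        final DefaultHandler parent;
        final StringBuilder text = new StringBuilder();

        BooksHandler(XMLReader reader, DefaultHandler parent) {
            this.reader = reader;
            this.parent = parent;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("url".equals(qName)) {
                imageUrls.add(text.toString().trim());
            } else if ("books".equals(qName)) {
                reader.setContentHandler(parent); // switch back to the top-level handler
            }
        }
    }

    static SwitchingHandlers parse(String xml) throws Exception {
        SwitchingHandlers result = new SwitchingHandlers();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(result.new TopHandler(reader));
        reader.parse(new InputSource(new StringReader(xml)));
        return result;
    }
}
```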
Not sure whether you're asking 1) is there something else you can do besides checking the tag against a bunch of strings or 2) if there's an alternative to a long if-then-else kind of statement.
The answer to 1 is not that I've found. Someone else may tackle that one.
The answer to 2 depends on your domain. One way I see is that if the point of this is to hydrate a bunch of objects from an XML file, then you can use a factory method.
So the first factory method has the long if then else statement that simply passes off the XML to the appropriate classes. Then each of your classes has a method like constructYourselfFromXmlString. This will improve your design because only the objects themselves know about the private data that is in an XML to hydrate them.
The reason this is hard is that, if you think about it, exporting an object to XML and importing it back in really violates encapsulation. There is nothing to be done about it; it just is. This at least makes things a little more encapsulated.
HTH
Agreeing with the sentiment that exporting an object to XML is a violation of encapsulation, the actual technique used to handle tags which are nested at different depths isn't terribly difficult using SAX.
Basically, keep a StringBuffer which will maintain your "location" in the document, which will be a directory-like representation of the nested tag you are currently within. For example, if at the moment the string buffer's contents are /library/books/book/img/url then you know it's a URL for an image in a book, and not a URL for some category.
Once you ensure that your "path tracking" algorithms are correct you can then wrap your object creation routines with better handling by using string matches. Instead of
if ("url".equalsIgnoreCase(qName)) {
...
}
you can now substitute
if ("/library/books/book/img/url".equalsIgnoreCase(location.toString())) {
...
}
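To make the path tracking concrete, here is a minimal sketch, assuming the document structure from the question. It uses a StringBuilder instead of a StringBuffer (SAX callbacks for one parse arrive on a single thread), and the class and field names are made up:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathTrackingHandler extends DefaultHandler {
    private final StringBuilder location = new StringBuilder(); // e.g. "/library/books/book"
    private final StringBuilder text = new StringBuilder();
    final List<String> imageUrls = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        location.append('/').append(qName); // descend one level
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        // Match on the full path, so only image URLs inside books are collected.
        if (location.toString().equals("/library/books/book/img/url")) {
            imageUrls.add(text.toString().trim());
        }
        location.setLength(location.lastIndexOf("/")); // climb back up one level
    }

    static List<String> parse(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.imageUrls;
    }
}
```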
If for some reason this doesn't appeal to you, there are still other solutions. For example, you can make a SAX handler which implements a stack of handlers, where the top handler is responsible for handling just its portion of the XML document and pops itself off the stack once it is done. Using such a scheme, each object gets created by its own individual handler, and some handlers basically check and direct which "object creation" handlers get pushed onto the handling stack at the appropriate times.
I've used both techniques. There are strengths in both, and which one is best really depends on the input and the needed objects.
You could refactor your SAX content handling so that you register a set of rules, each of which has a test that it applies to see if it matches the element, and an action that is executed if it does. This is moving closer to the XSLT processing model, while still doing streamed processing. Or you could move to XSLT: processing 50MB input files is well within the capabilities of a modern XSLT processor.
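One possible shape for such a rule set is sketched below. The Rule class, the on() registration method, and the choice to test rules against the element's path are all assumptions for illustration; the first matching rule wins, loosely mirroring how XSLT picks a single template per node:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RuleHandler extends DefaultHandler {
    /** A rule pairs a test on the element path with an action on the element's text. */
    static final class Rule {
        final Predicate<String> test;
        final Consumer<String> action;
        Rule(Predicate<String> test, Consumer<String> action) {
            this.test = test;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    /** Registers a rule; rules are tried in registration order. */
    RuleHandler on(Predicate<String> test, Consumer<String> action) {
        rules.add(new Rule(test, action));
        return this;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.addLast(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        String current = "/" + String.join("/", path);
        for (Rule rule : rules) {
            if (rule.test.test(current)) {
                rule.action.accept(text.toString().trim());
                break; // first matching rule wins
            }
        }
        path.removeLast();
    }

    static List<String> demo(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        RuleHandler handler = new RuleHandler()
            .on(p -> p.endsWith("/img/url"), v -> out.add("image:" + v))
            .on(p -> p.endsWith("/url"), v -> out.add("other:" + v));
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }
}
```

The business logic then lives in small, separately testable actions instead of one long if/else chain; note that this sketch only tracks text for leaf elements.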
try SAX-JAVA Binding Made Easier