Get the count of attributes using xpath in java without dom parser - java

I want to get the count of attributes using XPath in Java. I know we can use DOM parsers, but my input file is going to be very large. I can't really use SAX, as there are multiple nested tags I need to take care of. I'm also not sure which attributes are going to be inside the XML document. Having XPath would make my life easier, but I'm really worried a DOM parser will exhaust memory. I read about the s9api APIs but couldn't really solve it. Are there any alternative libraries in Java that support XPath without a DOM parser? Sharing examples would be really helpful.
Let's say my input is:
<?xml version="1.0" encoding="UTF-8"?>
<cricketers>
<continent>
<team>
<aussies>
<cricketer type="righty">
<name>Smith</name>
<role>Captain</role>
<position>Wicket-Keeper</position>
</cricketer>
<cricketer type="lefty">
<name>Warner</name>
<role>Batsman</role>
<position>Point</position>
</cricketer>
</aussies>
</team>
</continent>
<continent>
<team>
<england>
<cricketer type="righty">
<name>Morgan</name>
<role>Captain</role>
<position>Covers</position>
</cricketer>
<cricketer type="lefty">
<name>Cook</name>
<role>Batsman</role>
<position>Point</position>
</cricketer>
</england>
</team>
</continent>
<continent>
<team>
<aussies>
<cricketer type="righty">
<name>Smith</name>
<role>Captain</role>
<position>Wicket-Keeper</position>
</cricketer>
<cricketer type="lefty">
<name>Warner</name>
<role>Batsman</role>
<position>Point</position>
</cricketer>
</aussies>
</team>
</continent>
</cricketers>
Given the XPath //team/aussies/cricketer, the count is 4 in this case.
I want to implement something like this without a DOM parser.

With XSLT 3 supporting streaming (e.g. with Saxon EE 10 or 9.9) you can use
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:output method="adaptive"/>
<xsl:mode streamable="yes"/>
<xsl:template match="/">
<xsl:sequence select="count(//@*)"/>
</xsl:template>
</xsl:stylesheet>
if the task is really only to count all attributes. Saxon should run that in a single, forward-only pass through the whole document without building a full tree of all nodes.
Counting elements selected by plain child steps without predicates, such as
<xsl:template match="/">
<xsl:sequence select="count(//team/aussies/cricketer)"/>
</xsl:template>
should also work.
In s9api, you simply need to make sure you pass the input document as a stream to the Xslt30Transformer, e.g.
Processor processor = new Processor(true);
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable executable = compiler.compile(new StreamSource("count-example1.xsl"));
Xslt30Transformer transformer = executable.load30();
XdmValue result = transformer.applyTemplates(new StreamSource("sample1.xml"));
System.out.println(result);
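If Saxon EE is not an option, a plain StAX pull parser from the JDK can produce the same two counts in one forward pass without building a tree. This is not XPath: it is a hand-written substitute that only understands a simple chain of child steps such as team/aussies/cricketer matched anywhere in the document. The class and method names (StaxCounter, count) are made up for this sketch.

```java
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Saxon or the JDK.
final class StaxCounter {

    /**
     * Streams the document once and returns a two-element array:
     * [0] = total number of attributes on all elements,
     * [1] = number of elements whose path ends with the given local names
     *       (roughly what //a/b/c selects when no predicates are involved).
     */
    static long[] count(Reader xml, String... pathSuffix) {
        Deque<String> open = new ArrayDeque<>(); // currently open elements, innermost first
        long attributes = 0;
        long matches = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    open.push(reader.getLocalName());
                    attributes += reader.getAttributeCount();
                    if (endsWith(open, pathSuffix)) {
                        matches++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.pop();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return new long[] {attributes, matches};
    }

    // Compares the innermost open elements with the requested suffix, last step first.
    private static boolean endsWith(Deque<String> open, String[] suffix) {
        if (open.size() < suffix.length) {
            return false;
        }
        Iterator<String> innermostFirst = open.iterator();
        for (int i = suffix.length - 1; i >= 0; i--) {
            if (!innermostFirst.next().equals(suffix[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For the sample input in the question, StaxCounter.count(reader, "team", "aussies", "cricketer") should report 6 attributes and 4 matching elements. Memory use grows with nesting depth rather than document size, since only the stack of open element names is kept.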

Related

Differentiate XMLs based on namespace in Apache Camel

I am using Spring Boot and Apache Camel in my project. In the architecture, XML arrives from an input queue at the Camel layer, where it is transformed to another XML using XSLT, and then the final XML is sent to an output queue. The incoming XML has the following form:
<tns:Standalone xmlns:tns="namespace1">
<tns:name>Test</tns:name>
</tns:Standalone>
and this is transformed correctly using an XSLT. The problem is that, in my flow, the namespace bound to tns in the incoming XML can vary (say, a different XML can arrive with tns bound to namespace2). Then the XSLT fails. So I need logic to differentiate the incoming XMLs based on the tns value so that I can use different XSLTs for the two scenarios. Can you please guide me on how to differentiate input XMLs based on tns?
Here's a simple example showing how a single XSLT can handle nodes in two different namespaces equally:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns1="namespace1"
xmlns:ns2="namespace2"
exclude-result-prefixes="ns1 ns2">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/ns1:Standalone | /ns2:Standalone">
<output>
<xsl:value-of select="ns1:name | ns2:name"/>
</output>
</xsl:template>
</xsl:stylesheet>
When this stylesheet is applied to either one of the following inputs:
XML 1
<tns:Standalone xmlns:tns="namespace1">
<tns:name>Test</tns:name>
</tns:Standalone>
XML 2
<tns:Standalone xmlns:tns="namespace2">
<tns:name>Test</tns:name>
</tns:Standalone>
the result will be:
Result
<?xml version="1.0" encoding="UTF-8"?>
<output>Test</output>
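To try this outside Camel, the stylesheet can be run against both inputs with the JDK's built-in JAXP transformer. The sketch below is only a test harness: the class name SharedStylesheetDemo and the transform method are invented for illustration, and the stylesheet is inlined as a string so the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness; the stylesheet text is the one from the answer above.
final class SharedStylesheetDemo {

    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:ns1=\"namespace1\" xmlns:ns2=\"namespace2\""
        + " exclude-result-prefixes=\"ns1 ns2\">"
        + "<xsl:output method=\"xml\" version=\"1.0\" encoding=\"UTF-8\" indent=\"yes\"/>"
        + "<xsl:template match=\"/ns1:Standalone | /ns2:Standalone\">"
        + "<output><xsl:value-of select=\"ns1:name | ns2:name\"/></output>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the shared stylesheet to one input document and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling transform with either XML 1 or XML 2 should yield a document whose root is the same output element, which is the point of matching on both namespaces in one template.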

ArrayIndexOutOfBoundsException when transforming XML using XSLT

I am using the Java javax.xml.transform library in my Scala Play application to perform a simple XSLT transformation on some XML. I am trying to remove the namespace from one of the elements, but I am getting an exception when I POST XML to the endpoint which does the transformation.
The method I have written to do the transformation is below:
def transformXml(xml: String, xslName: String): Try[String] = {
Try {
// Create transformer factory
val factory: TransformerFactory = TransformerFactory.newInstance()
// Use the factory to create a template containing the xsl file
val template: Templates = factory.newTemplates(new StreamSource(new FileInputStream(s"app/xsl/$xslName.xsl")))
// Use the template to create a transformer
val xformer: Transformer = template.newTransformer()
// Prepare the input for transformation
val input: Source = new StreamSource(new StringReader(xml))
// Prepare the output for transformation result
val outputBuffer: Writer = new StringWriter
val output: javax.xml.transform.Result = new StreamResult(outputBuffer)
// Apply the xslt transformation to the input and store the result in the output
xformer.transform(input, output)
// Return the transformed XML
outputBuffer.toString
}
}
Through putting printlns in my code, I have deduced that it is in fact failing at the xformer.transform(input, output) line. The XML I am passing in and the XSL file I am using to transform are below:
<?xml version="1.0"?>
<Message xmlns="http://foo.bar" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
<EnvelopeVersion>2.0</EnvelopeVersion>
<Header>
<MessageDetails>
...
...
...
</MessageDetails>
<SenderDetails/>
</Header>
<OtherDetails>
<Keys/>
</OtherDetails>
<Body>
</Body>
</Message>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" exclude-result-prefixes="">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:param name="ancestralNamespace" select="namespace-uri(/*[1])"/>
<xsl:copy>
<xsl:apply-templates select="@*|node()">
<xsl:with-param name="ancestralNamespace" select="$ancestralNamespace"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="*[contains(namespace-uri(),'foo.bar')]">
<xsl:param name="ancestralNamespace" select="namespace-uri(..)"/>
<xsl:element name="{local-name()}" namespace="">
<xsl:apply-templates select="@*|node()">
<xsl:with-param name="ancestralNamespace" select="$ancestralNamespace"/>
</xsl:apply-templates>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
My expected output is this:
<?xml version="1.0"?>
<Message>
<EnvelopeVersion>2.0</EnvelopeVersion>
<Header>
<MessageDetails>
...
...
...
</MessageDetails>
<SenderDetails/>
</Header>
<OtherDetails>
<Keys/>
</OtherDetails>
<Body>
</Body>
</Message>
The error I get back from sending a POST request to my endpoint is this:
{
"statusCode": 500,
"message": "javax.xml.transform.TransformerException: java.lang.ArrayIndexOutOfBoundsException: -1"
}
I do not have much experience with XSLT and have inherited this code from someone else to try to debug, so if anyone with XML/XSLT experience could give me some help I would greatly appreciate it. The perplexing thing is that the person I got this problem from had written Unit Tests using this method (send in my example XML and get out the expected XML) and they passed so I don't know where to look next.
Right so after a few hours of debugging and fretting over this, I found the solution!
The default transformer which my Play application was using handles XSLT differently, and was getting confused at the line <xsl:param name="ancestralNamespace" select="namespace-uri(/*[1])"/>. What solved my issue was to use a different transformer. The one I found to work was Xalan (version 2.7.2), and after importing that into my project build file I hit the endpoint and the transformation was successful.
To import the version I found to work, add the following to your build:
"xalan" % "xalan" % "2.7.2" % "runtime"
I believe that the "runtime" scope is the most important part, as it seems to override what the application would normally use. I would guess that the reason my tests passed but my endpoint failed is that ScalaTest runs with a different configuration from the one used at runtime. Nothing else about my code had to be changed.
I hope this helps to stop anyone else from encountering this (admittedly rather unique) error! I ended up trawling through countless forums from as far back as 2002 before resorting to trying a different runtime configuration.
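When a transformation behaves differently in tests and at runtime, it can help to check which TransformerFactory implementation JAXP actually resolved in each environment, since TransformerFactory.newInstance() picks one via the javax.xml.transform.TransformerFactory system property or service-provider lookup on the classpath. The following is a small diagnostic sketch in Java (callable from Scala as well); the class and method names are invented for illustration.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic helper; only standard JAXP calls are used.
final class TransformerFactoryProbe {

    /** Returns the class name of the implementation that the default JAXP lookup selects. */
    static String defaultImplementation() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /**
     * Loads a specific implementation by class name instead of relying on lookup order,
     * e.g. "org.apache.xalan.processor.TransformerFactoryImpl" when Xalan is on the classpath.
     * Throws TransformerFactoryConfigurationError if the class cannot be found.
     */
    static TransformerFactory explicit(String className) {
        return TransformerFactory.newInstance(className, TransformerFactoryProbe.class.getClassLoader());
    }
}
```

Logging defaultImplementation() once in the test run and once in the running application would show whether the two really use different processors. Requesting the factory explicitly removes the dependence on classpath ordering.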

Accessing unparsed entities in XSLT with a SAXTransformerFactory and TransformerHandlers

I have some trouble while retrieving unparsed entity URIs, with the XPath function unparsed-entity-uri().
I'm using a SAXTransformerFactory as in the "Efficient XSLT pipeline in Java" question, because I need to perform a chain of transformations (i.e. apply several XSLT transformations, using the result of one transformation as the input for the next).
I found that I'm unable to retrieve the unparsed entity URI with the code below. It actually works well with Xalan, but not with Saxon-HE (version 9.7.0), and I need Saxon because I'd rather use XSLT 2.0 (there is nothing specific to XSLT 2 in the code below; it is only there to provide an example). It also works with Saxon if I don't use a TransformerHandler, e.g. stf.newTransformer(new StreamSource("transfo.xsl")).transform(new StreamSource("input.xml"), new StreamResult(System.out)) will produce the desired output.
Is there a configuration step that I forgot?
// use "org.apache.xalan.processor.TransformerFactoryImpl" for Xalan
String transformerFactoryClassName = "net.sf.saxon.TransformerFactoryImpl";
SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance(transformerFactoryClassName,
LaunchSimpleTransformationUnparsedEntities.class.getClassLoader());
try {
TransformerHandler thTransf = stf
.newTransformerHandler(new StreamSource("transfo.xsl"));
// output the result in console
thTransf.setResult(new StreamResult(System.out));
// Launch transformation of input.xml
Transformer t = stf.newTransformer();
t.transform(new StreamSource("input.xml"),
new SAXResult(thTransf));
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerException e) {
e.printStackTrace();
}
In input, I have (for input.xml):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book
[<!ENTITY cover_hadrien SYSTEM "images/covers/cover_hadrien.jpg" NDATA jpeg>]>
<book>
<title>Les mémoires d'Hadrien</title>
<author>Marguerite Yourcenar</author>
<cover imgref="cover_hadrien" />
</book>
and a sample XSLT (for transfo.xsl):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="cover">
<xsl:copy>
<xsl:value-of select="unparsed-entity-uri(@imgref)"/>
</xsl:copy>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
as a result, I would expect something like:
<?xml version="1.0" encoding="UTF-8"?><book>
<title>Les mémoires d'Hadrien</title>
<author>Marguerite Yourcenar</author>
<cover>images/covers/cover_hadrien.jpg</cover>
</book>
but <cover> is empty when performing the transformation with Saxon.
Interesting observation. The issue in fact is not with Saxon's TransformerHandler, but rather with the "identity transformer" obtained using SAXTransformerFactory.newTransformer(): the identity transformer is not passing unparsed entities down the line. This is essentially because Saxon's identity transformer is reusing parts of the XSLT engine, and XSLT does not provide any way for a transformation to output unparsed entities in the result. If you sent the SAX parser output directly to the TransformerHandler, rather than going via an identity transformer, then I think it would all work.
As with all things JAXP-related, the specification of SAXTransformerFactory.newTransformer() is infuriatingly vague. All it says is that the returned Transformer performs a copy of the Source to the Result. i.e. the "identity transform". What exactly counts as a copy? I think Saxon's interpretation has been that it is equivalent to the effect of doing an XSLT identity transform - which would lose unparsed entities (as well as other things like CDATA sections, the DTD, etc).
Incidentally XSLT 2.0 specifies that the result of unparsed-entity-uri() should be an absolute URI (XSLT 1.0 doesn't say anything on the subject) so even if this is fixed, the Saxon output will be different.
Entered as a Saxon issue here: https://saxonica.plan.io/issues/3201 I think we need to be a bit careful about passing unparsed entities to a SAXResult if we don't pass all the other events expected by a SAX DTDHandler - and we're certainly not going to change the Saxon identity transformer to retain things (like DTD declarations) that aren't modelled in XDM.
Indeed, following @MichaelKay's details, launching the transformation this way works properly:
// launch transformation of input.xml
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setContentHandler(thTransf);
reader.setDTDHandler(thTransf);
reader.parse(new InputSource("input.xml"));
(this replaces the following lines,
// Launch transformation of input.xml
Transformer t = stf.newTransformer();
t.transform(new StreamSource("input.xml"),
new SAXResult(thTransf));
that were used initially).
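The reason setDTDHandler matters is that a SAX parser reports unparsed entity declarations through the DTDHandler callback, not through the ContentHandler. The sketch below illustrates this with the JDK parser alone, without any XSLT processor: it collects what a handler such as thTransf would receive. The class and method names are made up for illustration, and the reported system identifier may be resolved to an absolute URI by the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical probe; shows which events reach a DTDHandler during parsing.
final class UnparsedEntityProbe {

    /** Parses the document and returns entity name -> system identifier for each unparsed entity. */
    static Map<String, String> unparsedEntities(String xml) {
        final Map<String, String> found = new LinkedHashMap<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(new DTDHandler() {
                @Override
                public void notationDecl(String name, String publicId, String systemId) {
                    // Notation declarations are not needed for this illustration.
                }

                @Override
                public void unparsedEntityDecl(String name, String publicId,
                                               String systemId, String notationName) {
                    found.put(name, systemId);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return found;
    }
}
```

Run on the input.xml from the question, the map should contain the key cover_hadrien with a system identifier ending in images/covers/cover_hadrien.jpg. If no DTDHandler is registered, these declarations are simply never delivered downstream.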

How to handle duplicate node names when converting xml to csv using java and xsl

I am given an XML file from an outside source (so I have no control over the element names) and unfortunately they use the same name for a paired set of data. I can't seem to figure out how to access the second value. An example of the data in the XML file is:
<?xml version="1.0"?>
<addressResponse>
<results>
<ownerName>Name1</ownerName>
<houseAddress>House1</houseAddress>
<houseAddress>CityState1</houseAddress>
<yearBuilt>Year1</yearBuilt>
</results>
<results>
<ownerName>Name2</ownerName>
<houseAddress>House2</houseAddress>
<houseAddress>CityState2</houseAddress>
<yearBuilt>Year2</yearBuilt>
</results>
</addressResponse>
I already have my Java code together and can parse the XML, but I need help handling the duplicate element name. I want my CSV file to look like the following:
owner,address,citystate,yearbuilt
Name1,House1,CityState1,Year1
Name2,House2,CityState2,Year2
In my xsl file, I did the following "hoping" it would get the second houseAddress but it didn't:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">owner,address,citystate,yearbuilt
<xsl:for-each select="//results">
<xsl:value-of select="concat(ownerName,',',houseAddress,',',houseAddress,',',yearBuilt,'&#10;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
That gave me:
owner,address,citystate,yearbuilt
Name1,House1,House1,Year1
Name2,House2,House2,Year2
Is there a trick to do this? I can't get the element names changed by the originator, so I'm stuck with them. Thank you in advance.
Use:
houseAddress[2]
to get the value of the second occurrence of the houseAddress element.
Note that we are assuming XSLT 1.0 here.
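Put together, the positional predicate can be checked end to end with the JDK's built-in XSLT 1.0 processor. This is a sketch under a few assumptions: the class name CsvDemo and method toCsv are invented, the stylesheet is a trimmed version of the one in the question with houseAddress[1] and houseAddress[2] substituted, and the line break is written as the character reference &#10;.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness around the corrected stylesheet.
final class CsvDemo {

    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:text>owner,address,citystate,yearbuilt&#10;</xsl:text>"
        + "<xsl:for-each select=\"//results\">"
        // houseAddress[1] and houseAddress[2] pick the first and second sibling of that name
        + "<xsl:value-of select=\"concat(ownerName,',',houseAddress[1],',',"
        + "houseAddress[2],',',yearBuilt,'&#10;')\"/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms an addressResponse document into CSV text. */
    static String toCsv(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

A bare houseAddress inside concat() is converted to a string by taking the first node in document order, which is why the original stylesheet printed House1 twice. The explicit [2] selects the second sibling instead.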

Creating a word document from a template dynamically using values from java objects

I want to create a word document from an HTML page.
I am planning to get the values on the HTML page and then pass these values to a document template.
I have used JSOUP to parse the contents of the HTML page and I get the values in my java program. I now want to pass these values to a word document template.
I want to know what are the best techniques I can use to create the document template and pass the values to the template to create the word document.
Thank You.
I found something very interesting and simple. We just need to create a simple .xml template for the document we want to create, then programmatically change the contents of the XML file and save it as an MS Word document.
You can find the xml template and the code here.
I suggest you use XSLT, because your data is already in XML format and there are well-defined XML formats from Microsoft.
You could write a document template in Word and save it in XML format. Then you can convert the Word XML into an XSL template that takes your HTML-derived XML as input. After the XSLT transformation you have valid Word XML containing your dynamic values.
XSLT example for Excel
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="no" />
<xsl:template match="/">
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
...
<xsl:for-each
select="/yourroot/person">
...
<Cell ss:StyleID="uf">
<Data ss:Type="String">
<xsl:value-of
select="@Name" />
</Data>
</Cell>
..
</xsl:for-each>
...
</xsl:template>
</xsl:stylesheet>
JODReports and Docmosis might also be useful options for you, since they populate templates and produce DOC output. If DOCX is your real target, then you can write out the document yourself since the XML format is published, but that is a lot of work.
