Parsing XML with structured element names - java

I've got some third-party XML to parse in the following form. The number of tests is unbounded, but the suffix is always an integer.
<tests>
<test_1>
<foo bar="baz" />
</test_1>
<test_2>
<foo bar="baz" />
</test_2>
<test_3>
<foo bar="baz" />
</test_3>
</tests>
I'm currently parsing this with XPath, but it involves a lot of messing around. Is there any way of expressing this style of XML in an XSD schema and generating JAXB classes from it?
As far as I can see this is impossible; the only option is the <xs:any processContents="lax"/> technique from "how can I define an xsd file that allows unknown (wildcard) elements?". However, this allows any content, not specifically <test_<integer>>. I just want to confirm I'm not missing some XSD/JAXB trick.
Note: I would have preferred the XML to be structured like this, and I may try to convince the third party to change it.
<tests>
<test id="1">
<foo bar="baz" />
</test>
<test id="2">
<foo bar="baz" />
</test>
<test id="3">
<foo bar="baz" />
</test>
</tests>

While there are ways of dealing with elements with structured names such as numeric suffixes,
XPath: Use string tests against name() or local-name()
XSD: See XSD element name pattern matching
JAXB: See Dealing with poorly designed XML with JAXB
you really should fix the underlying XML design (test_1 should be test) instead.
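The XPath item above (string tests against local-name()) can be illustrated with a small Java sketch that uses only the JDK's javax.xml.xpath API. It assumes the <tests>/<test_N>/<foo bar="..."/> layout from the question; the class and method names are made up for this example, and using translate() to check that the suffix contains only digits is one XPath 1.0 idiom among several.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TestSuffixXPath {

    // Selects children of <tests> named test_<digits>, using string tests on local-name().
    // translate(s, '0123456789', '') = '' is true when s contains only digits.
    static final String TEST_ELEMENTS =
            "/tests/*[starts-with(local-name(), 'test_')"
            + " and string-length(local-name()) > 5"
            + " and translate(substring(local-name(), 6), '0123456789', '') = '']";

    /** Maps the numeric suffix N of each <test_N> to the bar attribute of its <foo> child. */
    public static Map<Integer, String> readTests(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so that getLocalName() is populated
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList tests = (NodeList) xpath.evaluate(TEST_ELEMENTS, doc, XPathConstants.NODESET);

            Map<Integer, String> result = new LinkedHashMap<>();
            for (int i = 0; i < tests.getLength(); i++) {
                Element test = (Element) tests.item(i);
                // The XPath filter guarantees everything after "test_" is digits.
                int id = Integer.parseInt(test.getLocalName().substring(5));
                result.put(id, xpath.evaluate("foo/@bar", test));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read <test_N> elements", e);
        }
    }
}
```

Elements such as <test_x> or <other> are skipped by the filter, so only genuinely numbered tests end up in the map, in document order.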

For completeness, here is a full working example of using XSLT to transform the <test_N> input into the <test id="N"> style.
Input
<tests>
<test_1>
<foo bar="baz" />
</test_1>
<test_2>
<foo bar="baz" />
</test_2>
<test_1234>
<foo bar="baz" />
</test_1234>
<other>
<foo></foo>
</other>
</tests>
XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="*[substring(name(), 1, 5) = 'test_']">
<xsl:element name="test">
<xsl:attribute name="id"><xsl:value-of select="substring(name(), 6, string-length(name()) - 5)" /></xsl:attribute>
<xsl:copy-of select="node()" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Code
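import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
// Apply test.xsl to test.xml and print the result (transform() throws TransformerException)
File input = new File("test.xml");
File stylesheet = new File("test.xsl");
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesource);
StringWriter writer = new StringWriter();
transformer.transform(new StreamSource(input), new StreamResult(writer));
System.out.println(writer);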
Output
<?xml version="1.0" encoding="UTF-8"?>
<tests>
<test id="1">
<foo bar="baz"/>
</test>
<test id="2">
<foo bar="baz"/>
</test>
<test id="1234">
<foo bar="baz"/>
</test>
<other>
<foo/>
</other>
</tests>
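If the goal is to hand the transformed XML to JAXB, the output does not have to go through a String or a file. This JDK-only sketch (hypothetical class and method names) applies the same stylesheet into a javax.xml.transform.dom.DOMResult and reads back the generated id attributes; a DOM node produced this way could then be passed to JAXB's Unmarshaller.unmarshal(Node).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RenameTestElements {

    // The stylesheet from the answer, inlined so the example is self-contained.
    static final String XSL =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
            + "<xsl:template match='@* | node()'>"
            + "<xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match=\"*[substring(name(), 1, 5) = 'test_']\">"
            + "<xsl:element name='test'>"
            + "<xsl:attribute name='id'>"
            + "<xsl:value-of select='substring(name(), 6, string-length(name()) - 5)'/>"
            + "</xsl:attribute>"
            + "<xsl:copy-of select='node()'/>"
            + "</xsl:element>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Transforms into a DOM tree (no intermediate string) and returns the new id attributes. */
    public static List<String> testIds(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            DOMResult result = new DOMResult(); // the transformer creates the Document
            transformer.transform(new StreamSource(new StringReader(xml)), result);

            Document doc = (Document) result.getNode();
            NodeList tests = doc.getElementsByTagName("test");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < tests.getLength(); i++) {
                ids.add(((Element) tests.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```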

Related

Unable to fetch the specific entire XML tag by using XSLT

I have the following sample XML file.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<testng-results skipped="0" failed="0" total="10" passed="10">
<class name="com.transfermoney.Transfer">
<test-method status="PASS" name="setParameter" is-config="true" duration-ms="4"
started-at="2018-08-16T21:43:38Z" finished-at="2018-08-16T21:43:38Z">
<params>
<param index="0">
<value>
<![CDATA[org.testng.TestRunner@31c2affc]]>
</value>
</param>
</params>
<reporter-output>
</reporter-output>
</test-method> <!-- setParameter -->
</class>
<class name="com.transfermoney.Transfer">
<test-method status="FAIL" name="setSettlementFlag" is-config="true" duration-ms="5"
started-at="2018-08-16T21:44:55Z" finished-at="2018-08-16T21:44:55Z">
<reporter-output>
<line>
<![CDATA[runSettlement Value Set :false]]>
</line>
</reporter-output>
</test-method> <!-- setSettlementFlag -->
</class>
</testng-results>
I only want to extract the following element from the XML file above, based on status PASS. The <?xml ...?> declaration, the <testng-results> element and the class elements should be ignored.
Expected Output:
<test-method status="PASS" name="setParameter" is-config="true" duration-ms="4"
started-at="2018-08-16T21:43:38Z" finished-at="2018-08-16T21:43:38Z">
<params>
<param index="0">
<value>
<![CDATA[org.testng.TestRunner@31c2affc]]>
</value>
</param>
</params>
<reporter-output>
</reporter-output>
</test-method>
I used the XSLT below to get this output from the sample XML file, but it doesn't work: it returns all the tags, whereas I want only the output above and nothing else.
XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" method="xml"/>"
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="class"/>
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
<xsl:for-each select="test-method[@status='PASS']">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:for-each>
</xsl:copy>
</xsl:stylesheet>
I am also using the Java code below to run the XSLT against the sample XML file:
Code:
String XML = fetchDataFrmXML(".//Test//testng-results_2.xml");
Transformer t = TransformerFactory.newInstance().newTransformer(new StreamSource(new StringReader(XSL)));
t.transform(new StreamSource(new StringReader(XML)), new StreamResult(new File(".//Test//Sample1.xml")));
This is a sample payload; the actual payload has multiple nodes with "PASS" and "FAIL" status. I'm only interested in fetching the PASS nodes, in the output format above.
Any leads?
The result you show could be obtained quite simply with:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/testng-results">
<xsl:copy-of select="class/test-method[@status='PASS']" />
</xsl:template>
</xsl:stylesheet>
However, in case of more than one test-method having a status of "PASS" this will result in an XML fragment with no single root element. So you'd probably be better off doing:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/testng-results">
<root>
<xsl:copy-of select="class/test-method[@status='PASS']" />
</root>
</xsl:template>
</xsl:stylesheet>
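To check the second stylesheet from Java, a sketch along these lines should work with the JDK's built-in XSLT 1.0 processor. The stylesheet is inlined as a string, and the input used below is a trimmed-down, hypothetical TestNG result with two PASS methods and one FAIL method; the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PassedTestMethods {

    // The second stylesheet from the answer: copy every PASS test-method under a single <root>.
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes' encoding='utf-8' indent='yes'/>"
            + "<xsl:strip-space elements='*'/>"
            + "<xsl:template match='/testng-results'>"
            + "<root><xsl:copy-of select=\"class/test-method[@status='PASS']\"/></root>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Returns the serialized <root> element containing only the PASS test-method elements. */
    public static String passed(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Because copy-of copies each matching element with all its attributes and descendants, the <params> and <reporter-output> children come along without any extra templates.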

Unmarshalling without unique node names

I am stuck trying to figure out how to unmarshal an XML file supplied by IBM Cognos.
The structure does not provide unique names for the child nodes of the <row> element, but there is a <metadata> block that defines the order of the values.
This is a simplified sample of the XML file.
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
xmlns="http://developer.cognos.com/schemas/xmldata/1/"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://developer.cognos.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
<metadata>
<item name="EmployeeID" type="xs:string" length="20"/>
<item name="firstName" type="xs:string" length="50"/>
<item name="lastName" type="xs:string" length="50"/>
</metadata>
<data>
<row>
<value>EMP1</value>
<value>Joe</value>
<value>Blogs</value>
</row>
<row>
<value>EMP2</value>
<value>Mary</value>
<value>Soap</value>
</row>
</data>
</dataset>
I'm using Spring OXM and Castor for this project and I have no control over the XML format as I am pulling it via a web service from a third party system.
Update: I'm not averse to swapping out Castor for a different marshalling/unmarshalling library.
The magic of XSLT to the rescue. By running the provided XML through the following XSLT stylesheet, I was able to create an XML file that I could then unmarshal correctly.
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:cognos="http://developer.cognos.com/schemas/xmldata/1/">
<xsl:output method="xml" version="1.0" encoding="UTF-8" standalone="yes" indent="yes"/>
<xsl:template match="/">
<xsl:element name="DataSet">
<xsl:for-each select="//*[name()='row']">
<xsl:variable name="row" select="position()" />
<xsl:element name="Row">
<xsl:for-each select="//*[name()='item']">
<xsl:variable name="elementName" select="@name" />
<xsl:variable name="index" select="position()" />
<xsl:element name="{translate($elementName,' ','_')}">
<xsl:value-of select="//cognos:row[$row]/cognos:value[$index]" />
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
This transformed the XML file as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DataSet>
<Row>
<EmployeeID>EMP1</EmployeeID>
<firstName>Joe</firstName>
<lastName>Blogs</lastName>
</Row>
<Row>
<EmployeeID>EMP2</EmployeeID>
<firstName>Mary</firstName>
<lastName>Soap</lastName>
</Row>
</DataSet>
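Since swapping out Castor is an option, the same position-based pairing the stylesheet performs can also be done directly on a DOM. This is an alternative technique, not what the answer used: a JDK-only sketch (hypothetical class and method names) that reads the <metadata> item names in order and assigns the i-th <value> of each <row> to the i-th name. It assumes, as in the sample, that every row lists its values in metadata order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CognosRows {

    static final String NS = "http://developer.cognos.com/schemas/xmldata/1/";

    /** Returns one map per <row>, keyed by the metadata item names in declaration order. */
    public static List<Map<String, String>> read(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the Cognos elements live in a default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Column names come from <metadata><item name="..."/>, in document order.
            NodeList items = doc.getElementsByTagNameNS(NS, "item");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                names.add(((Element) items.item(i)).getAttribute("name"));
            }

            List<Map<String, String>> rows = new ArrayList<>();
            NodeList rowNodes = doc.getElementsByTagNameNS(NS, "row");
            for (int r = 0; r < rowNodes.getLength(); r++) {
                NodeList values = ((Element) rowNodes.item(r)).getElementsByTagNameNS(NS, "value");
                Map<String, String> row = new LinkedHashMap<>();
                // The i-th <value> belongs to the i-th metadata item.
                for (int i = 0; i < names.size() && i < values.getLength(); i++) {
                    row.put(names.get(i), values.item(i).getTextContent());
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read Cognos dataset", e);
        }
    }
}
```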

XSLT: set name of transformed output file

I am using the Apache Camel file and xslt components. I have a route where I pick up an XML message, transform it using XSLT and drop it into a different folder.
Apache Camel DSL route:
<route id="normal-route">
<from uri="file:{{inputfilefolder}}?consumer.delay=5000" />
<to uri="xslt:stylesheets/simpletransform.xsl?transformerFactoryClass=net.sf.saxon.TransformerFactoryImpl" />
<to uri="file:{{outputfilefolder}}" />
</route>
I mention Apache Camel here to check whether there is a way to set the output file name using Camel. I think there should also be a mechanism in pure XSLT, even without Camel.
I need to rename the transformed output file, but I always get the input file name, with the transformed content, in the output folder.
For example, input file: books.xml
output file: books.xml [with the transformation applied]
What I am looking for is someotherfilename.xml as the output file name. The output data is correct.
I tried <xsl:result-document href="{title}.xml">, but then the output XML is blank. Please help.
Input XML file:
<?xml version="1.0" encoding="UTF-8"?>
<books>
<book.child.1>
<title>Charithram</title>
<author>P Sudarsanan</author>
</book.child.1>
<book.child.2>
<title>Java Concurrency</title>
<author>Joshua Bloch</author>
</book.child.2>
</books>
XSLT:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="xml" version="1.0" encoding="UTF-8"
indent="yes" />
<xsl:variable name="filename" select="'newfilename'" />
<xsl:template match="/">
<xsl:result-document href="{$filename}.xml">
<traders>
<xsl:for-each select="books/*">
<trade>
<title>
<xsl:value-of select="title" />
</title>
</trade>
</xsl:for-each>
</traders>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
Output XML when using <xsl:result-document href="..."> in the XSLT:
It is blank.
Output XML when not using <xsl:result-document href="..."> in the XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<traders xmlns:xs="http://www.w3.org/2001/XMLSchema">
<trade>
<title>Charithram</title>
</trade>
<trade>
<title>Java Concurrency</title>
</trade>
</traders>
Edit: edited the XSLT as per MartinHonnen's comment
It looks like Camel's default is to reuse the input file name, but you can override it. As the docs mention, you can specify options on the endpoint URI as follows:
file:directoryName[?options]
One such option is fileName:
Use Expression such as File Language to dynamically set the filename.
For consumers, it's used as a filename filter. For producers, it's
used to evaluate the filename to write.
In short, modify your route as follows:
<route id="normal-route">
<from uri="file:{{inputfilefolder}}?consumer.delay=5000" />
<to uri="xslt:stylesheets/simpletransform.xsl?transformerFactoryClass=net.sf.saxon.TransformerFactoryImpl" />
<to uri="file:{{outputfilefolder}}?fileName=foo.xml" />
</route>
Where foo.xml will be the output file.
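Outside Camel the situation is similar with plain JAXP: the stylesheet produces content and the calling code decides where it goes, so the output file name is simply whatever destination is passed to StreamResult. A minimal sketch, with hypothetical class and method names; note that the JDK's built-in processor supports only XSLT 1.0, so xsl:result-document is not available with it and a Saxon-style 2.0 stylesheet would need Saxon on the classpath.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformToNamedFile {

    /**
     * Transforms input with stylesheet and writes the result to outputDir/outputName.
     * The output file name is chosen here by the caller, not by the stylesheet.
     */
    public static Path transform(Path input, Path stylesheet, Path outputDir, String outputName) {
        Path target = outputDir.resolve(outputName);
        // try-with-resources makes sure the output file is closed after the transform
        try (OutputStream out = Files.newOutputStream(target)) {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(stylesheet.toFile()));
            t.transform(new StreamSource(input.toFile()), new StreamResult(out));
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Could not write " + target, e);
        }
    }
}
```

For example, transform(Paths.get("books.xml"), Paths.get("simpletransform.xsl"), outDir, "newfilename.xml") would write the traders document to outDir/newfilename.xml.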
Update
You can use the Simple or File language to set file names dynamically; the Camel documentation for both languages has a few examples.

Convert XML file into CSV format using Camel?

I want to convert an .xml file into a CSV file using Camel. Here is my code:
CamelContext context = new DefaultCamelContext();
from("file://Input?fileName=test.xml").marshal().csv().to("file://test?fileName=test.csv");
context.start();
But it isn't creating any file in the desired folder "test".
Please spend just a bit more time on the Camel docs, and try out the examples, and read the FAQ. And the introduction articles and whatnot.
The code above isn't even valid, as you would need to put it inside a RouteBuilder.
Also when you start CamelContext, read the javadoc of the start method. And read this FAQ
http://camel.apache.org/running-camel-standalone-and-have-it-keep-running.html
Camel also offers a tracer, so you can see the message flow as the messages are being processed. By default, the tracer logs this at INFO level.
http://camel.apache.org/tracer
Here is a sample using Camel:
<camelContext xmlns="http://camel.apache.org/schema/spring">
<route>
<from uri="file:src/xmldata?noop=true"/>
<to uri="xslt:file:src/main/fruits.xslt"/>
<to uri="file://TESTOUT?fileName=output.csv"/>
</route>
</camelContext>
Sample XML file in the src/xmldata folder:
<AllFruits xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- All fruits below. -->
<Fruit>
<FruitId>Bannana</FruitId>
<FruitColor>Yellow</FruitColor>
<FruitShape>Moon</FruitShape>
<Customer>
<Name>Joe</Name>
<NumberEaten>5</NumberEaten>
<Weight>2.6</Weight>
</Customer>
<Customer>
<Name>Mark</Name>
<NumberEaten>8</NumberEaten>
<Weight>5.0</Weight>
</Customer>
</Fruit>
</AllFruits>
src/main/fruits.xslt
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="text" encoding="ISO-8859-1" />
<xsl:variable name="newline" select="'&#10;'"/>
<xsl:template match="Fruit">
<xsl:for-each select="Customer">
<xsl:value-of select="preceding-sibling::FruitId" />
<xsl:text>,</xsl:text>
<xsl:value-of select="NumberEaten" />
<xsl:text>,</xsl:text>
<xsl:value-of select="Weight" />
<xsl:value-of select="$newline" />
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
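The stylesheet can be tried outside Camel with the JDK's built-in XSLT processor. This sketch inlines it as a Java string (class and method names are made up) and writes the newline variable as the character reference &#10;, because a literal line break inside an attribute value is normalized to a space by the XML parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FruitsToCsv {

    // src/main/fruits.xslt from the answer, inlined; the newline is the character reference &#10;
    static final String XSL =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
            + "<xsl:output method='text' encoding='ISO-8859-1'/>"
            + "<xsl:variable name='newline' select=\"'&#10;'\"/>"
            + "<xsl:template match='Fruit'>"
            + "<xsl:for-each select='Customer'>"
            + "<xsl:value-of select='preceding-sibling::FruitId'/>"
            + "<xsl:text>,</xsl:text>"
            + "<xsl:value-of select='NumberEaten'/>"
            + "<xsl:text>,</xsl:text>"
            + "<xsl:value-of select='Weight'/>"
            + "<xsl:value-of select='$newline'/>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet and returns the CSV text (one line per Customer). */
    public static String toCsv(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

With the sample file this should yield one line per customer, for example Bannana,5,2.6. Whitespace-only text nodes between elements in the input file may also be copied to the output by the built-in templates, so blank lines around the CSV are possible.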

Using XSLT to output multiple files

I'm trying to get an example I found, which uses XSLT 2.0 to output multiple files, working.
Using Saxon B 9.7.0.1 with Java 1.6, I get this error:
C:\Documents and Settings\Administrator\Desktop\saxon>java -jar saxon9.jar -s:input.xml -xsl:transform.xml
Error on line 15 of transform.xml:
java.net.URISyntaxException: Illegal character in path at index 20: file:///C:/Documents and Settings/Administrator/Desktop/saxon/output1/test1.html
at xsl:for-each (file:/C:/Documents%20and%20Settings/Administrator/Desktop/saxon/transform.xml#10)
processing /tests/testrun[1]
Transformation failed: Run-time errors were reported
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<tests>
<testrun run="test1">
<test name="foo" pass="true" />
<test name="bar" pass="true" />
<test name="baz" pass="true" />
</testrun>
<testrun run="test2">
<test name="foo" pass="true" />
<test name="bar" pass="false" />
<test name="baz" pass="false" />
</testrun>
<testrun run="test3">
<test name="foo" pass="false" />
<test name="bar" pass="true" />
<test name="baz" pass="false" />
</testrun>
</tests>
transform.xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output method="text"/>
<xsl:output method="html" indent="yes" name="html"/>
<xsl:template match="/">
<xsl:for-each select="//testrun">
<xsl:variable name="filename"
select="concat('output1/',@run,'.html')" />
<xsl:value-of select="$filename" /> <!-- Creating -->
<xsl:result-document href="{$filename}" format="html">
<html><body>
<xsl:value-of select="@run"/>
</body></html>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Character 20 in your URI is the first space in "Documents and Settings". As a quick fix, try moving the files to a path without spaces. (Say, "C:\test" or some such.) I suspect the long-term fix is to change your XSLT to encode spaces to %20 before feeding $filename to xsl:result-document, but I'm afraid my XSLT-2.0-fu isn't strong enough to tell you how.
Edit: I haven't tested this, as I don't have an XSLT 2.0 processor handy, but after glancing at the docs, it looks like you want the encode-for-uri function. Something like the following may work for you:
<xsl:result-document href="{encode-for-uri($filename)}" format="html">
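The exception in the question comes from java.net.URI, which rejects a raw space. The following JDK-only sketch reproduces the "Illegal character in path at index 20" error for the path from the question and shows that the multi-argument URI constructor percent-encodes the space as %20. It assumes the failing URI is built from the raw path string, as the stack trace suggests; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SpaceInFileUri {

    static final String PATH =
            "/C:/Documents and Settings/Administrator/Desktop/saxon/output1/test1.html";

    /** Returns the error index java.net.URI reports for the unencoded string, or -1 if it parses. */
    public static int rawParseErrorIndex() {
        try {
            new URI("file://" + PATH); // "file:///C:/Documents and ..." with a literal space
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex(); // position of the first illegal character
        }
    }

    /** Builds the URI from components; this constructor quotes illegal characters such as spaces. */
    public static URI encoded() {
        try {
            return new URI("file", null, PATH, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

encoded().getRawPath() contains Documents%20and%20Settings, while getPath() returns the decoded path with spaces, which is the distinction the stack trace is complaining about.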
I had the same issue with saxon -o:outputfile replacing the spaces with %20.
I found that it depends on the Saxon and Java versions:
Linux, Java 1.7.0_45: Saxon creates %20
Unix, Java 1.5.0_61: Saxon creates %20
Unix, Java 1.4.2_22: Saxon does not create a %20 directory
