XML to XLS using java - java

How can I map the metadata to the data? For example, I only want Last Name and Email from the XML file in the XLS file. How can I select Last Name and Email from the XML file and convert them into an XLS file with two columns, Last Name and Email? Thank you.
XML document:
<root>
<metadata>
<item name="Last Name" type="xs:string" length="182"/>
<item name="First Name" type="xs:string" length="182"/>
<item name="Class Registration #" type="xs:decimal" precision="19"/>
<item name="Email" type="xs:string" length="422"/>
<item name="SacLink ID" type="xs:string" length="92"/>
<item name="Term Desc" type="xs:string" length="62"/>
<item name="Status Code" type="xs:string" length="6"/>
</metadata>
<data>
<row>
<value>XXX</value>
<value>xxxx</value>
<value>xxx</value>
<value>xxx</value>
<value>xxx</value>
<value>xx</value>
<value>xx</value>
</row>
<row>
<value>xxy</value>
<value>xx</value>
<value>xx</value>
<value>xx</value>
<value>xx</value>
<value>xx</value>
<value>xx</value>
</row>
</data>
</root>

You might use an XSL transform for this, outputting CSV, which Excel can load.
If you want to write a program using C# 4.0 and Office 2008/10, it's also easier than ever to leverage the interop capabilities - have a look at the C# Samples, in the office samples.

This stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel">
<xsl:param name="pColumnNames" select="'Last Name,Email'"/>
<xsl:key name="kItemByPosition" match="item"
use="count(preceding-sibling::item)+1"/>
<xsl:template match="/">
<xsl:processing-instruction name="mso-application">
<xsl:text>progid="Excel.Sheet"</xsl:text>
</xsl:processing-instruction>
<Workbook>
<Worksheet ss:Name="Email Table">
<Table x:FullColumns="1" x:FullRows="1">
<xsl:apply-templates/>
</Table>
</Worksheet>
</Workbook>
</xsl:template>
<xsl:template match="metadata|row">
<Row>
<xsl:apply-templates/>
</Row>
</xsl:template>
<xsl:template match="item|value">
<xsl:if test="contains(concat(',',$pColumnNames,','),
concat(',',key('kItemByPosition',
position())/@name,','))">
<Cell>
<Data ss:Type="String">
<xsl:apply-templates select="@name|node()"/>
</Data>
</Cell>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output:
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel">
<Worksheet ss:Name="Email Table">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell>
<Data ss:Type="String">Last Name</Data>
</Cell>
<Cell>
<Data ss:Type="String">Email</Data>
</Cell>
</Row>
<Row>
<Cell>
<Data ss:Type="String">XXX</Data>
</Cell>
<Cell>
<Data ss:Type="String">xxx</Data>
</Cell>
</Row>
<Row>
<Cell>
<Data ss:Type="String">xxy</Data>
</Cell>
<Cell>
<Data ss:Type="String">xx</Data>
</Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
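If running a stylesheet is not an option, the same column selection can be sketched in plain Java with the JDK's DOM parser. This is only a minimal sketch and not part of the answers above. It assumes the document shape shown in the question (item elements under metadata, value elements under each row, in the same order), and it writes CSV text rather than a real XLS file. The class and method names (ColumnsToCsv, toCsv, quote) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: selects columns by their metadata name and emits CSV.
class ColumnsToCsv {

    // Returns CSV text with a header row, containing only the wanted columns.
    // Columns come out in metadata order, not in the order of 'wanted'.
    static String toCsv(String xml, List<String> wanted) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // The position of an <item> in <metadata> is the position of its <value> in each <row>.
        NodeList items = doc.getElementsByTagName("item");
        List<Integer> indexes = new ArrayList<>();
        List<String> header = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            String name = ((Element) items.item(i)).getAttribute("name");
            if (wanted.contains(name)) {
                indexes.add(i);
                header.add(quote(name));
            }
        }

        StringBuilder csv = new StringBuilder(String.join(",", header)).append('\n');
        NodeList rows = doc.getElementsByTagName("row");
        for (int r = 0; r < rows.getLength(); r++) {
            NodeList values = ((Element) rows.item(r)).getElementsByTagName("value");
            List<String> cells = new ArrayList<>();
            for (int index : indexes) {
                // A row shorter than the metadata yields an empty cell.
                String text = index < values.getLength() ? values.item(index).getTextContent() : "";
                cells.add(quote(text));
            }
            csv.append(String.join(",", cells)).append('\n');
        }
        return csv.toString();
    }

    // Minimal CSV quoting: wrap in double quotes and double any embedded quote.
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><metadata>"
                + "<item name=\"Last Name\"/><item name=\"First Name\"/><item name=\"Email\"/>"
                + "</metadata><data>"
                + "<row><value>Doe</value><value>Jane</value><value>jane@example.com</value></row>"
                + "</data></root>";
        System.out.print(toCsv(xml, Arrays.asList("Last Name", "Email")));
    }
}
```

Matching on the name attribute rather than on a hard-coded index keeps the selection working if the report later adds or reorders columns.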

Related

Moving elements of one XML node to a child node using xslt

Basically, I need to move the contents of OrderRelease and Order down to the OrderLine level using XSL.
Example input XML:
<OrderRelease EnterpriseCode="BRD" ReleaseNo="1234ABC" DocumentType="0001" SellerOrganizationCode="BU1" ShipNode="US1">
<Order OrderDate="2019-06-13T09:27:36-04:00" Action="CANCEL" OrderNo="1234ABC">
<Extn ExtnWMSOrderNumber="123ADS"/>
</Order>
<OrderLines>
<OrderLine Action="CANCEL" PrimeLineNo="1" SubLineNo="1" OrderedQty="5">
<Item ItemID="A" UnitOfMeasure="STD" ProductClass="NEW"/>
</OrderLine>
<OrderLine Action="" PrimeLineNo="2" SubLineNo="1" OrderedQty="10">
<Item ItemID="B" UnitOfMeasure="STD" ProductClass="NEW"/>
</OrderLine>
<OrderLine Action="CANCEL" PrimeLineNo="3" SubLineNo="1" OrderedQty="0">
<Item ItemID="C" UnitOfMeasure="STD" ProductClass="NEW"/>
</OrderLine>
</OrderLines>
</OrderRelease>
Example of XML post translation:
<OrderLines>
<OrderLine Action="CANCEL" PrimeLineNo="1" SubLineNo="1" OrderedQty="5" OrderDate="2019-06-13T09:27:36-04:00" OrderNo="1234ABC" EnterpriseCode="BRD" ReleaseNo="1234ABC" DocumentType="0001" SellerOrganizationCode="BU1" ShipNode="US1" AggregatorOrderId="12345">
<Item ItemID="A" UnitOfMeasure="STD" ProductClass="NEW"/>
</OrderLine>
<OrderLine Action="" PrimeLineNo="1" SubLineNo="1" OrderedQty="10" OrderDate="2019-06-13T09:27:36-04:00" OrderNo="1234ABC" EnterpriseCode="BRD" ReleaseNo="1234ABC" DocumentType="0001" SellerOrganizationCode="BU1" ShipNode="US1" AggregatorOrderId="12345">
<Item ItemID="B" UnitOfMeasure="STD" ProductClass="NEW"/>
</OrderLine>
<OrderLine Action="CANCEL" PrimeLineNo="1" SubLineNo="1" OrderedQty="0" OrderDate="2019-06-13T09:27:36-04:00" OrderNo="1234ABC" EnterpriseCode="BRD" ReleaseNo="1234ABC" DocumentType="0001" SellerOrganizationCode="BU1" ShipNode="US1" AggregatorOrderId="12345">
<Item ItemID="C" UnitOfMeasure="STD" ProductClass="NEW"/>
</OrderLine>
</OrderLines>
I used the following XSL, but it removes all of the original OrderLine elements and also doesn't seem to run for each OrderLine:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" />
<xsl:strip-space elements="*" />
<xsl:template match="OrderRelease">
<OrderLine>
<xsl:copy-of select="@*|OrderLine/@*" />
<xsl:apply-templates /><!-- optional -->
</OrderLine>
</xsl:template>
<xsl:template match="Order">
<xsl:copy-of select="@*|OrderLine/@*" />
<xsl:apply-templates /><!-- optional -->
</xsl:template>
<xsl:template match="Extn">
<xsl:copy-of select="@*|OrderLine/@*" />
<xsl:apply-templates /><!-- optional -->
</xsl:template>
<xsl:template match="OrderLine">
<xsl:copy-of select="@*|OrderLine/@*" />
</xsl:template>
</xsl:stylesheet>
Output of this is
<OrderLine EnterpriseCode="BRD"
ReleaseNo="1234ABC"
DocumentType="0001"
SellerOrganizationCode="BU1"
ShipNode="US1"
OrderDate="2019-06-13T09:27:36-04:00"
Action="CANCEL"
OrderNo="1234ABC"
ExtnWMSOrderNumber="123ADS"
PrimeLineNo="3"
SubLineNo="1"
OrderedQty="0"/>
You could do it this way.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="OrderLines">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="OrderLine">
<xsl:copy>
<xsl:copy-of select="@*"/> <!-- OrderLine attributes -->
<xsl:copy-of select="../../@*"/> <!-- OrderRelease attributes -->
<xsl:copy-of select="../../Order/@*"/> <!-- Order attributes -->
<xsl:attribute name="AggregatorOrderId">12345</xsl:attribute>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="Item">
<xsl:copy>
<xsl:copy-of select="@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
See it working here: https://xsltfiddle.liberty-development.net/93nwMog

XML remove all child node from a node and create new one

I have this xml file:
<achievements>
<achievement id="1" name="Monster Slayer Lv1" description="Slay 15 monsters of any type" icon="Icon.skill0496" categoryId="1">
<conditions>
<condition name="Level" val="50" />
</conditions>
<items>
<item id="6393" count="1" />
</items>
</achievement>
<achievement id="2" name="Monster Slayer Lv2" description="Slay 50 monsters of any type" icon="Icon.skill0497" categoryId="1">
<conditions>
<condition name="MonsterKill" val="50" />
</conditions>
<items>
<item id="57" count="50000000" />
</items>
</achievement>
</achievements>
And I want my code:
try
{
final File inputFile = new File("C:/Users/ProjectX/Desktop/achievement.xml");
final DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
final DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
final Document doc = dBuilder.parse(inputFile);
for (Node node = doc.getFirstChild().getFirstChild(); node != null; node = node.getNextSibling())
{
if (node.getNodeName().equalsIgnoreCase("achievements"))
{
for (Node ach_node = node.getFirstChild().getFirstChild(); ach_node != null; ach_node = ach_node.getNextSibling())
{
if (ach_node.getNodeName().equalsIgnoreCase("achievement"))
{
}
}
}
}
}
catch (final Exception e)
{
e.printStackTrace();
}
to read down to each achievement and delete all of its sub-nodes (which includes every condition and item, or any other child node it might have). In addition, I want to replace them with my own nodes, such as:
<dropList id="x">
<item Id="x" min="x" max="x" chance="x" />
</dropList>
Does anyone know how to do this? Thanks in advance to all the community members who spend their time reading this.
<achievements>
<achievement id="1" name="Monster Slayer Lv1" description="Slay 15 monsters of any type" icon="Icon.skill0496" categoryId="1">
<conditions>
<condition name="Level" val="50" />
</conditions>
<dropList id="1">
<item Id="x" min="x" max="x" chance="x" />
</dropList>
<dropList id="2">
<item Id="x" min="x" max="x" chance="x" />
</dropList>
</achievement>
<achievement id="2" name="Monster Slayer Lv2" description="Slay 50 monsters of any type" icon="Icon.skill0497" categoryId="1">
<conditions>
<condition name="MonsterKill" val="50" />
</conditions>
<dropList id="1">
<item Id="x" min="x" max="x" chance="x" />
</dropList>
<dropList id="2">
<item Id="x" min="x" max="x" chance="x" />
</dropList>
</achievement>
</achievements>
An XSLT transformation is a better fit for your case. It has a pattern for exactly what you need, called the identity transform: one template copies every node unchanged, and more specific templates override it for the nodes you want to change.
All you need to do is run the XSLT below from your Java code.
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="items">
<dropList id="1">
<item Id="x" min="x" max="x" chance="x"/>
</dropList>
<dropList id="2">
<item Id="x" min="x" max="x" chance="x"/>
</dropList>
</xsl:template>
</xsl:stylesheet>
Output
<achievements>
<achievement id="1" name="Monster Slayer Lv1" description="Slay 15 monsters of any type" icon="Icon.skill0496" categoryId="1">
<conditions>
<condition name="Level" val="50"/>
</conditions>
<dropList id="1">
<item Id="x" min="x" max="x" chance="x"/>
</dropList>
<dropList id="2">
<item Id="x" min="x" max="x" chance="x"/>
</dropList>
</achievement>
<achievement id="2" name="Monster Slayer Lv2" description="Slay 50 monsters of any type" icon="Icon.skill0497" categoryId="1">
<conditions>
<condition name="MonsterKill" val="50"/>
</conditions>
<dropList id="1">
<item Id="x" min="x" max="x" chance="x"/>
</dropList>
<dropList id="2">
<item Id="x" min="x" max="x" chance="x"/>
</dropList>
</achievement>
</achievements>
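Running a stylesheet from Java can be sketched with the JDK's built-in javax.xml.transform API. This is a minimal sketch under stated assumptions: the stylesheet and input are passed as strings for brevity (a real program would more likely read files), the class and method names (RunXslt, transform) are made up, and the stylesheet in main is a shortened version of the one above that emits a single dropList.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSLT 1.0 stylesheet to an XML string.
class RunXslt {

    static String transform(String xslt, String xml) throws Exception {
        // Compile the stylesheet with the JDK's default XSLT processor.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Identity transform plus one override that replaces <items>.
        String xslt = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
                + "<xsl:template match=\"@*|node()\">"
                + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
                + "</xsl:template>"
                + "<xsl:template match=\"items\">"
                + "<dropList id=\"1\"><item Id=\"x\" min=\"x\" max=\"x\" chance=\"x\"/></dropList>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
        String xml = "<achievements><achievement id=\"1\">"
                + "<conditions><condition name=\"Level\" val=\"50\"/></conditions>"
                + "<items><item id=\"6393\" count=\"1\"/></items>"
                + "</achievement></achievements>";
        System.out.println(transform(xslt, xml));
    }
}
```

To work with files instead, StreamSource and StreamResult also accept a java.io.File, so the same three calls apply.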

XSLT1 how to reliably get data out of parent with variable amount of children

I have a really big XML file containing really old animal testing data. Each row contains a set of fields, and each field is in turn filled with multiple DATA fields. The file uses the order of the children to express the relationship between the actual data.
I need to extract these fields in sequence: first all the first DATA fields, then all the second ones, then the third ones, etc. The number of DATA fields is not fixed across rows; it only seems to be consistent within a single row.
It's hard to explain, so I added an example document: the first table is the source, and the second one is where I want to get to.
I tried something like the code below to preserve the node relationships, but I couldn't get it to work. I'd say I'm only barely past beginner level in XSLT, but due to the current infrastructure requirements I need to get this working in XSLT 1.0:
<xsl:template match="ROW">
<xsl:for-each select="./anamnese/DATA">
<xsl:variable name="depth">
<xsl:number/>
</xsl:variable>
<xsl:value-of select=".//anamnese/DATA[$depth]"/>
<xsl:value-of select=".//diagnose/DATA[$depth]"/>
<xsl:value-of select=".//fichenr./DATA[$depth]"/>
<xsl:value-of select=".//vis/DATA[$depth]"/>
<xsl:value-of select=".//dr._A/DATA[$depth]"/>
</xsl:for-each>
</xsl:template>
An example of the starting table with bogus data. Note how the number of DATA fields varies from row to row.
<TABLE>
<ROW MODID="4" RECORDID="1801">
<anamnese>
<DATA>Gevonden in lat decubitus. Dag van huis weggeweest. Vanmorgen nog goed gegeten.</DATA>
</anamnese>
<diagnose>
<DATA/>
</diagnose>
<fichenr.>
<DATA>3607</DATA>
</fichenr.>
<vis>
<DATA>25/08/2017</DATA>
</vis>
<dr._A>
<DATA>EL</DATA>
</dr._A>
</ROW>
<ROW MODID="6" RECORDID="1802">
<anamnese>
<DATA>zeer agressief geworden op korte tijd</DATA>
<DATA/>
<DATA>detartratie nodig. Eerst cardiologisch onderzoek gehad bij Valerie Bavegems. Verslag volgt nog. Drinkt redelijk veel, 500 g afgevallen</DATA>
</anamnese>
<diagnose>
<DATA> euthanasie</DATA>
<DATA/>
<DATA/>
</diagnose>
<fichenr.>
<DATA>3989</DATA>
<DATA>3688</DATA>
<DATA>3608</DATA>
</fichenr.>
<vis>
<DATA>2/11/2017</DATA>
<DATA>6/09/2017</DATA>
<DATA>26/08/2017</DATA>
</vis>
<dr._A>
<DATA>EL</DATA>
<DATA>EL</DATA>
<DATA>MA</DATA>
</dr._A>
</ROW>
<ROW MODID="4" RECORDID="1803">
<anamnese/>
<diagnose/>
<fichenr./>
<vis/>
<dr._A/>
</ROW>
</TABLE>
The desired end product:
<TABLE_B>
<ROW>
<recordId>1801</recordId>
<anamnese>Gevonden in lat decubitus. Vanmorgen nog goed gegeten.</anamnese>
<diagnose></diagnose>
<fichenr.>3607</fichenr.>
<vis>25/08/2017</vis>
<dr._A>EL</dr._A>
</ROW>
<ROW>
<recordId>1802</recordId>
<anamnese>zeer agressief geworden op korte tijd</anamnese>
<diagnose> euthanasie</diagnose>
<fichenr.>3989</fichenr.>
<vis>2/11/2017</vis>
<dr._A></dr._A>
</ROW>
<ROW>
<recordId>1802</recordId>
<anamnese/>
<diagnose></diagnose>
<fichenr.>3688</fichenr.>
<vis>6/09/2017</vis>
<dr._A>EL</dr._A>
</ROW>
<ROW>
<recordId>1802</recordId>
<anamnese>detartratie nodig. Eerst cardiologisch onderzoek gehad bij Valerie Bavegems. Verslag volgt nog. Drinkt redelijk veel, 500 g afgevallen</anamnese>
<diagnose/>
<fichenr.>3608</fichenr.>
<vis>26/08/2017</vis>
<dr._A>MA</dr._A>
</ROW>
<ROW>
<recordId>1803</recordId>
<anamnese/>
<diagnose/>
<fichenr./>
<vis/>
<dr._A/>
</ROW>
</TABLE_B>
How can I reliably extract all the DATA fields with the correct relationships intact, even though I cannot predict the maximum number of DATA fields in a row? (I've seen a row that had 266.)
I have assumed that the anamnese node dictates the number of rows to be produced: if it has 3 children, it produces 3 rows. If it is empty, I assume no other information follows, so I just copy the nodes.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes"/>
<xsl:template match="TABLE">
<TABLE_B>
<xsl:apply-templates/>
</TABLE_B>
</xsl:template>
<xsl:template match="ROW">
<xsl:variable name="ID" select="@RECORDID"/>
<xsl:choose>
<!-- Test for anamnese children.
If there is no child element,
copy the ROW child elements
-->
<xsl:when test="not(anamnese/*)">
<ROW>
<recordId>
<xsl:value-of select="$ID"/>
</recordId>
<xsl:copy-of select="*"/>
</ROW>
</xsl:when>
<xsl:otherwise>
<!-- if there exists anamnese children,
loop through these.-->
<xsl:for-each select="anamnese/*">
<xsl:variable name="pos" select="position()"/>
<ROW>
<recordId>
<xsl:value-of select="$ID"/>
</recordId>
<anamnese>
<xsl:value-of select="."/>
</anamnese>
<!-- loop through the following siblings of anamnese -->
<xsl:for-each select="../following-sibling::*">
<xsl:element name="{local-name()}">
<!-- select nodes with the same position -->
<xsl:value-of select="*[position()=$pos]"/>
</xsl:element>
</xsl:for-each>
</ROW>
</xsl:for-each>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
If the structure of the ROW block is fixed, you can first define it in a separate named template with parameters, and then call that template with the right parameters from the required loops. See the XSL below:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!--prepare template for ROW block structure-->
<xsl:template name="row">
<!--pass recordId value-->
<xsl:param name="rec.id"/>
<!--pass ROW position number value-->
<xsl:param name="row.id"/>
<!--pass DATA position number value-->
<xsl:param name="data.id"/>
<!--creating ROW structure-->
<ROW>
<recordId>
<xsl:value-of select="$rec.id"/>
</recordId>
<anamnese>
<xsl:value-of select="//ROW[$row.id]/anamnese/DATA[$data.id]"/>
</anamnese>
<diagnose>
<xsl:value-of select="//ROW[$row.id]/diagnose/DATA[$data.id]"/>
</diagnose>
<fichenr.>
<xsl:value-of select="//ROW[$row.id]/fichenr./DATA[$data.id]"/>
</fichenr.>
<vis>
<xsl:value-of select="//ROW[$row.id]/vis/DATA[$data.id]"/>
</vis>
<dr._A>
<xsl:value-of select="//ROW[$row.id]/dr._A/DATA[$data.id]"/>
</dr._A>
</ROW>
</xsl:template>
<xsl:template match="/">
<!--desired name of root node-->
<TABLE_B>
<xsl:for-each select="//anamnese">
<!--get recordId number-->
<xsl:variable name="var.rec.id" select="../@RECORDID"/>
<!--get row position number-->
<xsl:variable name="var.row.id" select="position()"/>
<xsl:choose>
<!--check if DATA block exists-->
<xsl:when test="DATA">
<xsl:for-each select="DATA">
<xsl:call-template name="row">
<xsl:with-param name="rec.id" select="$var.rec.id"/>
<xsl:with-param name="row.id" select="$var.row.id"/>
<xsl:with-param name="data.id" select="position()"/>
</xsl:call-template>
</xsl:for-each>
</xsl:when>
<!--proceed if DATA block does not exist-->
<xsl:otherwise>
<xsl:call-template name="row">
<xsl:with-param name="rec.id" select="$var.rec.id"/>
<xsl:with-param name="row.id" select="$var.row.id"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</TABLE_B>
</xsl:template>
</xsl:stylesheet>
Then try it with your input XML; the result will be as desired:
<?xml version="1.0" encoding="UTF-8"?>
<TABLE_B>
<ROW>
<recordId>1801</recordId>
<anamnese>Gevonden in lat decubitus. Dag van huis weggeweest. Vanmorgen nog goed gegeten.</anamnese>
<diagnose/>
<fichenr.>3607</fichenr.>
<vis>25/08/2017</vis>
<dr._A>EL</dr._A>
</ROW>
<ROW>
<recordId>1802</recordId>
<anamnese>zeer agressief geworden op korte tijd</anamnese>
<diagnose> euthanasie</diagnose>
<fichenr.>3989</fichenr.>
<vis>2/11/2017</vis>
<dr._A>EL</dr._A>
</ROW>
<ROW>
<recordId>1802</recordId>
<anamnese/>
<diagnose/>
<fichenr.>3688</fichenr.>
<vis>6/09/2017</vis>
<dr._A>EL</dr._A>
</ROW>
<ROW>
<recordId>1802</recordId>
<anamnese>detartratie nodig. Eerst cardiologisch onderzoek gehad bij Valerie Bavegems. Verslag volgt nog. Drinkt redelijk veel, 500 g afgevallen</anamnese>
<diagnose/>
<fichenr.>3608</fichenr.>
<vis>26/08/2017</vis>
<dr._A>MA</dr._A>
</ROW>
<ROW>
<recordId>1803</recordId>
<anamnese/>
<diagnose/>
<fichenr./>
<vis/>
<dr._A/>
</ROW>
</TABLE_B>
This may help you.
<xsl:template match="TABLE">
<xsl:element name="TABLE_B">
<xsl:apply-templates select="ROW"/>
</xsl:element>
</xsl:template>
<xsl:template match="ROW">
<xsl:apply-templates select="anamnese">
<xsl:with-param name="RECORDID" select="@RECORDID"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="anamnese">
<xsl:param name="RECORDID"/>
<xsl:apply-templates select="DATA">
<xsl:with-param name="RECORDID" select="$RECORDID"/>
</xsl:apply-templates>
<xsl:if test="not(DATA)">
<xsl:element name="ROW">
<xsl:element name="recordId">
<xsl:value-of select="$RECORDID"/>
</xsl:element>
<xsl:element name="anamnese">
<xsl:value-of select="text()"/>
</xsl:element>
<xsl:apply-templates select="following-sibling::*"/>
</xsl:element>
</xsl:if>
</xsl:template>
<xsl:template match="DATA">
<xsl:param name="RECORDID"/>
<xsl:variable name="position" select="position()"/>
<xsl:element name="ROW">
<xsl:element name="recordId">
<xsl:value-of select="$RECORDID"/>
</xsl:element>
<xsl:element name="anamnese">
<xsl:value-of select="text()"/>
</xsl:element>
<xsl:element name="diagnose">
<xsl:value-of select="../../diagnose/DATA[$position]"/>
</xsl:element>
<xsl:element name="fichenr.">
<xsl:value-of select="../../fichenr./DATA[$position]"/>
</xsl:element>
<xsl:element name="vis">
<xsl:value-of select="../../vis/DATA[$position]"/>
</xsl:element>
<xsl:element name="dr._A">
<xsl:value-of select="../../dr._A/DATA[$position]"/>
</xsl:element>
</xsl:element>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>

Unmarshalling without unique node names

I am stuck trying to figure out how to unmarshal an XML file supplied by IBM Cognos.
The structure does not provide unique names for the different child nodes under each row element, but there is a block of metadata that defines the order of the values.
This is a simplified sample of the XML file.
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
xmlns="http://developer.cognos.com/schemas/xmldata/1/"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://developer.cognos.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
<metadata>
<item name="EmployeeID" type="xs:string" length="20"/>
<item name="firstName" type="xs:string" length="50"/>
<item name="lastName" type="xs:string" length="50"/>
</metadata>
<data>
<row>
<value>EMP1</value>
<value>Joe</value>
<value>Blogs</value>
</row>
<row>
<value>EMP2</value>
<value>Mary</value>
<value>Soap</value>
</row>
</data>
</dataset>
I'm using Spring OXM and Castor for this project, and I have no control over the XML format, as I am pulling it via a web service from a third-party system.
Update: I'm not averse to swapping out Castor for a different marshalling/unmarshalling library.
The magic of XSLT to the rescue. By running the provided XML through the following XSLT stylesheet, I was able to create an XML file that I could then unmarshal correctly.
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:cognos="http://developer.cognos.com/schemas/xmldata/1/">
<xsl:output method="xml" version="1.0" encoding="UTF-8" standalone="yes" indent="yes"/>
<xsl:template match="/">
<xsl:element name="DataSet">
<xsl:for-each select="//*[name()='row']">
<xsl:variable name="row" select="position()" />
<xsl:element name="Row">
<xsl:for-each select="//*[name()='item']">
<xsl:variable name="elementName" select="@name" />
<xsl:variable name="index" select="position()" />
<xsl:element name="{translate($elementName,' ','_')}">
<xsl:value-of select="//cognos:row[$row]/cognos:value[$index]" />
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
This transformed the XML file as follows
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DataSet>
<Row>
<EmployeeID>EMP1</EmployeeID>
<firstName>Joe</firstName>
<lastName>Blogs</lastName>
</Row>
<Row>
<EmployeeID>EMP2</EmployeeID>
<firstName>Mary</firstName>
<lastName>Soap</lastName>
</Row>
</DataSet>
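If an intermediate XML file is not needed, the same metadata-to-value pairing can also be sketched directly in Java with a namespace-aware DOM parser. This is not what the answer above does, and it uses neither Castor nor Spring OXM. It is only a rough sketch that assumes the Cognos layout shown in the question, with each row carrying its values in metadata order. The class and method names (CognosRows, read) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns each <row> into a name-to-value map using <metadata>.
class CognosRows {

    // Namespace URI taken from the sample document in the question.
    static final String NS = "http://developer.cognos.com/schemas/xmldata/1/";

    static List<Map<String, String>> read(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList items = doc.getElementsByTagNameNS(NS, "item");
        NodeList rows = doc.getElementsByTagNameNS(NS, "row");
        List<Map<String, String>> result = new ArrayList<>();
        for (int r = 0; r < rows.getLength(); r++) {
            NodeList values = ((Element) rows.item(r)).getElementsByTagNameNS(NS, "value");
            // LinkedHashMap keeps the columns in metadata order.
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < items.getLength() && i < values.getLength(); i++) {
                String name = ((Element) items.item(i)).getAttribute("name");
                record.put(name, values.item(i).getTextContent());
            }
            result.add(record);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dataset xmlns=\"" + NS + "\"><metadata>"
                + "<item name=\"EmployeeID\"/><item name=\"firstName\"/><item name=\"lastName\"/>"
                + "</metadata><data>"
                + "<row><value>EMP1</value><value>Joe</value><value>Blogs</value></row>"
                + "<row><value>EMP2</value><value>Mary</value><value>Soap</value></row>"
                + "</data></dataset>";
        for (Map<String, String> record : read(xml)) {
            System.out.println(record);
        }
    }
}
```

The resulting maps could then be copied into whatever domain objects the application uses; that mapping step is left out here.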

XSLT skip duplicate element

I am a beginner to XSLT.
My Source XML is as below:
<Passengers>
<Passenger type="A" id="P1"/>
<Passenger type="A" id="P2"/>
<Passenger type="B" id="P3"/>
<Passenger type="C" id="P4"/>
</Passengers>
The output should be as below:
<Pax_Items>
<Item>
<Type>A</Type>
<Count>2</Count>
</Item>
<Item>
<Type>B</Type>
<Count>1</Count>
</Item>
<Item>
<Type>C</Type>
<Count>1</Count>
</Item>
</Pax_Items>
I have created XSLT as below
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0" exclude-result-prefixes="xmlns">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />
<xsl:variable name="filter" select="'TK,AJ'"/>
<xsl:template match="Passengers">
<xsl:element name="Pax_Items">
<xsl:apply-templates select="Passenger"/>
</xsl:element>
</xsl:template>
<xsl:template match="Passenger">
<xsl:element name="Item">
<xsl:element name="Type">
<xsl:value-of select="@type"/>
</xsl:element>
<xsl:element name="Count">
<xsl:value-of select="count(//Passenger[@type=current()/@type])"/>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
With above XSLT i got the below output:
<Pax_Items>
<Item>
<Type>A</Type>
<Count>2</Count>
</Item>
<Item>
<Type>A</Type>
<Count>2</Count>
</Item>
<Item>
<Type>B</Type>
<Count>1</Count>
</Item>
<Item>
<Type>C</Type>
<Count>1</Count>
</Item>
</Pax_Items>
How can I omit or skip the duplicate element? Please help.
This is actually a good example of a grouping problem. In XSLT 1.0, the most efficient way to do grouping is with a technique called "Muenchian grouping", so it is worth learning about.
In this case, you want to group Passenger elements by their @type attribute, so you define a key to do this:
<xsl:key name="Passengers" match="Passenger" use="@type"/>
Then you need to select the Passenger elements that are the first occurrence in the group for their @type attribute. This is done as follows:
<xsl:apply-templates
select="Passenger[generate-id() = generate-id(key('Passengers', @type)[1])]"/>
Note the use of generate-id, which generates a unique ID for a node, allowing two nodes to be compared.
Then counting the number of occurrences in the group is straightforward:
<xsl:value-of select="count(key('Passengers', @type))"/>
Here is the full XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:key name="Passengers" match="Passenger" use="@type"/>
<xsl:template match="Passengers">
<Pax_Items>
<xsl:apply-templates select="Passenger[generate-id() = generate-id(key('Passengers', @type)[1])]"/>
</Pax_Items>
</xsl:template>
<xsl:template match="Passenger">
<Item>
<Type>
<xsl:value-of select="@type"/>
</Type>
<Count>
<xsl:value-of select="count(key('Passengers', @type))"/>
</Count>
</Item>
</xsl:template>
</xsl:stylesheet>
When applied to your sample XML, the following is output
<Pax_Items>
<Item>
<Type>A</Type>
<Count>2</Count>
</Item>
<Item>
<Type>B</Type>
<Count>1</Count>
</Item>
<Item>
<Type>C</Type>
<Count>1</Count>
</Item>
</Pax_Items>
Also note there is no real reason to use xsl:element to output static elements. Just write out the element directly.
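For readers who think more easily in Java, the key used above behaves roughly like a map from each type value to the passengers that have it. The sketch below is only an analogy, not part of the XSLT solution: it counts Passenger elements per type with the JDK's DOM parser and a LinkedHashMap, which keeps the types in first-occurrence order. The class and method names (PassengerCounts, countByType) are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: groups Passenger elements by their type attribute and counts them.
class PassengerCounts {

    static Map<String, Integer> countByType(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList passengers = doc.getElementsByTagName("Passenger");

        // One entry per distinct type, in the order the types first appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < passengers.getLength(); i++) {
            String type = ((Element) passengers.item(i)).getAttribute("type");
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Passengers>"
                + "<Passenger type=\"A\" id=\"P1\"/>"
                + "<Passenger type=\"A\" id=\"P2\"/>"
                + "<Passenger type=\"B\" id=\"P3\"/>"
                + "<Passenger type=\"C\" id=\"P4\"/>"
                + "</Passengers>";
        // Each map entry corresponds to one Item (Type, Count) in the desired output.
        System.out.println(countByType(xml));
    }
}
```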
Update your Passenger template as follows; I have added an xsl:if condition that skips duplicate nodes:
<xsl:template match="Passenger">
<xsl:if test="not(preceding-sibling::Passenger[@type = current()/@type])">
<xsl:element name="Item">
<xsl:element name="Type">
<xsl:value-of select="@type"/>
</xsl:element>
<xsl:element name="Count">
<xsl:value-of select="count(//Passenger[@type=current()/@type])"/>
</xsl:element>
</xsl:element>
</xsl:if>
</xsl:template>
