How can we convert XML file to CSV? - java

I have an XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Results>
<Row>
<COL1></COL1>
<COL2>25.00</COL2>
<COL3>2009-07-06 15:49:34.984</COL3>
<COL4>00001720</COL4>
</Row>
<Row>
<COL1>RJ</COL1>
<COL2>26.00</COL2>
<COL3>2009-07-06 16:04:16.156</COL3>
<COL4>00001729</COL4>
</Row>
<Row>
<COL1>SD</COL1>
<COL2>28.00</COL2>
<COL3>2009-07-06 16:05:04.375</COL3>
<COL4>00001721</COL4>
</Row>
</Results>
I have to convert this XML into a CSV file. I have heard this can be done using XSLT. How can I do this in Java (with or without XSLT)?

Using XSLT is often a bad idea. Use Apache Commons Digester instead. It's fairly easy to use; here's a rough idea:
Digester digester = new Digester();
digester.addObjectCreate("Results/Row", MyRowHolder.class);
digester.addCallMethod("Results/Row/COL1","addCol", 0);
// Similarly for COL2, etc.
digester.parse("mydata.xml");
This will create a MyRowHolder instance for each Row (MyRowHolder is a class you provide). The class needs an addCol() method, which is called for each <COLn> with the contents of that tag.

In pseudo code:
loop through the rows:
    loop through all children of `Row`:
        write out the text
        append a comma
    new line
That quick little loop will write a comma at the end of each line, but I'm sure you can figure out how to remove that.
For actually parsing the XML, I suggest using JDOM. It has a pretty intuitive API.
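The loop above could be sketched in Java roughly as follows. This version uses the JDK's built-in DOM parser rather than JDOM, purely to avoid an extra dependency, and String.join avoids the trailing comma. The class name DomToCsv and the method toCsv are made up for illustration, and no CSV quoting is done.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one CSV line per <Row>, one field per child element.
class DomToCsv {

    static String toCsv(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder csv = new StringBuilder();
        NodeList rows = doc.getElementsByTagName("Row");
        for (int i = 0; i < rows.getLength(); i++) {
            List<String> fields = new ArrayList<>();
            NodeList children = rows.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes between the COL elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.add(child.getTextContent());
                }
            }
            // join() puts commas only between fields, so no trailing comma.
            csv.append(String.join(",", fields)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Results>"
                + "<Row><COL1></COL1><COL2>25.00</COL2></Row>"
                + "<Row><COL1>RJ</COL1><COL2>26.00</COL2></Row>"
                + "</Results>";
        System.out.print(toCsv(xml));
    }
}
```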

In XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="ISO-8859-1" />
<xsl:template match="/Results">
<xsl:apply-templates select="Row" />
</xsl:template>
<xsl:template match="Row">
<xsl:apply-templates select="*" />
<xsl:if test="position() != last()">
<xsl:value-of select="'&#10;'" />
</xsl:if>
</xsl:template>
<xsl:template match="Row/*">
<xsl:value-of select="." />
<xsl:if test="position() != last()">
<xsl:value-of select="','" />
</xsl:if>
</xsl:template>
</xsl:stylesheet>
If your COL* values can contain commas, you could wrap the values in double quotes:
<xsl:template match="Row/*">
<xsl:value-of select="concat('&quot;', ., '&quot;')" />
<!-- ... -->
</xsl:template>
If they can contain commas and double quotes, things could get a bit more complex due to the required escaping. You know your data, you'll be able to decide how to best format the output. Using a different separator (e.g. TAB or a pipe symbol) is also an option.
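If the escaping ends up being done on the Java side, a small helper along these lines might be enough. It follows the common CSV convention (as described in RFC 4180) of doubling embedded double quotes and wrapping the field in quotes; the class and method names are invented for this sketch.

```java
// Hypothetical helper showing RFC 4180-style quoting of a single CSV field.
class CsvField {

    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));
        System.out.println(escape("a,b"));
        System.out.println(escape("say \"hi\""));
    }
}
```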

Read the XML file in.
Loop through each record and add it to a CSV file.

With XSLT you can use the JAXP interface to the XSLT processor and then use <xsl:text> in your stylesheet to convert to text output.
<xsl:text>
</xsl:text>
generates a newline, for example.
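A minimal sketch of the JAXP route, assuming the stylesheet is held in a string for brevity (in practice it would more likely be a file). The stylesheet here is an illustrative one written for this example, and the names XsltToCsv and toCsv are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToCsv {

    // Illustrative stylesheet: fields separated by commas, rows ended by a newline.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/Results'>"
        + "<xsl:for-each select='Row'>"
        + "<xsl:for-each select='*'>"
        + "<xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'><xsl:text>,</xsl:text></xsl:if>"
        + "</xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toCsv(String xml) throws TransformerException {
        // Compile the stylesheet, then run it over the input document.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<Results>"
                + "<Row><COL1></COL1><COL2>25.00</COL2></Row>"
                + "<Row><COL1>RJ</COL1><COL2>26.00</COL2></Row>"
                + "</Results>";
        System.out.print(toCsv(xml));
    }
}
```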

Use the straightforward SAX API via the standard Java JAXP package. This will allow you to write a class that receives events for each XML element your reader encounters.
Briefly:
read your XML in using SAX
record text values via the SAX DefaultHandler characters() method
when you get an end event for a COL, record this string value
when you get the Row end event, simply write out a comma-separated line of the previously recorded values
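The steps above might look roughly like this. It is only a sketch: the handler assumes the element names from the question (Row and COLn), does no CSV quoting, and the class name SaxToCsv is invented here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects COLn text and emits one CSV line per Row.
class SaxToCsv extends DefaultHandler {

    private final StringBuilder csv = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private final List<String> fields = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.startsWith("COL")) {
            fields.add(text.toString());
        } else if (qName.equals("Row")) {
            csv.append(String.join(",", fields)).append('\n');
            fields.clear();
        }
    }

    static String toCsv(String xml) throws Exception {
        SaxToCsv handler = new SaxToCsv();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Results>"
                + "<Row><COL1></COL1><COL2>25.00</COL2></Row>"
                + "<Row><COL1>RJ</COL1><COL2>26.00</COL2></Row>"
                + "</Results>";
        System.out.print(toCsv(xml));
    }
}
```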

Related

How to transform an xml file by searching for some nodes and replacing the values

This is the input xml -
<payload id="001">
<termsheet>
<format>PDF</format>
<city>New York</city>
</termsheet>
</payload>
We are using Xalan for most of our xml transformations and we are on XSLT 1.0
I want to write a XSLT template which would convert the input to the below output -
<payload id="001">
<termsheet>
<format>pdf</format>
<city>Mr. ABC</city>
</termsheet>
</payload>
I tried a lot of answers on SO, but can't get around this problem.
Apologies for not being clear; toLower was an oversimplification. I want to use the city name to invoke a Java method that returns a business contact from that city. I have updated the original question.
I think the simplest way is to use a Java extension with Xalan: write a simple Java class that implements the business logic you need, then call it from your XSLT. The stylesheet is quite simple:
<xsl:stylesheet version="1.0"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="java">
<xsl:template match='node() | @*'>
<xsl:copy>
<xsl:apply-templates select='node() | @*'></xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="termsheet/city">
<xsl:copy>
<xsl:value-of select='java:org.example.Card.getName(.)'/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
You also need to write the Java class it invokes:
package org.example;
public class Card {
public static String getName(String id) {
// put your code here to get what you need
return "Mr. ABC";
}
}
There are other ways to do this, and you should really take a look at the documentation on Xalan extensions.

How to handle duplicate node names when converting xml to csv using java and xsl

I am given an xml file from an outside source (so I have no control over the attribute names) and unfortunately they use the same name for a paired set of data. I can't seem to figure out how to access the second value. An example of the data in the xml file is:
<?xml version="1.0"?>
<addressResponse>
<results>
<ownerName>Name1</ownerName>
<houseAddress>House1</houseAddress>
<houseAddress>CityState1</houseAddress>
<yearBuilt>Year1</yearBuilt>
</results>
<results>
<ownerName>Name2</ownerName>
<houseAddress>House2</houseAddress>
<houseAddress>CityState2</houseAddress>
<yearBuilt>Year2</yearBuilt>
</results>
</addressResponse>
I already have my java code together and can parse the xml but I need help handling the duplicate attribute name. I want my csv file to look like the following:
owner,address,citystate,yearbuilt
Name1,House1,CityState1,Year1
Name2,House2,CityState2,Year2
In my xsl file, I did the following "hoping" it would get the second houseAddress but it didn't:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">owner,address,citystate,yearbuilt
<xsl:for-each select="//results">
<xsl:value-of select="concat(ownerName,',',houseAddress,',',houseAddress,',',yearBuilt,'&#10;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
That gave me:
owner,address,citystate,yearbuilt
Name1,House1,House1,Year1
Name2,House2,House2,Year2
Is there a trick to do this? I can't get the attribute names changed from the originator so I'm stuck with them. Thank you in advance.
Use:
houseAddress[2]
to get the value of the second occurrence of the houseAddress element.
Note that we are assuming XSLT 1.0 here.
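The positional predicate can be tried out from Java with the standard javax.xml.xpath API, independently of the stylesheet. This is a small sketch; the class and method names are made up, and the path assumes the document structure shown in the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SecondOccurrence {

    // Returns the second houseAddress of the first results element.
    static String secondAddress(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/addressResponse/results[1]/houseAddress[2]",
                new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<addressResponse><results>"
                + "<ownerName>Name1</ownerName>"
                + "<houseAddress>House1</houseAddress>"
                + "<houseAddress>CityState1</houseAddress>"
                + "<yearBuilt>Year1</yearBuilt>"
                + "</results></addressResponse>";
        System.out.println(secondAddress(xml));
    }
}
```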

Formatting decimal values for XML

I have a problem currently where a system we are connecting to expects to receive XML which contains, among other things, three double fields formatted to one decimal place. Personally I feel that our system should just be able to send values in default format and then it's up to other systems to format their own representation as they please, but alas this doesn't seem to be an option.
My Java-based system is currently converting objects to XML through the use of XStream. We have an XSD which accompanies the XML and defines the various elements as string, double, dateTime, etc.
I have three double fields which hold values like 12.5, 100.123, 5.23445 etc. Right now they are converted pretty much as-is into the XML. What I need is these values to be formatted in the XML to one decimal place; 12.5, 100.1, 5.2, etc.
I have briefly thought up options to accomplish this:
Somehow have Java format these values to this precision before it goes to the XML. Perhaps NumberFormat can do this, although I thought that was mainly for using with String output.
Hope that the XSD can do this for me; I know you can place limits on precision in the XSD, but I am unsure whether it actually handles the rounding itself or will just say 'this value of 123.123 is invalid for this schema'?
Use XSLT to somehow accomplish this for me.
I'd like to pick your collective brains as to what would be the 'accepted' way / best practice to use in a situation like this.
Thanks,
Dave.
XStream has converters (tutorial). You would have to register your own Double converter that will handle this. In the converter use DecimalFormat to limit the number of decimal places.
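The DecimalFormat part of that suggestion could look something like the sketch below. Registering it as an XStream converter is left out, since that wiring is not shown here; OneDecimal and format are hypothetical names. Note that DecimalFormat rounds half-even by default and is not thread-safe, which is why a new instance is created per call.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class OneDecimal {

    static String format(double value) {
        // Locale.ROOT keeps '.' as the decimal separator whatever the default locale is.
        DecimalFormat oneDecimal =
                new DecimalFormat("0.0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return oneDecimal.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(12.5));
        System.out.println(format(100.123));
        System.out.println(format(5.23445));
    }
}
```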
This can be done in a single XPath expression.
Use:
floor(.) + round(10*(. -floor(.))) div 10
Verification using XSLT as a host of XPath:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[contains(.,'.')]">
<xsl:value-of select=
"floor(.) + round(10*(. -floor(.))) div 10"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the following XML document:
<t>
<n>12.5</n>
<n>100.123</n>
<n>5.26445</n>
</t>
the wanted, correct result is produced:
<t>
<n>12.5</n>
<n>100.1</n>
<n>5.3</n>
</t>
Explanation: Use of the standard XPath functions floor(), round() and the XPath operator div and your logic.
Generalized expression:
floor(.) + round($vFactor*(. -floor(.))) div $vFactor
where $vFactor is 10^N, where N is the number of digits after the decimal point we want.
Using this expression, the modified XSLT transformation is this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pPrecision" select="4"/>
<xsl:variable name="vFactor" select=
"substring('10000000000000000000000',
1, $pPrecision+1
)
"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()[contains(.,'.')]">
<xsl:value-of select=
"floor(.) + round($vFactor*(. -floor(.))) div $vFactor"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the same XML document (above), we produce the wanted output for any meaningful value of $pPrecision. In the above example it is set to 4 and the result contains all numbers rounded to four digits after the decimal point:
<t>
<n>12.5</n>
<n>100.123</n>
<n>5.2645</n>
</t>
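For comparison, the same arithmetic can be written directly in Java. This is a sketch of the generalized expression only, not a replacement for the stylesheet; the names are invented, and like XPath's round(), Math.round resolves ties toward positive infinity. Results are doubles, so they carry the usual binary floating-point representation error.

```java
class DecimalRounding {

    // floor(x) + round(factor * (x - floor(x))) / factor, with factor = 10^digits
    static double roundTo(double value, int digits) {
        double factor = Math.pow(10, digits);
        double whole = Math.floor(value);
        return whole + Math.round(factor * (value - whole)) / factor;
    }

    public static void main(String[] args) {
        System.out.println(roundTo(12.5, 1));
        System.out.println(roundTo(100.123, 1));
        System.out.println(roundTo(5.26445, 1));
    }
}
```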

SXXP0003: Error reported by XML parser: Content is not allowed in prolog

My XML file is
<?xml version="1.0" encoding="ISO-8859-1"?>
<T0020
xsi:schemaLocation="http://www.safersys.org/namespaces/T0020V1 T0020V1.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.safersys.org/namespaces/T0020V1">
<INTERFACE>
<NAME>SAFER</NAME>
<VERSION>04.02</VERSION>
</INTERFACE>
<TRANSACTION>
<VERSION>01.00</VERSION>
<OPERATION>REPLACE</OPERATION>
<DATE_TIME>2009-09-01T00:00:00</DATE_TIME>
<TZ>CT</TZ>
</TRANSACTION>
<IRP_ACCOUNT>
<IRP_CARRIER_ID_NUMBER>564182</IRP_CARRIER_ID_NUMBER>
<IRP_BASE_COUNTRY>US</IRP_BASE_COUNTRY>
<IRP_BASE_STATE>AR</IRP_BASE_STATE>
<IRP_ACCOUNT_NUMBER>67432</IRP_ACCOUNT_NUMBER>
<IRP_ACCOUNT_TYPE>I</IRP_ACCOUNT_TYPE>
<IRP_STATUS_CODE>100</IRP_STATUS_CODE>
<IRP_STATUS_DATE>2008-02-01</IRP_STATUS_DATE>
<IRP_UPDATE_DATE>2009-06-18</IRP_UPDATE_DATE>
<IRP_NAME>
<NAME_TYPE>LG</NAME_TYPE>
<NAME>LARRY SHADDON</NAME>
<IRP_ADDRESS>
<ADDRESS_TYPE>PH</ADDRESS_TYPE>
<STREET_LINE_1>10291 HWY 124</STREET_LINE_1>
<STREET_LINE_2/>
<CITY>RUSSELLVILLE</CITY>
<STATE>AR</STATE>
<ZIP_CODE>72802</ZIP_CODE>
<COUNTY>POPE</COUNTY>
<COLONIA/>
<COUNTRY>US</COUNTRY>
</IRP_ADDRESS>
<IRP_ADDRESS>
<ADDRESS_TYPE>MA</ADDRESS_TYPE>
<STREET_LINE_1>10291 HWY124</STREET_LINE_1>
<STREET_LINE_2/>
<CITY>RUSSELLVILLE</CITY>
<STATE>AR</STATE>
<ZIP_CODE>72802</ZIP_CODE>
<COUNTY>POPE</COUNTY>
<COLONIA/>
<COUNTRY>US</COUNTRY>
</IRP_ADDRESS>
</IRP_NAME>
</IRP_ACCOUNT>
</T0020>
I am using the following XSLT to split my XML file into multiple XML files.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:t="http://www.safersys.org/namespaces/T0020V1" version="2.0">
<xsl:output method="xml" indent="yes" name="xml" />
<xsl:variable name="accounts" select="t:T0020/t:IRP_ACCOUNT" />
<xsl:variable name="size" select="30" />
<xsl:template match="/">
<xsl:for-each select="$accounts[position() mod $size = 1]">
<xsl:variable name="filename" select="resolve-uri(concat('output/',position(),'.xml'))" />
<xsl:result-document href="{$filename}" method="xml">
<T0020>
<xsl:for-each select=". | following-sibling::t:IRP_ACCOUNT[position() &lt; $size]">
<xsl:copy-of select="." />
</xsl:for-each>
</T0020>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
It works well in a sample Java application, but when I tried to use the same stylesheet in my Spring-based application, it gave the following error:
Error on line 1 column 1 of T0020:
SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
I don't know what is going wrong. Please help me. Thanks in advance.
Your XML starts with a byte-order mark in UTF-8 (0xEF,0xBB,0xBF), which isn't visible. Try opening your file with a hex editor and have a look.
Many text editors under Windows like to insert this at the start of UTF-8 encoded text, despite the fact that UTF-8 doesn't actually need a byte order mark since the ordering of bytes in UTF-8 is already well defined.
Java's XML parsers will all choke on a BOM with exactly the error message you are seeing. You'll need to either strip out the BOM, or write a wrapper for your InputStream that you're handing the XML parser to do this for you at parsing time.
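One way to write such a wrapper is with PushbackInputStream: peek at the first three bytes and push them back if they are not a UTF-8 BOM. This is a sketch under that assumption; BomSkipper and skipUtf8Bom are made-up names, and other BOMs (UTF-16, UTF-32) are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class BomSkipper {

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than asked for, so loop.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return pushback;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(skipUtf8Bom(new ByteArrayInputStream(data)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```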
There is some content in the document before the XML data starts, probably whitespace at a guess (that's where I've seen this before).
The prolog is the part of the document that is before the opening tag, with tag-like constructs like <? and <!. You may have some characters/whitespace in between these tags too. Prologs and valid content are explained on tiztag.com.
Maybe post a depersonalised example of your XML data?
It's also possible to get this if you attempt to process the content twice. (Which is fairly easy to do in spring.) In which case, there'd be nothing wrong with your XML. This scenario seems likely since the sample application works, but introducing spring causes problems.
In my case the encoding="UTF-16" was causing this issue. It got resolved when I changed it to UTF-8.

How to edit XML with XSL?

I'm writing a dummy "MyAgenda" application in Java which has to allow maintenance of the XML file that stores the data.
Say I have a XML file like:
<myagenda>
<contact>
<name>Matthew Blake</name>
<phone>12345678</phone>
</contact>
</myagenda>
How can I add a new <contact> by using XSLT ?
Thanks.
Start with the identity transform, which transforms any XML document into itself.
The identity transform is a simple machine: given a tree, it copies every node it finds recursively. You're going to override its behavior for one specific node - the myagenda element - which it's going to copy in a different way.
To do this, add a template that matches the element that you want to update and duplicates it. In your case:
<xsl:template match="myagenda">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
</xsl:copy>
</xsl:template>
You might think, "wait isn't that the identity transform?" It is, but it's not going to stay that way.
Now decide on how you're going to get the new contact information into the transform. There are basically two ways: read it from a separate XML document using the document function, or pass the values into the transform using parameters. Let's assume that you're using parameters; in this case, you'd add the following to the top of your XSLT (right after the xsl:output element):
<xsl:param name="contactName"/>
<xsl:param name="contactPhone"/>
Now, instead of transforming myagenda into a copy of itself, you want to transform it into a copy of itself that has a new contact in it. So modify the template to do this:
<xsl:template match="myagenda">
<xsl:copy>
<xsl:apply-templates select="node() | @*"/>
<contact>
<name><xsl:value-of select="$contactName"/></name>
<phone><xsl:value-of select="$contactPhone"/></phone>
</contact>
</xsl:copy>
</xsl:template>
If you wanted to get the name and phone out of a separate XML document in the file system, you'd start the XSLT with something like this:
<xsl:variable name="contact" select="document('contact.xml')"/>
<xsl:variable name="contactName" select="$contact/*/name[1]"/>
<xsl:variable name="contactPhone" select="$contact/*/phone[1]"/>
That reads in contact.xml and finds the first name and phone element under the top-level element (using * in the pattern means that you don't care what the top-level element's name is).
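On the Java side, the parameter-based variant could be driven through JAXP roughly like this. It is a sketch: the stylesheet is inlined as a string for brevity, AddContact and addContact are invented names, and "Jane Doe" is placeholder data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AddContact {

    // Identity transform plus an override that appends one contact to myagenda.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='contactName'/>"
        + "<xsl:param name='contactPhone'/>"
        + "<xsl:template match='node() | @*'>"
        + "<xsl:copy><xsl:apply-templates select='node() | @*'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='myagenda'>"
        + "<xsl:copy>"
        + "<xsl:apply-templates select='node() | @*'/>"
        + "<contact>"
        + "<name><xsl:value-of select='$contactName'/></name>"
        + "<phone><xsl:value-of select='$contactPhone'/></phone>"
        + "</contact>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String addContact(String xml, String name, String phone)
            throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        // These names must match the xsl:param declarations in the stylesheet.
        transformer.setParameter("contactName", name);
        transformer.setParameter("contactPhone", phone);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<myagenda><contact><name>Matthew Blake</name>"
                + "<phone>12345678</phone></contact></myagenda>";
        System.out.println(addContact(xml, "Jane Doe", "555"));
    }
}
```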
Use xsl:param to declare global parameters at the top of your XSL stylesheet:
<xsl:param name="newname"/>
<xsl:param name="newphone"/>
Set the new params from your XSLT engine, then add the new item via a template:
(...)
<xsl:template match="myagenda">
<xsl:apply-templates select="contact"/>
<xsl:if test="string-length($newname)>0">
<xsl:element name="contact">
<xsl:element name="name">
<xsl:value-of select="$newname"/>
</xsl:element>
<xsl:element name="phone">
<xsl:value-of select="$newphone"/>
</xsl:element>
</xsl:element>
</xsl:if>
</xsl:template>
(...)
XSLT transforms one XML file into another XML or text file.
