I have an XML file containing nested records. I need to read the records from the file and write them to an Excel file. Right now the file I produce is flat (I used Apache POI to write it). I need it to preserve the hierarchy, so that nested records are indented.
My XML file looks like:
<node>
<id>123</id>
<label>ABC</label>
<node>
<id>456</id>
<label>DEF</label>
....... so on
My current Excel looks like :
I need something like (representing the hierarchy in XML file):
Does anyone have any experience with something like that? I would really appreciate the help.
If you are willing to try a SAX parser, I may have a solution for you. Below is the class I used; it contains a DefaultHandler for the SAXParser as well as the code that formats the XML before writing it to the .xlsx file. It looks a tad hefty, so I have added comments wherever possible to make it understandable.
public class SO2 {
private SO2(File xml){
wb = new XSSFWorkbook(); //Workbook to create
sheet = wb.createSheet(); //Sheet to write to
try {
SO2.retrieveSaxParser().parse(xml, SO2.retrieveHandler()); //Begin parse
Path file = Paths.get(System.getProperty("user.home"), "Desktop", "XMLTest.xlsx"); //Where to write file
wb.write(new FileOutputStream(file.toString()));
} catch (SAXException | IOException | ParserConfigurationException e) {
JOptionPane.showMessageDialog(null, e.getMessage());
}
System.exit(0);
}
protected static void instertUpdate(String data, int columnNum, int rowNum) {//Method to add to spreadsheet
/*The below writes to the file, the row if statements are there to stop the method
* overwriting any rows already created
*/
if(row != null){
if(row.getRowNum() != rowNum){
row = sheet.createRow(rowNum);
}
} else {
row = sheet.createRow(rowNum);
}
cell = row.createCell(columnNum);//Make our cell
cell.setCellValue(data);//Write to it
}
private static SAXParser retrieveSaxParser() throws ParserConfigurationException, SAXException{
return SAXParserFactory.newInstance().newSAXParser();//Get parser
}
private static DefaultHandler retrieveHandler() {
DefaultHandler handler = new DefaultHandler(){//Handler with methods required for parsing xml
@Override
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
if(startWasPrevious == true){//Start indenting after the first element tag processed
indent++;
}
rowNumber++;//Move row down for each tag
columnNumber = indent; //Cell number set to current indent level
SO2.instertUpdate("<" + qName + ">", columnNumber, rowNumber);//Insert
startWasPrevious = true; //For formatting
previous = startTag;//For formatting
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if(startWasPrevious == false){//For removal of indentation
indent--;
}
if(!previous.equals("text")){//If text wasn't last part parsed then set column to indent
columnNumber = indent;
} else{//If text was processed last move cell across
columnNumber++;
}
if(previous.equals("end")){//Move to a newline if last parsed element was an ending tag
rowNumber++;
}
if(startWasPrevious == false){ //If there was no text previously
SO2.instertUpdate("</" + qName + ">", columnNumber, rowNumber);
} else { //If there was text then this will be enclosing end tag
SO2.instertUpdate("</" + qName + ">", columnNumber, rowNumber);
}
startWasPrevious = false; //For formatting
previous = endTag;
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
String s = new String(ch, start, length).trim();//Get text
if(s.length() > 0){
columnNumber++; //Move column number along
SO2.instertUpdate(s, columnNumber, rowNumber);
previous = text;
}
}
};
return handler;
}
//Main
public static void main(String[] args) {
JFrame file = new JFrame("File choice. . .");
file.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
FileDialog dialog = new FileDialog(file, "Choose a file", FileDialog.LOAD);//Get XML File
dialog.setDirectory(Paths.get(System.getProperty("user.home")).resolve("Desktop").toString());
dialog.setFile("*.xml");
dialog.setVisible(true);
if(dialog.getFile() == null){
System.exit(0);
} else {
xmlFile = new File(dialog.getDirectory() + dialog.getFile());
javax.swing.SwingUtilities.invokeLater(new Runnable() {
@Override
public void run() {
new SO2(xmlFile);
}
});
}
}
private static File xmlFile;
private static XSSFWorkbook wb;
private static Sheet sheet;
private static Row row;
private static Cell cell;
private static boolean startWasPrevious = false; //For formatting purposes
private static int rowNumber = -1; //Hold row number
private static int columnNumber = 0;//Hold number of cell to write to
private static int indent;//For indenting
private static String previous = "";//To know what was last processed
private static final String endTag = "end";//Values for previous to hold
private static final String text = "text";//Values for previous to hold
private static final String startTag = "start";//Values for previous to hold
}
This was the xml file I used:
<?xml version="1.0"?>
<empire>
<darkness>
<sith>
<title>Darth</title>
<name>Vader</name>
<power>Grip</power>
</sith>
<sith>
<title>Darth</title>
<name>Sidious</name>
<power>Lightning</power>
</sith>
</darkness>
</empire>
Hope it helps,
good luck
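The core idea of the class above, using the nesting depth as the column index, can be shown without any spreadsheet code. The following is only a minimal sketch under some assumptions: the names IndentedRows and Row are hypothetical, only the text of leaf elements is recorded, and end tags are not emitted as separate cells.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: turns nested XML into rows whose column equals the nesting depth.
final class IndentedRows {

    // One spreadsheet row: the column to start in, the tag name, and the leaf text (if any).
    static final class Row {
        final int column;
        final String tag;
        String text = "";

        Row(int column, String tag) {
            this.column = column;
            this.tag = tag;
        }
    }

    static List<Row> parse(String xml) throws Exception {
        final List<Row> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;                                  // current nesting level
            private Row open;                                       // most recently opened element
            private final StringBuilder text = new StringBuilder(); // text seen since the last start tag

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open = new Row(depth, qName);
                rows.add(open);
                depth++;
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
                if (open != null) {             // only set for an element without child elements
                    open.text = text.toString().trim();
                }
                open = null;
                text.setLength(0);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<node><id>123</id><label>ABC</label><node><id>456</id></node></node>";
        for (Row r : parse(xml)) {
            System.out.println("  ".repeat(r.column) + r.tag + (r.text.isEmpty() ? "" : " = " + r.text));
        }
    }
}
```

With POI, each Row could then be written much like instertUpdate does above: create sheet row i, put the tag in cell r.column and the text in cell r.column + 1.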
I have a huge Excel file which I am trying to parse using a SAX parser in Java. I am mostly using the Apache POI library and working with .xlsx files. Here is how the XML looks inside the zipped Excel file at /xl/worksheets/sheet1.xml, which I am trying to read:
<row r="1">
<c r="A1" t="inlineStr"><is><t>my value 1</t></is></c>
<c r="B1" t="inlineStr"><is><t>my value 2</t></is></c>
<c r="C1" t="inlineStr"><is><t>my value 3</t></is></c>
</row>
This particular Excel file uses inline string values, as shown above.
This is my function which reads the file:
public void executeExcelDataExtraction() throws IOException, OpenXML4JException, SAXException, ParserConfigurationException, XMLStreamException, FactoryConfigurationError {
OPCPackage pkg = OPCPackage.open(XLSX_INPUT_FILE.xlsx);
XSSFReader r = new XSSFReader( pkg );
SharedStringsTable sst = r.getSharedStringsTable();
ImportArticleDataProcessorExcelFileReaderFactory handlerFactory = new
ImportArticleDataProcessorExcelFileReaderFactory(sst);
XMLReader parser = fetchSheetParser(handlerFactory);
Iterator<InputStream> sheets = r.getSheetsData();
if (sheets instanceof XSSFReader.SheetIterator) {
XSSFReader.SheetIterator sheetiterator =
(XSSFReader.SheetIterator)sheets;
while(sheetiterator.hasNext()) {
System.out.println("Processing new sheet:\n");
InputStream sheet = sheets.next();
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
rowCache = handlerFactory.getRowCache();
sheet.close();
pkg.close();
if(!rowCache.isEmpty())
createCategoryMap(rowCache);
}
}
}
And this is my sheet handler factory class, which is used in the function above:
import java.util.LinkedList;
import java.util.List;
import org.xml.sax.Attributes;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ImportArticleDataProcessorExcelFileReaderFactory extends DefaultHandler{
private static final String ROW_EVENT = "row";
private static final String CELL_EVENT = "c";
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private List<String> cellCache = new LinkedList<>();
private List<String[]> rowCache = new LinkedList<>();
ImportArticleDataProcessorExcelFileReaderFactory(SharedStringsTable sst) {
this.sst = sst;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => cell
if (CELL_EVENT.equals(name)) {
String cellType = attributes.getValue("t");
if(cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
} else if (ROW_EVENT.equals(name)) {
if (!cellCache.isEmpty()) {
rowCache.add(cellCache.toArray(new String[cellCache.size()]));
}
cellCache.clear();
}
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// Process the last contents as required.
// Do now, as characters() may be called more than once
if(nextIsString) {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if(name.equals("v")) {
cellCache.add(lastContents.trim());
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
lastContents += new String(ch, start, length);
}
public List<String[]> getRowCache() {
return rowCache;
}
}
All other Excel files, which do not contain inline strings, are read successfully. For files with inline strings, however, the handler only sees cellType=inlineStr and never gets the actual value.
What I want:
All I want is to print the values inside the inline string cells; in my case these are "my value 1", "my value 2" and "my value 3".
If anyone is looking for a similar solution: I solved it by adding these few lines to my ImportArticleDataProcessorExcelFileReaderFactory class above:
public void startElement(String uri, String localName, String name){
// rest of the code...
inlineStr = false;
if(cellType != null && cellType.equals("inlineStr")) {
inlineStr = true;
}
...
}
public void endElement(String uri, String localName, String name){
// rest of the code...
if(name.equals("t") && inlineStr) {
cellCache.add(lastContents.trim());
}
...
}
The characters function in the factory class above already captures the contents of the cell correctly. Inline string cells keep their text in an <is><t> element rather than in <v>, so the original endElement never added it; with the changes above, cellCache is filled with the values from all inline string cells.
Please refer to @Axel's answer in the comment above, and see this answer for the source: How to check a number in a string contains a date and exponential numbers while parsing excel file using apache event model in java
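To see the inline string handling in isolation, here is a minimal sketch that uses only the JDK's SAX parser on a sheet fragment like the one in the question. The class name InlineStringCells is hypothetical, and the sketch ignores shared strings and numeric cells entirely; it only collects the text of cells marked t="inlineStr".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the values of inline string cells from sheet XML.
final class InlineStringCells {

    static List<String> read(String sheetXml) throws Exception {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inlineStr;  // inside a <c t="inlineStr"> cell
            private boolean inText;     // inside a <t> element of such a cell
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    inlineStr = "inlineStr".equals(atts.getValue("t"));
                    text.setLength(0);
                } else if ("t".equals(qName) && inlineStr) {
                    inText = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) {
                    text.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("t".equals(qName)) {
                    inText = false;
                } else if ("c".equals(qName) && inlineStr) {
                    values.add(text.toString().trim()); // cell finished: store its text
                    inlineStr = false;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<row r=\"1\">"
                + "<c r=\"A1\" t=\"inlineStr\"><is><t>my value 1</t></is></c>"
                + "<c r=\"B1\" t=\"inlineStr\"><is><t>my value 2</t></is></c>"
                + "</row>";
        System.out.println(read(xml));
    }
}
```

Storing the value when the cell closes, rather than when <t> closes, keeps a cell with several text runs together as one value.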
I have big xml files (~1GB) with this structure:
<?xml version="1.0" encoding="UTF-8"?>
<GenoExchange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ncbi.nlm.nih.gov/SNP/geno" xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/geno ftp://ftp.ncbi.nlm.nih.gov/snp/specs/genoex_1_5.xsd" dbSNPBuildNo="146" reportId="MT" reportType="chromosome">
<Population popId="638" handle="TSC-CSHL" locPopId="TSC_42_AA">
<popClass self="NORTH AMERICA"/>
</Population>
<SnpInfo rsId="1041870" observed="C/T">
<SnpLoc genomicAssembly="107:GRCh38.p2" geneId="4512" geneSymbol="COX1" chrom="MT" start="6150" locType="2" rsOrientToChrom="fwd" contigAllele="T" contig="NC_012920:1"/>
<SsInfo ssId="1508548" locSnpId="TSC0349089" ssOrientToRs="fwd">
<ByPop popId="1303" sampleSize="184">
<AlleleFreq allele="T" freq="1"/>
<AlleleFreq allele="C" freq="0"/>
</ByPop>
</SsInfo>
</SnpInfo>
<SnpInfo rsId="1029293" observed="C/T">
<SnpLoc genomicAssembly="107:GRCh38.p2" geneId="4512" geneSymbol="COX1" chrom="MT" start="6307" locType="2" rsOrientToChrom="fwd" contigAllele="C" contig="NC_012920:1"/>
<SsInfo ssId="1494519" locSnpId="TSC0254145" ssOrientToRs="fwd">
<ByPop popId="639" sampleSize="82">
<AlleleFreq allele="T" freq="0"/>
<AlleleFreq allele="C" freq="1"/>
</ByPop>
<ByPop popId="1303" sampleSize="184">
<AlleleFreq allele="T" freq="0"/>
<AlleleFreq allele="C" freq="1"/>
</ByPop>
</SsInfo>
</SnpInfo>
I want to find a specific rsId, for example rsId="1029293", and extract all the information inside that node. I don't want to process the whole file; I only want to find that ID, extract that information and stop.
From what I have read, SAX or StAX parsers are the better choice. I'm using SAX; this is my code:
class UserHandler extends DefaultHandler {
String rsID = null;
String i = "1029293";
@Override
public void startElement(String uri,
String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("SnpInfo")) {
rsID = attributes.getValue("rsId");
//System.out.println("value: " + rsID);
}
if((i).equals(rsID) &&
qName.equalsIgnoreCase("SnpInfo")){
System.out.println("Start Element: " + qName + " " + rsID);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("SsInfo")) {
String a = attributes.getValue("ssId");
System.out.println("SSID: " + a);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("ByPop")) {
String p = attributes.getValue("popId");
System.out.println("POPID: " + p);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("AlleleFreq")) {
String p = attributes.getValue("allele");
String f = attributes.getValue("freq");
System.out.println("ALLELE: " + p + " FREQ: " + f);
}
if ((i).equals(rsID) && qName.equalsIgnoreCase("GTypeFreq")) {
String p = attributes.getValue("gtype");
String f = attributes.getValue("freq");
System.out.println("GTYPE: " + p + " FREQ: " + f);
}
}
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("SnpInfo")) {
if((i).equals(rsID)
&& qName.equalsIgnoreCase("SnpInfo"))
System.out.println("End Element: " + qName);
}
}
}
public class XMLParser {
public static void main(String argv[]) {
try {
InputStream fileStream = new FileInputStream("/home/xml/gt_chr10.xml.gz");
InputStream gzipStream = new GZIPInputStream(fileStream);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
UserHandler userhandler = new UserHandler();
saxParser.parse(gzipStream, userhandler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
My problem is that my code scans the whole file for the ID, which takes more than 2 minutes each time. I can't afford code that takes that long.
Is there a better approach for this?
Using StAX gives you more control when parsing XML, since you actively pull events from the stream. This way you can pull the next event, handle it, and once you have found your data, simply terminate the loop (using a flag, or even a return statement if you must).
InputStream in = ...
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader eventReader = factory.createXMLEventReader(in);
boolean found = false;
while (!found && eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
switch (event.getEventType()) {
case XMLStreamConstants.START_ELEMENT:
// your logic here
// once you found your element, you can terminate the loop
found = true;
break;
case XMLStreamConstants.END_ELEMENT:
// your logic here
break;
}
}
(omitted exception and resource handling for brevity)
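Filled in for the SnpInfo structure from the question, the loop might look like the sketch below. This is an illustration rather than a tuned implementation: the class and method names are hypothetical, only the AlleleFreq attributes are collected, and the reader still has to scan (and decompress) everything that precedes the matching element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: pulls events until the wanted SnpInfo element has been read, then stops.
final class StaxFindSnp {

    static List<String> findAlleleFreqs(InputStream in, String targetRsId) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        List<String> result = new ArrayList<>();
        boolean inside = false; // true while we are within the matching SnpInfo
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement el = event.asStartElement();
                    String name = el.getName().getLocalPart();
                    if ("SnpInfo".equals(name)) {
                        inside = targetRsId.equals(attr(el, "rsId"));
                    } else if (inside && "AlleleFreq".equals(name)) {
                        result.add(attr(el, "allele") + "=" + attr(el, "freq"));
                    }
                } else if (event.isEndElement() && inside
                        && "SnpInfo".equals(event.asEndElement().getName().getLocalPart())) {
                    break; // target element fully read: do not touch the rest of the stream
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    // Returns the attribute value, or null when the attribute is absent.
    private static String attr(StartElement el, String name) {
        Attribute a = el.getAttributeByName(new QName(name));
        return a == null ? null : a.getValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<GenoExchange xmlns='http://www.ncbi.nlm.nih.gov/SNP/geno'>"
                + "<SnpInfo rsId='1041870'><AlleleFreq allele='T' freq='1'/></SnpInfo>"
                + "<SnpInfo rsId='1029293'><AlleleFreq allele='C' freq='1'/></SnpInfo>"
                + "</GenoExchange>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(findAlleleFreqs(in, "1029293"));
    }
}
```

For the gzipped file from the question, the same method could be passed the GZIPInputStream instead of the in-memory stream used here.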
On a side note, you will gain some performance by combining your repeated if ((i).equals(rsID) && ...) checks into a single one, with the detail checks in nested ifs:
if ((i).equals(rsID)) {
if(qName.equalsIgnoreCase("GTypeFreq")) {
...
}
}
You can throw an exception in your endElement handler to make the parser abort parsing (http://www.ibm.com/developerworks/library/x-tipsaxstop/):
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("SnpInfo") && (i).equals(rsID)) {
System.out.println("End Element: " + qName);
// Abort parsing: the wanted element has been read completely
throw new SAXException("Element found.");
}
}
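The caller then has to tell this deliberate abort apart from a real parse error. One way to do that, sketched here under the assumption that the parser rethrows the handler's exception unchanged (the JDK's built-in parser does), is a dedicated exception subclass. StopParsing and SaxEarlyStop are hypothetical names, and only the ssId attributes are collected.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: aborts a SAX parse as soon as the wanted SnpInfo element is complete.
final class SaxEarlyStop {

    // Marker exception so that the caller can distinguish "done" from a genuine parse error.
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("target element fully read");
        }
    }

    static List<String> ssIdsFor(String xml, final String targetRsId) throws Exception {
        final List<String> ssIds = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside; // true while within the matching SnpInfo

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("SnpInfo".equals(qName)) {
                    inside = targetRsId.equals(atts.getValue("rsId"));
                } else if (inside && "SsInfo".equals(qName)) {
                    ssIds.add(atts.getValue("ssId"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inside && "SnpInfo".equals(qName)) {
                    throw new StopParsing(); // nothing after this point is needed
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Deliberate abort: the data is already in ssIds.
        }
        return ssIds;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<GenoExchange>"
                + "<SnpInfo rsId='1041870'><SsInfo ssId='1508548'/></SnpInfo>"
                + "<SnpInfo rsId='1029293'><SsInfo ssId='1494519'/></SnpInfo>"
                + "</GenoExchange>";
        System.out.println(ssIdsFor(xml, "1029293"));
    }
}
```

Any other SAXException still propagates to the caller, so malformed input before the target element is not silently swallowed.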
The only way to avoid parsing the whole file every time you run this is to put the data in an XML database. Parsing a 1 GB file is going to take about a minute, plus or minus depending on the speed of your machine and what processing you do on each node.
A streamed XSLT 3.0 solution is simply:
<xsl:transform version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.ncbi.nlm.nih.gov/SNP/geno">
<xsl:template name="xsl:initial-template">
<xsl:stream href="input.xml">
<xsl:copy-of select="/GenoExchange/SnpInfo[@rsId='1041870'][1]"/>
</xsl:stream>
</xsl:template>
</xsl:transform>
No need to write all that pesky SAX or StAX code.
I put the "[1]" predicate in to allow the processor to abandon the search when it has found the first hit.
The best approach is to use vtd-xml and XPath. A 1 GB XML file takes about 1.5 GB of heap space and less than 10 seconds on a 3 to 4 year old Intel processor; see the code example below. One more thing: if you want to eliminate parsing entirely, you can create a VTD+XML file so that any subsequent query can directly access the VTD index portion, which could easily triple or quadruple your app's performance.
import com.ximpleware.*;
public class simpleXpathSearch{
public static void main(String s[]) throws VTDException,java.io.UnsupportedEncodingException,java.io.IOException{
VTDGen vg = new VTDGen();
vg.setLCLevel(5);
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath("/*/*[@rsId='1029293']");
int i=0;
while((i=ap.evalXPath())!=-1){
// your code logic here
}
}
}
//Main class
public static void main(String[] args) {
SAXReader.read();
}
//SAXReader
public static void read(){
try {
XMLReader processor = XMLReaderFactory.createXMLReader();
processor.setContentHandler(new SAXController());
processor.parse(new InputSource("MyXML.xml"));
} catch (SAXException | IOException e) {
System.err.println(e.getMessage());
}
}
//SAXController
// The SAXController extends DefaultHandler
private int tab = 0;
private void tabulation() {
for (int i=0; i<tab; i++)
System.out.print(" ");
}
@Override
public void startDocument() {
tabulation();
System.out.println("Starting XML Document");
tab++;
}
@Override
public void endDocument() {
tab--;
tabulation();
System.out.println("Ending XML Document");
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
tabulation();
System.out.print(localName);
if (attributes.getLength()>0) {
for (int i=0; i<attributes.getLength(); i++) {
System.out.print(attributes.getLocalName(i)+": "+attributes.getValue(i));
}
}
System.out.println();
tab++;
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
tab--;
tabulation();
System.out.println(localName);
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String content= new String(ch, start, length);
content= content.replaceAll("[\t\n]", "").trim();
if (!content.equals("")) {
tabulation();
System.out.println(content);
}
}
I have an applet which reads data from an XML file and puts the elements in a list of movie objects. It then puts them in a JTable which has a custom table model to handle the data, as well as a renderer to draw the movie title into the table cells. I originally placed it in a JFrame and it worked perfectly, as shown in the image below.
However, when I place it in a class which extends JApplet and call the getContentPane method, it appears like this.
The cells show "no programmes available", which is what the custom renderer writes when the String movieName is "null".
Here is the applet code:
public class BackEndApplet extends JApplet{
private ArrayList<Channel> al;
public void init() {
setUp();
try{
SwingUtilities.invokeLater(new Runnable(){
public void run(){
createGUI();
}
});
} catch (Exception e){
System.err.println("Did not run successfully");
}
}
private void setUp(){
//parse xml file into Java
String fileName = "XMLFiles/bondFilms.xml";
MovieParser movieParser = new MovieParser();
movieParser.parseMovie(fileName);
//sort movie list to channel and sort by time
MovieChannelSorter mcSorter = new MovieChannelSorter();
mcSorter.sortMovieList(movieParser.getMovieList());
//retrieves channels from channel sorter
al = mcSorter.getChannels();
}
private void createGUI(){
ProgrammeGuidePanel gPane = new ProgrammeGuidePanel(al);
gPane.setOpaque(true);
setContentPane(gPane);
}
}
And this is my main panel code:
public class ProgrammeGuidePanel extends JPanel{
private ArrayList <Channel> channels;
private String [] channelNames = {"Sean Connery",
"George Lazenby",
"Roger Moore",
"Timothy Dalton",
"Pierce Brosnan",
"Daniel Craig"};
private String [] pHeader = {"Slot 1","Slot 2","Slot 3","Slot 4"};
private CustomTModel customModel;
public ProgrammeGuidePanel(ArrayList <Channel> ch) {
super(new BorderLayout());
channels = ch;
//create title table
DefaultTableModel model = new DefaultTableModel();
model.addColumn("Channels",channelNames);
JTable channelTable = new JTable(model);
channelTable.setRowSelectionAllowed(false);
//Create and fill Programme table
customModel = new CustomTModel(channels,pHeader);
JTable programmeTable = new JTable(customModel);
//set up panel for titles
JScrollPane scroller1 = new JScrollPane(channelTable);
scroller1.setMinimumSize(new Dimension(100,500));
scroller1.setPreferredSize(new Dimension(150,250));
//set up panel for movies
JScrollPane scroller2 = new JScrollPane(programmeTable);
//scroller1.setMinimumSize(new Dimension(100,100));
//scroller1.setPreferredSize(new Dimension(300,250));
//add scrollPanes to main panel
add(scroller1,BorderLayout.WEST);
add(scroller2,BorderLayout.CENTER);
}
}
I also tried using appletviewer on the command line, but the applet doesn't appear when I run the HTML file.
I'm completely stumped as to why it's doing this, so any help will be greatly appreciated.
UPDATE:
I may have figured out why it's displaying the wrong data. In my SAX parser I was using
InputStream xmlInput = new FileInputStream(fileName);
I tried creating an executable jar in Eclipse and got the result in picture 2. So I assume my parser class was returning a list full of empty objects since it couldn't find the XML file. I did some research and saw that I had to use
InputStream xmlInput = getClass().getResourceAsStream("file.xml");
However, it keeps returning null when I run it in Eclipse. I've looked at some similar questions on Stack Overflow, but I can't get my head around how to implement getResourceAsStream(). I've also tried the getClassLoader() method and setting an absolute path with "/", but to no avail.
Here's an SSCCE of my parser.
public class XMLParser {
public static void main(String [] args){
XMLParser x = new XMLParser();
x.parse();
}
public void parse(){
SAXParserFactory factory = SAXParserFactory.newInstance();
try{
InputStream xmlInput = getClass().getResourceAsStream("file.xml");
//InputStream xmlInput = new FileInputStream("file.xml");
SAXParser saxParser = factory.newSAXParser();
Handler handler = new Handler();
saxParser.parse(xmlInput,handler);
for(int i = 0;i<handler.plist.size();i++){
System.out.println(handler.plist.get(i));
}
} catch (Throwable err){
err.printStackTrace();
}
}
private class Handler extends DefaultHandler{
ArrayList<String>plist = new ArrayList<String>();
private String name;
private String lastName;
private boolean bname;
private boolean blname;
@Override
public void startElement(String uri,String localName, String qName,
Attributes attributes) throws SAXException{
System.out.println("start element: " + qName);
if (qName.equalsIgnoreCase("NAME")) {
bname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
System.out.println("end element: " + qName);
}
@Override
public void characters(char ch[], int start, int length)
throws SAXException {
if(bname){
name = new String(ch,start,length);
plist.add(name);
bname = false;
}
if(blname){
lastName = new String(ch,start,length);
plist.add(lastName);
blname = false;
}
}
}
}
Here's what the structure looks like in Eclipse
I want to parse a very long string from an xml file. You can see the xml file here.
If you look at the above file, there is a "description" tag whose string I want to parse. When the "description" tag holds a short string, say three or four lines, my parser (a Java SAX parser) parses it without trouble, but when the string is hundreds of lines long, it does not return the whole string. Below is the code I am using for the parsing; please let me know where I am going wrong. I would be very thankful for any help.
Here is the parser GetterSetter class
public class MyGetterSetter
{
private ArrayList<String> description = new ArrayList<String>();
public ArrayList<String> getDescription()
{
return description;
}
public void setDescription(String description)
{
this.description.add(description);
}
}
Here is the parser Handler class
public class MyHandler extends DefaultHandler
{
String elementValue = null;
Boolean elementOn = false;
Boolean item = false;
public static MyGetterSetter data = null;
public static MyGetterSetter getXMLData()
{
return data;
}
public static void setXMLData(MyGetterSetter data)
{
MyHandler.data = data;
}
public void startDocument() throws SAXException
{
data = new MyGetterSetter();
}
public void endDocument() throws SAXException
{
}
public void startElement(String namespaceURI, String localName,String qName, Attributes atts) throws SAXException
{
elementOn = true;
if (localName.equalsIgnoreCase("item"))
item = true;
}
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
{
elementOn = false;
if(item)
{
if (localName.equalsIgnoreCase("description"))
{
data.setDescription(elementValue);
Log.d("--------DESCRIPTION------", elementValue +" ");
}
else if (localName.equalsIgnoreCase("item")) item = false;
}
}
public void characters(char ch[], int start, int length)
{
if (elementOn)
{
elementValue = new String(ch, start, length);
elementOn = false;
}
}
}
Use the org.w3c.dom package.
public static void main(String[] args) {
try {
URL url = new URL("http://www.aboutsports.co.uk/fixtures/");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(url.openStream());
NodeList list = doc.getElementsByTagName("item"); // get <item> nodes
for (int i = 0; i < list.getLength(); i++) {
Node item = list.item(i);
NodeList descriptions = ((Element)item).getElementsByTagName("description"); // get <description> nodes within an <item>
for (int j = 0; j < descriptions.getLength(); j++) {
Node description = descriptions.item(j); // use the loop index, not always the first node
System.out.println(description.getTextContent()); // print the text content
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
XPath in Java is also great for extracting bits from XML documents. Here's an example.
You would use an XPathExpression like //item/description. When you evaluate it on the XML InputStream, it returns a NodeList like the one above, with all the <description> elements inside an <item> element.
If you wanted to do it your way, with a DefaultHandler, you would need to set and unset flags so you can check whether you are in the body of a <description> element. The code above probably does something similar internally, hiding it from you. The code is available in Java, so why not use it?
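As a rough illustration of the XPath route, the sketch below evaluates //item/description against a small in-memory document using the JDK's javax.xml.xpath API. The class name and the sample feed are made up for the example; for the real feed you would parse the URL's stream instead of a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: returns the text of every <description> that is a direct child of an <item>.
final class XPathDescriptions {

    static List<String> descriptions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//item/description" matches at any depth, so the channel-level description is skipped.
        NodeList nodes = (NodeList) xpath.evaluate("//item/description", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent()); // full text, however long it is
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><description>feed</description>"
                + "<item><description>first</description></item>"
                + "<item><description>second</description></item>"
                + "</channel></rss>";
        System.out.println(descriptions(xml));
    }
}
```

Because the whole document is loaded into a DOM first, this suits feeds of moderate size; for very large files a streaming parser remains the better fit.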
I am trying to fetch data from an XML file in Java using a SAX parser. I can read small amounts of data, but when the data is large and spans multiple lines, I get only two lines of it, not all of them. I am using the following code:
InputStreamReader isr = new InputStreamReader(is);
InputSource source = new InputSource(isr);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
XMLReader xr = parser.getXMLReader();
GeofenceParametersXMLHandler handler = new GeofenceParametersXMLHandler();
xr.setContentHandler(handler);
xr.parse(source);
And my GeofenceParametersXMLHandler is:
private boolean inTimeZone = false;
private boolean inCoordinate = false;
private boolean outerBoundaryIs = false;
private boolean innerBoundaryIs = false;
private String timeZone;
private List<String> innerCoordinates = new ArrayList<String>();
private String outerCoordinates;
public String getTimeZone() {
return timeZone;
}
public List<String> getInnerCoordinates() {
return innerCoordinates;
}
public String getOuterCoordinates() {
return outerCoordinates;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
if (this.inTimeZone) {
this.timeZone = new String(ch, start, length);
this.inTimeZone = false;
}
if (this.inCoordinate && this.innerBoundaryIs) {
this.innerCoordinates.add(new String(ch, start, length));
this.inCoordinate = false;
this.innerBoundaryIs = false;
}
if (this.inCoordinate && this.outerBoundaryIs) {
this.outerCoordinates = new String(ch, start, length);
this.inCoordinate = false;
this.outerBoundaryIs = false;
}
}
@Override
public void endElement(String uri, String localName, String name) throws SAXException {
super.endElement(uri, localName, name);
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
}
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
super.startElement(uri, localName, name, attributes);
if (localName.equalsIgnoreCase("timezone")) {
this.inTimeZone = true;
}
if (localName.equalsIgnoreCase("outerBoundaryIs")) {
this.outerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("innerBoundaryIs")) {
this.innerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("coordinates")) {
this.inCoordinate = true;
}
}
And the XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2">
<Placemark>
<name>gx:altitudeMode Example</name>
<timezone>EASTERN</timezone>
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-77.05788457660967,38.87253259892824,100
-77.05465973756702,38.87291016281703,100
-77.05315536854791,38.87053267794386,100
-77.05552622493516,38.868757801256,100
-77.05844056290393,38.86996206506943,100
-77.05788457660967,38.87253259892824,100
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
</kml>
I always get only two lines of data for the coordinates, but when they are on a single line I get the complete data. How can I fetch the complete data when it spans multiple lines?
Thanks in advance.
The characters() method won't necessarily give you all the text data in one go (this is a very common misconception, btw).
The proper approach is to concatenate all the data returned by successive calls to characters() (using a StringBuilder or similar). Once your endElement() method is called, you can then treat that text buffer as complete and process it as such.
From the doc:
The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks
Often you see that for a small XML doc one call to characters() will suffice. However as your XML doc increases in size, you'll find that due to buffering etc. you'll start getting multiple calls. Consequently each call treated on its own appears to be incomplete.
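Applied to the coordinates element from the question, the accumulate-then-process pattern could look like the following sketch. The class and method names are hypothetical, the parser is not namespace-aware (so qName is the plain tag name), and only the outer boundary is handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: buffers all characters() chunks and reads the buffer only in endElement().
final class CoordinatesText {

    static List<String> outerCoordinates(String kml) throws Exception {
        final List<String> tuples = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inOuter;        // inside <outerBoundaryIs>
            private boolean inCoordinates;  // inside its <coordinates>
            private final StringBuilder buffer = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("outerBoundaryIs".equals(qName)) {
                    inOuter = true;
                } else if (inOuter && "coordinates".equals(qName)) {
                    inCoordinates = true;
                    buffer.setLength(0); // start a fresh buffer for this element
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inCoordinates) {
                    buffer.append(ch, start, length); // one of possibly many chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inCoordinates && "coordinates".equals(qName)) {
                    String text = buffer.toString().trim(); // only now is the text complete
                    if (!text.isEmpty()) {
                        tuples.addAll(Arrays.asList(text.split("\\s+")));
                    }
                    inCoordinates = false;
                } else if ("outerBoundaryIs".equals(qName)) {
                    inOuter = false;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(kml)), handler);
        return tuples;
    }

    public static void main(String[] args) throws Exception {
        String kml = "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Placemark><Polygon>"
                + "<outerBoundaryIs><LinearRing><coordinates>\n"
                + "-77.05788457660967,38.87253259892824,100\n"
                + "-77.05465973756702,38.87291016281703,100\n"
                + "</coordinates></LinearRing></outerBoundaryIs>"
                + "</Polygon></Placemark></kml>";
        System.out.println(outerCoordinates(kml));
    }
}
```

The flags are reset in endElement() rather than in characters(), which is the key difference from the handler in the question: resetting them after the first chunk is what cut the text short.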