SAXParser giving unexpected random results - java

I am developing an RSS feed reader for Android, and I am using the SAX APIs to parse the XML files. The problem is that while parsing the data, some of the text is truncated at random in randomly selected tags (I mean different instances of the same tag). To be more clear, I have added a screenshot.
Here is my Handler class:
public class RssParseHandler extends DefaultHandler {
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
//StringBuilder temp;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
//temp = new StringBuilder();
}
public List<RssItem> getItems() {
return rssItems;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if ("item".equals(qName)) {
currentItem = new RssItem();
} else if ("title".equals(qName)) {
parsingTitle = true;
} else if ("link".equals(qName)) {
parsingLink = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("item".equals(qName)) {
rssItems.add(currentItem);
currentItem = null;
} else if ("title".equals(qName)) {
//currentItem.setTitle(new String(temp));
//temp = new StringBuilder();
parsingTitle = false;
} else if ("link".equals(qName)) {
//currentItem.setLink(new String(temp));
//temp = new StringBuilder();
parsingLink = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null)
{
//temp.append(ch, start, length);
currentItem.setTitle(new String(ch, start, length));
}
} else if (parsingLink) {
if (currentItem != null) {
//temp.append(ch, start, length);
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
}
}
}
The methods setTitle(String str) and setLink(String str) are setter methods of the RssItem class.
I googled this problem and read somewhere that I should use a StringBuilder instead, so I tried that (the StringBuilder version is the commented-out code). But then I started getting a NullPointerException.
Any suggestions for getting rid of this problem?

From the documentation of characters():
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
So you are probably getting a partial chunk of data. A possible solution could be:
if (currentItem != null) {
// Append this chunk to the text collected by earlier calls
String tmpLink = currentItem.getLink();
tmpLink += new String(ch, start, length);
currentItem.setLink(tmpLink);
}
Of course, currentItem.getLink() must return an empty String rather than a null reference; otherwise the first concatenation produces text that starts with "null".
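A minimal sketch of this concatenation approach, using a hypothetical RssItem stand-in whose link starts as "". Note that the question's characters() also sets parsingLink = false right after the first chunk; that line has to go as well, otherwise every later chunk is ignored, so the sketch resets the flag only in endElement(). Instead of running a real parser, it calls the handler methods by hand with two chunks to mimic a parser that splits the text:

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class ConcatLinkDemo {

    // Hypothetical stand-in for the asker's RssItem: link starts as "" instead of null,
    // so the first concatenation does not produce "null...".
    static class RssItem {
        private String link = "";
        String getLink() { return link; }
        void setLink(String link) { this.link = link; }
    }

    static class LinkHandler extends DefaultHandler {
        RssItem currentItem;
        private boolean parsingLink;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("item".equals(qName)) {
                currentItem = new RssItem();
            } else if ("link".equals(qName)) {
                parsingLink = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Stop collecting only when the element really ends, not after the first chunk.
            if ("link".equals(qName)) {
                parsingLink = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (parsingLink && currentItem != null) {
                // Append this chunk to whatever earlier chunks already delivered.
                String tmpLink = currentItem.getLink();
                tmpLink += new String(ch, start, length);
                currentItem.setLink(tmpLink);
            }
        }
    }

    // Drives the handler by hand, delivering the link text in two chunks.
    static String simulateSplitLink(String firstChunk, String secondChunk) {
        LinkHandler handler = new LinkHandler();
        Attributes none = new AttributesImpl();
        handler.startElement("", "item", "item", none);
        handler.startElement("", "link", "link", none);
        handler.characters(firstChunk.toCharArray(), 0, firstChunk.length());
        handler.characters(secondChunk.toCharArray(), 0, secondChunk.length());
        handler.endElement("", "link", "link");
        return handler.currentItem.getLink();
    }

    public static void main(String[] args) {
        System.out.println(simulateSplitLink("http://example.com/", "news/1"));
    }
}
```

Running main prints http://example.com/news/1; with the question's original characters(), only http://example.com/ would survive.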

Your problem is that you assume the characters() method receives all the characters inside an element in a single call, which is not true.
You should save the new characters and concatenate them with any previous ones.
Using a StringBuilder is a good fit here. You just need to handle the NullPointerException you got.
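A minimal sketch of the StringBuilder version, with a hypothetical RssItem stand-in. The NullPointerException with the commented-out code most likely comes from endElement(): an RSS channel has its own <title> and <link>, which close before the first <item> is created, so currentItem is still null at that point. The sketch therefore calls the setters only when currentItem is non-null, clears the buffer in startElement(), and parses an in-memory sample feed instead of a network stream:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StringBuilderRssHandler extends DefaultHandler {

    // Hypothetical stand-in for the asker's RssItem class.
    public static class RssItem {
        private String title;
        private String link;
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public String getLink() { return link; }
        public void setLink(String link) { this.link = link; }
    }

    private final List<RssItem> rssItems = new ArrayList<>();
    private RssItem currentItem;
    // Collects every chunk that characters() delivers for the current element.
    private final StringBuilder temp = new StringBuilder();

    public List<RssItem> getItems() {
        return rssItems;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("item".equals(qName)) {
            currentItem = new RssItem();
        }
        temp.setLength(0); // every element starts with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        temp.append(ch, start, length); // append, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            rssItems.add(currentItem);
            currentItem = null;
        } else if (currentItem != null) {
            // The channel's own <title> and <link> end while currentItem is still null.
            if ("title".equals(qName)) {
                currentItem.setTitle(temp.toString().trim());
            } else if ("link".equals(qName)) {
                currentItem.setLink(temp.toString().trim());
            }
        }
    }

    public static List<RssItem> parse(String xml) {
        try {
            StringBuilderRssHandler handler = new StringBuilderRssHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getItems();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title><link>http://example.com/</link>"
                + "<item><title>Tom &amp; Jerry</title><link>http://example.com/1</link></item>"
                + "</channel></rss>";
        for (RssItem item : parse(xml)) {
            System.out.println(item.getTitle() + " -> " + item.getLink());
        }
    }
}
```

Running main prints Tom & Jerry -> http://example.com/1. The &amp; entity is in the sample because parsers can split character data around entity references, which is exactly the kind of split the original one-shot setters lose.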

Related

RSS Reader using Sax Parser losing characters from title

I'm trying to use a SAX parser in order to return the contents of an RSS feed from a URL - http://pitchfork.com/rss/news/, but often characters are lost in displaying the title, showing partial text or just a closing tag ">"
How can I modify my handler class to prevent this? I think I should probably use StringBuilder or StringBuffer, but I'm not sure how to implement it.
ParseHandler.java
public class RssParseHandler extends DefaultHandler {
//Parsed items
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
private boolean parsing_id;
private boolean parsingDescription;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
}
public List<RssItem> getItems() {
return rssItems;
}
//Creates empty RssItem object during the process of an item start tag
//Indicators are set to true when particular tag is being processed
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if ("item".equals(qName)) {
currentItem = new RssItem();
} else if ("title".equals(qName)) {
parsingTitle = true;
} else if ("link".equals(qName)) {
parsingLink = true;
} else if ("_id".equals(qName)) {
parsing_id = true;
} else if ("description".equals(qName)) {
parsingDescription = true;
}
}
//Current RssItem is added to the list following process of end tag
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("item".equals(qName)) {
rssItems.add(currentItem);
currentItem = null;
} else if ("title".equals(qName)) {
parsingTitle = false;
} else if ("link".equals(qName)) {
parsingLink = false;
} else if ("_id".equals(qName)) {
parsing_id = false;
} else if ("description".equals(qName)) {
parsingDescription = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null)
currentItem.setTitle(new String(ch, start, length));
} else if (parsingLink) {
if (currentItem != null) {
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
} else if (parsing_id) {
if (currentItem != null) {
currentItem.set_id(new String(ch, start, length));
parsing_id = false;
}
} else if (parsingDescription) {
if (currentItem != null) {
currentItem.setDescription(new String(ch, start, length));
parsingDescription = false;
}
}
}
} // RssParseHandler
Use a StringBuilder to accumulate the element's text rather than creating a new String instance on each call because, as the documentation says:
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
And @CommonsWare says exactly this in his post here.
Build your tag's text with a StringBuilder as it arrives, since the parser delivers it in chunks rather than as the entire string at once (this explains the incomplete tags!). You may or may not need the isBuilding flag, but I don't know your entire implementation, so I added it just in case.
StringBuilder mSb;
boolean isBuilding;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// Start a fresh buffer for every element
mSb = new StringBuilder();
isBuilding = true;
if (qName.equals("title")) {
parsingTitle = true;
}
...
...
}
@Override
public void characters(char[] ch, int start, int length) {
// Append this chunk; one element's text may arrive over several calls
if (mSb != null && isBuilding) {
mSb.append(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
// The buffered text is complete only once the end tag is reached
if (parsingTitle) {
if (currentItem != null) { // the channel's own <title> comes before any <item>
currentItem.setTitle(mSb.toString().trim());
}
parsingTitle = false;
isBuilding = false;
}
}
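Since this question's handler tracks four fields (title, link, _id, description), one alternative to a boolean flag per field is to buffer the text of every element and store it by element name when the element closes. A minimal sketch of that design choice, assuming a Map<String, String> per item is acceptable in place of the asker's RssItem setters (this is not what the answer above does):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ItemFieldHandler extends DefaultHandler {

    // Element names captured inside each <item>; they match the fields in the question.
    private static final Set<String> FIELDS =
            new HashSet<>(Arrays.asList("title", "link", "_id", "description"));

    private final List<Map<String, String>> items = new ArrayList<>();
    private Map<String, String> currentItem;
    private final StringBuilder text = new StringBuilder();

    public List<Map<String, String>> getItems() {
        return items;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("item".equals(qName)) {
            currentItem = new HashMap<>();
        }
        text.setLength(0); // fresh buffer for every element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // keep every chunk
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            items.add(currentItem);
            currentItem = null;
        } else if (currentItem != null && FIELDS.contains(qName)) {
            // The element has closed, so the buffered text is complete.
            currentItem.put(qName, text.toString().trim());
        }
    }

    public static List<Map<String, String>> parse(String xml) {
        try {
            ItemFieldHandler handler = new ItemFieldHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getItems();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title><item><title>Album news</title>"
                + "<_id>42</_id><description>New &amp; upcoming</description></item></channel></rss>";
        System.out.println(parse(xml));
    }
}
```

Adding a field then means adding one name to FIELDS rather than a new flag and three new branches; the trade-off is losing the typed RssItem setters.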

JAVA/SAX - Loss of characters using XML Parser

I'm using a SAX parser to parse the XML of RSS feeds in an Android app, and sometimes the pubDate of an item is only partially parsed (characters are missing).
Example:
Actual pubDate: Thu, 02 Apr 2015 12:23:41 +0000
Parsed pubDate: Thu,
Here is the code that I'm using in the parser handler:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if ("item".equalsIgnoreCase(localName)) {
currentItem = new RssItem(url);
} else if ("title".equalsIgnoreCase(localName)) {
parsingTitle = true;
} else if ("link".equalsIgnoreCase(localName)) {
parsingLink = true;
} else if ("pubDate".equalsIgnoreCase(localName)) {
parsingDate = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("item".equalsIgnoreCase(localName)) {
rssItems.add(currentItem);
currentItem = null;
} else if ("title".equalsIgnoreCase(localName)) {
parsingTitle = false;
} else if ("link".equalsIgnoreCase(localName)) {
parsingLink = false;
} else if ("pubDate".equalsIgnoreCase(localName)) {
parsingDate = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null) {
currentItem.setTitle(new String(ch, start, length));
parsingTitle = false;
}
} else if (parsingLink) {
if (currentItem != null) {
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
} else if (parsingDate) {
if (currentItem != null) {
currentItem.setDate(new String(ch, start, length));
parsingDate = false;
}
}
}
The loss of characters is pretty random, it happens in different XML items every time I run the app.
You are assuming that there is exactly one characters() call per element. That is not a safe assumption. Build up your string over 1+ calls to characters(), then apply it in endElement().
Or, better yet, use any one of a number of existing RSS parser libraries.
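A minimal sketch of "build up over 1+ calls, then apply in endElement()" for the pubDate case. It drives the handler by hand with the chunks "Thu," and the rest of the date, mimicking the truncated result from the question, and parses the complete string with java.time's RFC 1123 formatter. The date parsing is an addition of this sketch, not part of the answer; java.time is available on Android from API level 26.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class PubDateHandler extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private boolean parsingDate;
    private ZonedDateTime pubDate;

    public ZonedDateTime getPubDate() {
        return pubDate;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("pubDate".equalsIgnoreCase(qName)) {
            parsingDate = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep appending until endElement(); do NOT reset parsingDate here.
        if (parsingDate) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("pubDate".equalsIgnoreCase(qName)) {
            parsingDate = false;
            // Only now is the text complete, so it is safe to parse it.
            pubDate = ZonedDateTime.parse(buffer.toString().trim(),
                    DateTimeFormatter.RFC_1123_DATE_TIME);
        }
    }

    // Simulates a parser that delivers the pubDate text in several chunks.
    public static ZonedDateTime parseInChunks(String... chunks) {
        PubDateHandler handler = new PubDateHandler();
        handler.startElement("", "pubDate", "pubDate", new AttributesImpl());
        for (String chunk : chunks) {
            handler.characters(chunk.toCharArray(), 0, chunk.length());
        }
        handler.endElement("", "pubDate", "pubDate");
        return handler.getPubDate();
    }

    public static void main(String[] args) {
        System.out.println(parseInChunks("Thu,", " 02 Apr 2015 12:23:41 +0000"));
    }
}
```

With the question's handler, the first chunk "Thu," would be stored and the flag cleared, which matches the reported "Thu," result.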

Java Sax Parser only returning one line of a tag

I am trying to parse the description tag in the XML, but it only outputs one line:
description: <img src=http://www.ovations365.com/sites/ovations365.com/images/event/441705771/sparkswebsite_medium.jpg alt="SPARKS: Understanding Energy">
That is only a small part of the text in the CDATA, and I'm trying to output the description for multiple items. Why can't I get the whole CDATA?
The XML is located at: http://feeds.feedburner.com/Events-Ovations365
package com.example.ovations_proj;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import com.example.ovations_proj.RssItem;
public class RssParseHandler extends DefaultHandler {
private List<RssItem> rssItems;
// Used to reference item while parsing
private RssItem currentItem;
// Parsing title indicator
private boolean parsingTitle;
// Parsing link indicator
private boolean parsingLink;
private boolean parsingDes;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
}
public List<RssItem> getItems() {
return rssItems;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if ("item".equals(qName)) { //item
currentItem = new RssItem();
} else if ("title".equals(qName)) { //title
parsingTitle = true;
} else if ("link".equals(qName)) { //link
parsingLink = true;
}else if ("description".equals(qName) ) { //description
parsingDes = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("End Element :" + qName);
if ("item".equals(qName)) {
rssItems.add(currentItem);//item
currentItem = null;
} else if ("title".equals(qName)) {//title
parsingTitle = false;
} else if ("link".equals(qName)) {//link
parsingLink = false;
} else if ("description".equals(qName) ) { //description
parsingDes = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null){
currentItem.setTitle(new String(ch, start, length));
}
} else if (parsingLink) {
if (currentItem != null) {
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
} else if (parsingDes) {
if (currentItem != null) {
currentItem.setDes(new String(ch, start, length));
System.out.println("description: " + currentItem.getDes());
parsingDes = false;
}
}
}
}
It seems that the character data in the <![CDATA[...]]> sections is being sent in multiple chunks, i.e. in multiple calls to the characters method.
The ContentHandler documentation for the characters method mentions that SAX parsers are free to do this:
SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks[....]
You'll therefore need to adjust your characters method to handle being called multiple times for the same chunk of contiguous character data.
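A minimal sketch of that adjustment, parsing a made-up item whose description is a multi-line CDATA section (the sample XML is illustrative, not the real Ovations365 feed). Every chunk is appended to a StringBuilder, and the description is stored only in endElement(); the channel-level <description> is skipped because it lies outside any <item>:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DescriptionHandler extends DefaultHandler {

    private final List<String> descriptions = new ArrayList<>();
    private final StringBuilder des = new StringBuilder();
    private boolean inItem;
    private boolean parsingDes;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("item".equals(qName)) {
            inItem = true;
        } else if ("description".equals(qName)) {
            parsingDes = true;
            des.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // CDATA content may arrive in several calls; collect all of them.
        if (parsingDes) {
            des.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            inItem = false;
        } else if ("description".equals(qName)) {
            parsingDes = false;
            if (inItem) {
                descriptions.add(des.toString()); // complete text of this description
            }
        }
    }

    public static List<String> parse(String xml) {
        try {
            DescriptionHandler handler = new DescriptionHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.descriptions;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><description>Channel</description><item><description><![CDATA["
                + "<img src=\"a.jpg\" alt=\"SPARKS\">\nFirst line\nSecond line]]></description>"
                + "</item></channel></rss>";
        System.out.println(parse(xml).get(0));
    }
}
```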

How to parse XML file, with a binary data element using SAX parser?

I receive XML files that need parsing. I code in Java regularly, so Java's SAX API was my natural first choice. The XML files contain a mix of text elements and one binary element (an .xls file).
My parser handler is:
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException{
if(qName.equalsIgnoreCase("To")){
toFlag = true;
}
if(qName.equalsIgnoreCase("Subject")){
subjectFlag = true;
}
if(qName.equalsIgnoreCase("OutDocumentId")){
outdocmentIdFlag = true;
}
if(qName.equalsIgnoreCase("Filename")){
filenameFlag = true;
}
if(qName.equalsIgnoreCase("EmailType")){
emailTypeFlag = true;
}
if(qName.equalsIgnoreCase("Context")){
contextTypeFlag = true;
}
if(qName.equalsIgnoreCase("Blob")){
blobTypeFlag = true;
}
}
And the element data is parsed here:
public void characters(char ch[], int start, int length) throws SAXException{
String text = null;
if (toFlag) {
text = new String(ch, start, length);
getRequest().setRecipientEmail(text);
toFlag = false;
}
if (subjectFlag) {
text = new String(ch, start, length);
getRequest().setSubject(text);
subjectFlag = false;
}
if (outdocmentIdFlag) {
text = new String(ch, start, length);
getRequest().setOutDocId(text);
outdocmentIdFlag = false;
}
if (filenameFlag) {
text = new String(ch, start, length);
getRequest().setFilename(text);
filenameFlag = false;
}
if(emailTypeFlag) {
text = new String(ch, start, length);
getRequest().setEmailType(Integer.parseInt(text));
emailTypeFlag = false;
}
if(contextTypeFlag) {
text = new String(ch, start, length);
getRequest().setContext(text);
contextTypeFlag = false;
}
if(blobTypeFlag) {
text = new String(ch, start, length);
try {
getRequest().setBlob(Hibernate.createBlob(text.getBytes("UTF-16")));
} catch (UnsupportedEncodingException e) {
System.out.println("Error creating blob");
e.printStackTrace();
}
blobTypeFlag = false;
}
}
}
The problem is with the blob element: it's being read in as a char[] (which I believe is incorrect), because that's what the parent class lets me override during event processing.
Does anybody know how to use the SAX parser when one element is binary rather than text?
Any help is greatly appreciated.
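XML cannot carry raw binary bytes, so binary content is normally Base64-encoded text. Collect the char data (it may arrive over several characters() calls) and send it to a Base64 decoder.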
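A minimal sketch, assuming the Blob element holds Base64-encoded text (worth confirming with whoever produces the files). It accumulates the chunks, then decodes with java.util.Base64 (Java 8+); the MIME decoder is used because it skips line breaks and other characters outside the Base64 alphabet. The resulting byte[] is what would go into the blob, instead of text.getBytes("UTF-16"), which only re-encodes the Base64 characters themselves. How the bytes are then stored (the question uses Hibernate) is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BlobHandler extends DefaultHandler {

    private final StringBuilder blobText = new StringBuilder();
    private boolean blobTypeFlag;
    private byte[] blob;

    public byte[] getBlob() {
        return blob;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("Blob")) {
            blobTypeFlag = true;
            blobText.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Base64 text for a whole file will very likely span many calls.
        if (blobTypeFlag) {
            blobText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("Blob")) {
            blobTypeFlag = false;
            // The MIME decoder ignores line breaks and surrounding whitespace.
            blob = Base64.getMimeDecoder().decode(blobText.toString());
        }
    }

    public static byte[] parse(String xml) {
        try {
            BlobHandler handler = new BlobHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getBlob();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse request", e);
        }
    }

    public static void main(String[] args) {
        // 0M8R4A== encodes D0 CF 11 E0, the first bytes of an OLE2 (.xls) file.
        byte[] bytes = parse("<Request><Blob>\n0M8R4A==\n</Blob></Request>");
        System.out.println(bytes.length);
    }
}
```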

Read multiple lines from xml

I am trying to fetch data from an XML file in Java using a SAX parser. I can read small amounts of data, but when the data becomes large and spans multiple lines, I get only two lines of data instead of all of them. I am using the following code:
InputStreamReader isr = new InputStreamReader(is);
InputSource source = new InputSource(isr);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
XMLReader xr = parser.getXMLReader();
GeofenceParametersXMLHandler handler = new GeofenceParametersXMLHandler();
xr.setContentHandler(handler);
xr.parse(source);
And my GeofenceParametersXMLHandler is:
private boolean inTimeZone = false;
private boolean inCoordinate = false;
private boolean outerBoundaryIs = false;
private boolean innerBoundaryIs = false;
private String timeZone;
private List<String> innerCoordinates = new ArrayList<String>();
private String outerCoordinates;
public String getTimeZone() {
return timeZone;
}
public List<String> getInnerCoordinates() {
return innerCoordinates;
}
public String getOuterCoordinates() {
return outerCoordinates;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
if (this.inTimeZone) {
this.timeZone = new String(ch, start, length);
this.inTimeZone = false;
}
if (this.inCoordinate && this.innerBoundaryIs) {
this.innerCoordinates.add(new String(ch, start, length));
this.inCoordinate = false;
this.innerBoundaryIs = false;
}
if (this.inCoordinate && this.outerBoundaryIs) {
this.outerCoordinates = new String(ch, start, length);
this.inCoordinate = false;
this.outerBoundaryIs = false;
}
}
@Override
public void endElement(String uri, String localName, String name) throws SAXException {
super.endElement(uri, localName, name);
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
}
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
super.startElement(uri, localName, name, attributes);
if (localName.equalsIgnoreCase("timezone")) {
this.inTimeZone = true;
}
if (localName.equalsIgnoreCase("outerBoundaryIs")) {
this.outerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("innerBoundaryIs")) {
this.innerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("coordinates")) {
this.inCoordinate = true;
}
}
And the XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2">
<Placemark>
<name>gx:altitudeMode Example</name>
<timezone>EASTERN</timezone>
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-77.05788457660967,38.87253259892824,100
-77.05465973756702,38.87291016281703,100
-77.05315536854791,38.87053267794386,100
-77.05552622493516,38.868757801256,100
-77.05844056290393,38.86996206506943,100
-77.05788457660967,38.87253259892824,100
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
I always get only two lines of data for coordinates, but when they are on a single line I get the complete data. How can I fetch the complete data when it spans multiple lines?
Thanks in advance.
The characters() method won't necessarily give you all the text data in one go (this is a very common misconception, btw).
The proper approach is to concatenate all the data returned by successive calls to characters() (using a StringBuilder or similar). Once your endElement() method is called, you can then treat that text buffer as complete and process it as such.
From the doc:
The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks
For a small XML document, one call to characters() is often enough. However, as your XML document grows, buffering and similar effects will start producing multiple calls, so each call, treated on its own, appears incomplete.
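A minimal sketch of that approach for the KML coordinates: it buffers the text between <coordinates> and </coordinates> across all characters() calls, then splits the complete text on whitespace in endElement(). It keeps the question's localName checks, which rely on setNamespaceAware(true) as in the question's setup code, and it resets the boundary flag in endElement() rather than in characters(). Only the outer boundary is handled, to keep it short:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CoordinatesHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> outerCoordinates = new ArrayList<>();
    private boolean inCoordinate;
    private boolean outerBoundaryIs;

    public List<String> getOuterCoordinates() {
        return outerCoordinates;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes attributes) {
        if (localName.equalsIgnoreCase("outerBoundaryIs")) {
            outerBoundaryIs = true;
        } else if (localName.equalsIgnoreCase("coordinates")) {
            inCoordinate = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCoordinate) {
            text.append(ch, start, length); // may be called once per chunk or line
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        if (localName.equalsIgnoreCase("coordinates")) {
            inCoordinate = false;
            if (outerBoundaryIs) {
                // Tuples ("lon,lat,alt") are separated by whitespace, here newlines.
                String all = text.toString().trim();
                if (!all.isEmpty()) {
                    outerCoordinates.addAll(Arrays.asList(all.split("\\s+")));
                }
            }
        } else if (localName.equalsIgnoreCase("outerBoundaryIs")) {
            outerBoundaryIs = false;
        }
    }

    public static List<String> parse(String kml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // as in the question; fills in localName
            CoordinatesHandler handler = new CoordinatesHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(kml)), handler);
            return handler.getOuterCoordinates();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse KML", e);
        }
    }

    public static void main(String[] args) {
        String kml = "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Placemark><Polygon>"
                + "<outerBoundaryIs><LinearRing><coordinates>\n"
                + "-77.05788457660967,38.87253259892824,100\n"
                + "-77.05465973756702,38.87291016281703,100\n"
                + "</coordinates></LinearRing></outerBoundaryIs></Polygon></Placemark></kml>";
        parse(kml).forEach(System.out::println);
    }
}
```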
