My Parser Skips Elements - java

I'm trying to make an RSS reader that uses an XML from the web. For some reason it only reads the last element.
This is pretty much the XML file:
<rss version="2.0">
<channel>
<item>
<mainTitle>...</mainTitle>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
</item>
<item>
<mainTitle>...</mainTitle>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
</item>
</channel>
</rss>
This is the parser:
public class RssHandler extends DefaultHandler {
// Feed and Article objects to use for temporary storage
private Article currentArticle = new Article();
private List<Article> articleList = new ArrayList<Article>();
// Number of articles added so far
private int articlesAdded = 0;
// Number of articles to download
private static final int ARTICLES_LIMIT = 20;
// Current characters being accumulated
StringBuffer chars = new StringBuffer();
// Default capacity of an empty StringBuffer
int cap = new StringBuffer().capacity();
// Basic Booleans
private boolean wantedItem = false;
private boolean wantedHeadline = false;
private boolean wantedTitle = false;
public List<Article> getArticleList() {
return articleList;
}
public Article getParsedData() {
return this.currentArticle;
}
public RssHandler() {
this.currentArticle = new Article();
}
public void startElement(String uri, String localName, String qName,
Attributes atts) {
chars = new StringBuffer();
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (localName.equalsIgnoreCase("title")) {
currentArticle.setTitle(chars.toString());
} else if (localName.equalsIgnoreCase("subtitle")) {
currentArticle.setDescription(chars.toString());
} else if (localName.equalsIgnoreCase("pubdate")) {
currentArticle.setPubDate(chars.toString());
} else if (localName.equalsIgnoreCase("guid")) {
currentArticle.setGuid(chars.toString());
} else if (localName.equalsIgnoreCase("author")) {
currentArticle.setAuthor(chars.toString());
} else if (localName.equalsIgnoreCase("link")) {
currentArticle.setEncodedContent(chars.toString());
} else if (localName.equalsIgnoreCase("item"))
// Check if looking for article, and if article is complete
if (localName.equalsIgnoreCase("item")) {
articleList.add(currentArticle);
currentArticle = new Article();
// Lets check if we've hit our limit on number of articles
articlesAdded++;
if (articlesAdded >= ARTICLES_LIMIT) {
throw new SAXException();
}
}
chars.setLength(0);
}
public void characters(char ch[], int start, int length) {
chars.append(ch, start, length);
}
}
Whenever I debug the application qName is never a direct child of Item.
It reads rss -> channel -> item -> title -> description ...
I'm clueless. Please help!

1) Make sure the length of chars is reset at the end of the endElement() method, i.e.:
public void endElement(String uri, String localName, String qName)
throws SAXException {
//...
//Reset 'chars' length at the end always.
chars.setLength(0);
}
2) Change your characters(...) method like below:
public void characters(char ch[], int start, int length) {
chars.append(ch, start, length);
}
[EDIT
3) Move the initialization of 'chars' from startElement() to the constructor, i.e.:
public RssHandler() {
this.currentArticle = new Article();
//Add here..
chars = new StringBuffer();
}
and,
public void startElement(String uri, String localName, String qName,
Attributes atts) {
//Remove below line..
//chars = new StringBuffer();
}
4) Finally, use qName instead of localName to check for matching tags (localName can be an empty string when the parser is not namespace-aware), i.e.:
if (qName.equalsIgnoreCase("title")) {
currentArticle.setTitle(chars.toString().trim());
} else if (qName.equalsIgnoreCase("subtitle")) {
currentArticle.setDescription(chars.toString().trim());
} //...
EDIT]
More info: Using SAXParser in Android
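To make points 1) to 4) concrete, here is a minimal sketch, not the asker's actual class. It assumes the feed structure shown in the question and stores one entry per <headline> instead of one per <item>, because each <item> holds several headlines and a single Article per item would keep only the last title. The Headline and HeadlineHandler names and the parse helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical value object standing in for the question's Article class.
class Headline {
    String title;
    String link;
}

// Hypothetical handler: one Headline per <headline> element.
class HeadlineHandler extends DefaultHandler {
    private final List<Headline> headlines = new ArrayList<>();
    private final StringBuilder chars = new StringBuilder();
    private Headline current;

    List<Headline> getHeadlines() {
        return headlines;
    }

    // localName is empty when the parser is not namespace-aware, so fall back to qName.
    private static String nameOf(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        chars.setLength(0); // discard whitespace collected between elements
        if (nameOf(localName, qName).equalsIgnoreCase("headline")) {
            current = new Headline();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        chars.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String name = nameOf(localName, qName);
        String text = chars.toString().trim();
        if (current != null) {
            if (name.equalsIgnoreCase("title")) {
                current.title = text;
            } else if (name.equalsIgnoreCase("link")) {
                current.link = text;
            } else if (name.equalsIgnoreCase("headline")) {
                headlines.add(current); // the headline is complete
                current = null;
            }
        }
        chars.setLength(0); // always reset at the end
    }

    // Convenience helper for trying the handler on an XML string.
    static List<Headline> parse(String xml) throws Exception {
        HeadlineHandler handler = new HeadlineHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getHeadlines();
    }
}
```

Calling HeadlineHandler.parse(...) on the XML from the question should then yield one Headline for every <headline> element rather than only the last one per <item>.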

Related

SAX Parser does not display multiple identical tags

Previously I was able to display the data of a single tag, but now, instead of several values, only one is displayed.
This my parser code:
public class Runner {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
MyHandler handler = new MyHandler();
xmlReader.setContentHandler(handler);
xmlReader.parse("src/countries.xml");
Countries branches = handler.getBranches();
try (FileWriter files = new FileWriter("src/diploma/SAX.txt")) {
files.write("Item " + "\n" + String.valueOf(branches.itemList) + "\n");
}
}
private static class MyHandler extends DefaultHandler{
static final String HISTORY_TAG = "history";
static final String ITEM_TAG = "item";
static final String NAME_ATTRIBUTE = "name";
public Countries branches;
public Item currentItem;
private String currencyElement;
Countries getBranches(){
return branches;
}
public void startDocument() throws SAXException {
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
currencyElement = qName;
switch (currencyElement) {
case HISTORY_TAG: {
branches.itemList = new ArrayList<>();
currentItem = new Item();
currentItem.setHistoryName(String.valueOf(attributes.getValue(NAME_ATTRIBUTE)));
} break;
default: {}
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String text = new String(ch, start, length);
if (text.contains("<") || currencyElement == null){
return;
}
switch (currencyElement) {
case ITEM_TAG: {
currentItem.setItem(text);
} break;
default: { }
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException{
switch (qName) {
case HISTORY_TAG: {
branches.itemList.add(currentItem);
currentItem = null;
} break;
default: {
}
}
currencyElement = null;
}
public void endDocument() throws SAXException {
System.out.println("SAX parsing is completed...");
}
}
}
Class Item:
public class Item {
private String historyName;
private String item;
public String getItem() {
return item;
}
public void setItem(String item) {
this.item = item;
}
public String getHistoryName() {
return historyName;
}
public void setHistoryName(String historyName) {
this.historyName = historyName;
}
@Override
public String toString() {
return
"historyName = " + historyName + ", " + "\n" + "item = " + item + ", ";
}
}
And class Countries
public class Countries {
public List<Item> itemList;
}
I have problems with this part
<history name="История">
<item>
История белорусских земель очень богата и самобытна.
</item>
<item>
Эту страну постоянно раздирали внутренние конфликты и противоречия, много раз она была втянута в войны.
</item>
<item>
В 1945 году Беларусь вступила в состав членов-основателей Организации Объединенных Наций.
</item>
</history>
Only the last "item" tag is displayed; the repeated tags show up as a single entry. I can't figure out where the error is, but I noticed that in "endElement" all the values are present, just as one element. Does anyone know what the problem is?
You create a new ArrayList and a single Item when you encounter the history tag, and every item tag then overwrites that Item's text through setItem().
That is why you only see one item displayed after parsing.
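One way to keep every <item> is to create the list once and add a new entry each time an <item> element ends. The following is a minimal sketch under that assumption; HistoryItemsHandler and its parse helper are hypothetical names, and plain strings stand in for the question's Item class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <item> inside <history>.
class HistoryItemsHandler extends DefaultHandler {
    private final List<String> items = new ArrayList<>(); // created once, never replaced
    private final StringBuilder text = new StringBuilder();
    private String historyName;
    private boolean inItem;

    List<String> getItems() {
        return items;
    }

    String getHistoryName() {
        return historyName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("history".equals(qName)) {
            historyName = attributes.getValue("name");
        } else if ("item".equals(qName)) {
            inItem = true;
            text.setLength(0); // start a fresh buffer for this item
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inItem) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            items.add(text.toString().trim()); // one list entry per <item>
            inItem = false;
        }
    }

    // Convenience helper for trying the handler on an XML string.
    static HistoryItemsHandler parse(String xml) throws Exception {
        HistoryItemsHandler handler = new HistoryItemsHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

The sketch relies on the default, non-namespace-aware parser, where qName carries the plain tag name as in the question's code.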

RSS Reader using Sax Parser losing characters from title

I'm trying to use a SAX parser to return the contents of an RSS feed from a URL (http://pitchfork.com/rss/news/), but characters are often lost when displaying the title, so it shows partial text or just a closing tag ">".
How can I modify my handler class to prevent this? I think I should probably use StringBuilder or StringBuffer, but I'm not sure how to implement it.
ParseHandler.java
public class RssParseHandler extends DefaultHandler {
//Parsed items
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
private boolean parsing_id;
private boolean parsingDescription;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
}
public List<RssItem> getItems() {
return rssItems;
}
//Creates empty RssItem object during the process of an item start tag
//Indicators are set to true when particular tag is being processed
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if ("item".equals(qName)) {
currentItem = new RssItem();
} else if ("title".equals(qName)) {
parsingTitle = true;
} else if ("link".equals(qName)) {
parsingLink = true;
} else if ("_id".equals(qName)) {
parsing_id = true;
} else if ("description".equals(qName)) {
parsingDescription = true;
}
}
//Current RssItem is added to the list following process of end tag
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("item".equals(qName)) {
rssItems.add(currentItem);
currentItem = null;
} else if ("title".equals(qName)) {
parsingTitle = false;
} else if ("link".equals(qName)) {
parsingLink = false;
} else if ("_id".equals(qName)) {
parsing_id = false;
} else if ("description".equals(qName)) {
parsingDescription = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null)
currentItem.setTitle(new String(ch, start, length));
} else if (parsingLink) {
if (currentItem != null) {
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
} else if (parsing_id) {
if (currentItem != null) {
currentItem.set_id(new String(ch, start, length));
parsing_id = false;
}
} else if (parsingDescription) {
if (currentItem != null) {
currentItem.setDescription(new String(ch, start, length));
parsingDescription = false;
}
}
}}//rssHandlerClass
Use a StringBuilder to accumulate the element text rather than creating a new String on every call to characters(). As the documentation says:
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
And @CommonWares says exactly this in his post here.
Build the text as it arrives using a StringBuilder, since it may come in several chunks rather than as one complete string (this explains the truncated titles). You may or may not need the isBuilding flag; I don't know your entire implementation, so I added it just in case.
StringBuilder mSb;
boolean isBuilding;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
mSb = new StringBuilder();
isBuilding = true;
if(qName.equals("title")){
parsingTitle = true;
}
...
...
}
@Override
public void characters (char ch[], int start, int length) {
if (mSb !=null && isBuilding) {
for (int i=start; i<start+length; i++) {
mSb.append(ch[i]);
}
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(parsingTitle){
currentItem.setTitle(mSb.toString().trim());
parsingTitle = false;
isBuilding = false;
}
}
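The effect of chunking can be reproduced without a real feed by calling characters() by hand. The sketch below is illustrative only: ChunkDemoHandler and ChunkDemo are hypothetical names, and the two chunks are made-up sample text standing in for a title that the parser happened to split.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records both strategies side by side.
class ChunkDemoHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    String lastChunk; // what "new String(ch, start, length)" keeps: only the latest chunk

    @Override
    public void characters(char[] ch, int start, int length) {
        lastChunk = new String(ch, start, length); // overwritten on every call
        buffer.append(ch, start, length);          // grows on every call
    }

    String accumulated() {
        return buffer.toString();
    }
}

class ChunkDemo {
    public static void main(String[] args) {
        ChunkDemoHandler handler = new ChunkDemoHandler();
        // Simulate a parser delivering one title in two separate chunks.
        char[] first = "Pitchfork ".toCharArray();
        char[] second = "News".toCharArray();
        handler.characters(first, 0, first.length);
        handler.characters(second, 0, second.length);
        System.out.println(handler.lastChunk);     // prints "News"
        System.out.println(handler.accumulated()); // prints "Pitchfork News"
    }
}
```

Keeping only the last chunk is what produces partial titles; the accumulated buffer holds the whole text once endElement() is reached.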

How does the SAX parsing method characters() work

I am trying to read data from an XML file and store it in a list using SAX parsing. My problem comes when I try to store the values of my data using the characters() method. I create an object in which, for each element, I store its value and some extra information for later use, but when I try to store the value it stores spaces instead. I tried printing inside the method, and while it seems to go through my whole XML file, it prints only a couple of the elements, and not even in the right order. Can someone explain to me what I am missing?
XML FILE:
<?xml version="1.0" encoding="UTF-8"?>
<CarModel>
<Audi model = "TT" year = "2006" starting_price = "33.000$">
<type>sport</type>
<horse_power>222hp</horse_power>
<drivetrain>quattro</drivetrain>
<transmission>6_Manual</transmission>
</Audi>
<Mercedes model = "W222_S400" year = "2013" starting_price = "63.000$">
<type>luxury</type>
<horse_power>302hp</horse_power>
<drivetrain>front_wheel_drive</drivetrain>
<transmission>7_Automatic</transmission>
</Mercedes>
</CarModel>
JAVA CODE :
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
lvl_cnt++;
System.out.println(lvl_cnt);
obj = new xml_obj();
obj.setLvl(lvl_cnt);
System.out.println("LVL "+obj.getLvl());
if (lvl_cnt == 0) {
obj.setValue(qName);
obj.setParent("root");
System.out.println(obj.getParent());
}
else {
xml_obj tmp = objListofLists.get(objListofLists.size()-1);
int lvl_before = tmp.getLvl();
System.out.println("AAA" + lvl_before);
if (lvl_cnt > lvl_before) {
obj.setParent(tmp.getValue());
}
else if (lvl_cnt < lvl_before) {
int j = 0;
while (objListofLists.get(j).getLvl() != lvl_cnt) j++;
tmp = objListofLists.get(j);
obj.setParent(tmp.getParent());
}
else {
obj.setParent(tmp.getParent());
}
System.out.println(obj.getParent());
}
obj.attributes = attributes;
objListofLists.add(obj);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
lvl_cnt--;
}
public void characters (char ch[], int start, int length) throws SAXException {
String help = new String(ch, start, length);
System.out.println(help);
objListofLists.get(objListofLists.size()-1).setValue(help);
}
}
The characters method only prints the text inside the elements (no attributes), but that also includes all the empty spaces, tabs, and newlines that might be present in the document and are descendants of the root element.
If you want to read text that is inside a particular element, you should set a flag when you enter that element (in startElement()), then unset it in endElement(), and inside characters() you test if you are currently in the element from which you wish to extract text.
private boolean inTypeTag = false;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equals("type")) {
inTypeTag = true;
}
// ...
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("type")) {
inTypeTag = false;
}
// ...
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (inTypeTag) {
// do something with the text ("sport") which was found in here
}
// ...
}

How can I parse long strings from an online xml file in Android application?

I want to parse a very long string from an xml file. You can see the xml file here.
If you open the file, you will see a "description" tag whose string I want to parse. When the "description" tag holds a short string, say 3 or 4 lines, my parser (a Java SAX parser) reads it without trouble, but when the string runs to hundreds of lines the parser fails to read it. My parsing code is below; please let me know where I am going wrong. I would be very thankful for any help.
Here is the parser GetterSetter class
public class MyGetterSetter
{
private ArrayList<String> description = new ArrayList<String>();
public ArrayList<String> getDescription()
{
return description;
}
public void setDescription(String description)
{
this.description.add(description);
}
}
Here is the parser Handler class
public class MyHandler extends DefaultHandler
{
String elementValue = null;
Boolean elementOn = false;
Boolean item = false;
public static MyGetterSetter data = null;
public static MyGetterSetter getXMLData()
{
return data;
}
public static void setXMLData(MyGetterSetter data)
{
MyHandler.data = data;
}
public void startDocument() throws SAXException
{
data = new MyGetterSetter();
}
public void endDocument() throws SAXException
{
}
public void startElement(String namespaceURI, String localName,String qName, Attributes atts) throws SAXException
{
elementOn = true;
if (localName.equalsIgnoreCase("item"))
item = true;
}
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
{
elementOn = false;
if(item)
{
if (localName.equalsIgnoreCase("description"))
{
data.setDescription(elementValue);
Log.d("--------DESCRIPTION------", elementValue +" ");
}
else if (localName.equalsIgnoreCase("item")) item = false;
}
}
public void characters(char ch[], int start, int length)
{
if (elementOn)
{
elementValue = new String(ch, start, length);
elementOn = false;
}
}
}
Use the org.w3c.dom package.
public static void main(String[] args) {
try {
URL url = new URL("http://www.aboutsports.co.uk/fixtures/");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(url.openStream());
NodeList list = doc.getElementsByTagName("item"); // get <item> nodes
for (int i = 0; i < list.getLength(); i++) {
Node item = list.item(i);
NodeList descriptions = ((Element)item).getElementsByTagName("description"); // get <description> nodes within an <item>
for (int j = 0; j < descriptions.getLength(); j++) {
Node description = descriptions.item(j);
System.out.println(description.getTextContent()); // print the text content
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
XPath in Java is also great for extracting bits from XML documents. Here's an example.
You would use an XPathExpression like //item/description. When you evaluate it on the XML input, it returns a NodeList like the one above, with all the <description> elements within an <item> element.
If you wanted to do it your way, with a DefaultHandler, you would need to set and unset flags so you can check whether you are in the body of a <description> element, and append every characters() chunk to a buffer instead of keeping only one chunk. The code above probably does something similar internally, hiding it from you. It is already available in Java, so why not use it?
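The XPath route can be sketched as follows. This is a minimal example using the standard javax.xml.xpath API, not code from the original answer; the DescriptionXPath class and its descriptions helper are hypothetical names, and it assumes the feed has no XML namespaces on <item> and <description>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: returns the text of every <description> that is a child of an <item>.
class DescriptionXPath {
    static List<String> descriptions(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//item/description" matches <description> children of any <item>, at any depth.
        NodeList nodes = (NodeList) xpath.evaluate(
                "//item/description",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent()); // full text, however long it is
        }
        return result;
    }
}
```

Because the whole document is loaded into memory before the expression is evaluated, long descriptions come back complete, at the cost of more memory than a streaming SAX handler.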

Read multiple lines from xml

I am trying to fetch data from an XML file in Java using a SAX parser. I can read small amounts of data, but when the data is large and spans multiple lines, I get only two lines of it instead of all the lines. This is the code I am using:
InputStreamReader isr = new InputStreamReader(is);
InputSource source = new InputSource(isr);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
XMLReader xr = parser.getXMLReader();
GeofenceParametersXMLHandler handler = new GeofenceParametersXMLHandler();
xr.setContentHandler(handler);
xr.parse(source);
And my GeofenceParametersXMLHandler is-
private boolean inTimeZone = false;
private boolean inCoordinate = false;
private boolean outerBoundaryIs = false;
private boolean innerBoundaryIs = false;
private String timeZone;
private List<String> innerCoordinates = new ArrayList<String>();
private String outerCoordinates;
public String getTimeZone() {
return timeZone;
}
public List<String> getInnerCoordinates() {
return innerCoordinates;
}
public String getOuterCoordinates() {
return outerCoordinates;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
if (this.inTimeZone) {
this.timeZone = new String(ch, start, length);
this.inTimeZone = false;
}
if (this.inCoordinate && this.innerBoundaryIs) {
this.innerCoordinates.add(new String(ch, start, length));
this.inCoordinate = false;
this.innerBoundaryIs = false;
}
if (this.inCoordinate && this.outerBoundaryIs) {
this.outerCoordinates = new String(ch, start, length);
this.inCoordinate = false;
this.outerBoundaryIs = false;
}
}
@Override
public void endElement(String uri, String localName, String name) throws SAXException {
super.endElement(uri, localName, name);
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
}
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
super.startElement(uri, localName, name, attributes);
if (localName.equalsIgnoreCase("timezone")) {
this.inTimeZone = true;
}
if (localName.equalsIgnoreCase("outerBoundaryIs")) {
this.outerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("innerBoundaryIs")) {
this.innerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("coordinates")) {
this.inCoordinate = true;
}
}
And the xml file is-
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2">
<Placemark>
<name>gx:altitudeMode Example</name>
<timezone>EASTERN</timezone>
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-77.05788457660967,38.87253259892824,100
-77.05465973756702,38.87291016281703,100
-77.05315536854791,38.87053267794386,100
-77.05552622493516,38.868757801256,100
-77.05844056290393,38.86996206506943,100
-77.05788457660967,38.87253259892824,100
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
I always get only two lines of data for the coordinates, but when they are on a single line I get the complete data. How can I fetch the complete data when it spans multiple lines?
Thanks in Advance.
The characters() method won't necessarily give you all the text data in one go (this is a very common misconception, btw).
The proper approach is to concatenate all the data returned by successive calls to characters() (using a StringBuilder or similar). Once your endElement() method is called, you can then treat that text buffer as complete and process it as such.
From the doc:
The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks
Often you see that for a small XML doc one call to characters() will suffice. However as your XML doc increases in size, you'll find that due to buffering etc. you'll start getting multiple calls. Consequently each call treated on its own appears to be incomplete.
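Applied to the coordinates in the question, the accumulate-then-process approach might look like the sketch below. It is an assumption-laden illustration, not the asker's handler: CoordinatesHandler and its parse helper are hypothetical names, and it collects every <coordinates> element without distinguishing inner from outer boundaries.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers all text of <coordinates> and splits it when the element ends.
class CoordinatesHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> coordinates = new ArrayList<>();
    private boolean inCoordinates;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("coordinates".equalsIgnoreCase(localName)) {
            inCoordinates = true;
            text.setLength(0); // fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCoordinates) {
            text.append(ch, start, length); // keep every chunk, do not reset the flag here
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("coordinates".equalsIgnoreCase(localName)) {
            // Only now is the text complete; split it into one entry per coordinate tuple.
            for (String tuple : text.toString().trim().split("\\s+")) {
                if (!tuple.isEmpty()) {
                    coordinates.add(tuple);
                }
            }
            inCoordinates = false;
        }
    }

    // Convenience helper for trying the handler on an XML string.
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // as in the question, so localName is populated
        CoordinatesHandler handler = new CoordinatesHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.coordinates;
    }
}
```

The important difference from the question's handler is that the flag is cleared in endElement() rather than after the first characters() call, so later chunks are not dropped.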
