Read multiple lines from xml - java

I am trying to fetch data from an XML file in Java using a SAX parser. Small values come through fine, but when an element's text is large and spans multiple lines, I get only two lines of it instead of all of them. This is the code I am using:
InputStreamReader isr = new InputStreamReader(is);
InputSource source = new InputSource(isr);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
XMLReader xr = parser.getXMLReader();
GeofenceParametersXMLHandler handler = new GeofenceParametersXMLHandler();
xr.setContentHandler(handler);
xr.parse(source);
And my GeofenceParametersXMLHandler is-
private boolean inTimeZone = false;
private boolean inCoordinate = false;
private boolean outerBoundaryIs = false;
private boolean innerBoundaryIs = false;
private String timeZone;
private List<String> innerCoordinates = new ArrayList<String>();
private String outerCoordinates;
public String getTimeZone() {
return timeZone;
}
public List<String> getInnerCoordinates() {
return innerCoordinates;
}
public String getOuterCoordinates() {
return outerCoordinates;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
if (this.inTimeZone) {
this.timeZone = new String(ch, start, length);
this.inTimeZone = false;
}
if (this.inCoordinate && this.innerBoundaryIs) {
this.innerCoordinates.add(new String(ch, start, length));
this.inCoordinate = false;
this.innerBoundaryIs = false;
}
if (this.inCoordinate && this.outerBoundaryIs) {
this.outerCoordinates = new String(ch, start, length);
this.inCoordinate = false;
this.outerBoundaryIs = false;
}
}
@Override
public void endElement(String uri, String localName, String name) throws SAXException {
super.endElement(uri, localName, name);
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
}
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
super.startElement(uri, localName, name, attributes);
if (localName.equalsIgnoreCase("timezone")) {
this.inTimeZone = true;
}
if (localName.equalsIgnoreCase("outerBoundaryIs")) {
this.outerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("innerBoundaryIs")) {
this.innerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("coordinates")) {
this.inCoordinate = true;
}
}
And the xml file is-
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2">
<Placemark>
<name>gx:altitudeMode Example</name>
<timezone>EASTERN</timezone>
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-77.05788457660967,38.87253259892824,100
-77.05465973756702,38.87291016281703,100
-77.05315536854791,38.87053267794386,100
-77.05552622493516,38.868757801256,100
-77.05844056290393,38.86996206506943,100
-77.05788457660967,38.87253259892824,100
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
I always get only two lines of data for the coordinates, but when they are on a single line I get the complete data. How can I fetch the complete data when it spans multiple lines?
Thanks in advance.

The characters() method won't necessarily give you all the text data in one go (this is a very common misconception, btw).
The proper approach is to concatenate all the data returned by successive calls to characters() (using a StringBuilder or similar). Once your endElement() method is called, you can then treat that text buffer as complete and process it as such.
From the doc:
The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks
Often you see that for a small XML doc one call to characters() will suffice. However as your XML doc increases in size, you'll find that due to buffering etc. you'll start getting multiple calls. Consequently each call treated on its own appears to be incomplete.
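The accumulate-then-process approach described above can be sketched as follows. This is a minimal illustration rather than the asker's actual handler: the class name CoordinatesHandler and the helper parse() are hypothetical, and only the coordinates element is handled.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers every characters() chunk of <coordinates>
// and only treats the text as complete once endElement() fires.
class CoordinatesHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inCoordinates = false;
    private String coordinates;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("coordinates".equals(localName)) {
            inCoordinates = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append, never assign.
        if (inCoordinates) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("coordinates".equals(localName)) {
            coordinates = text.toString().trim(); // the text is complete here
            inCoordinates = false;
        }
    }

    public String getCoordinates() {
        return coordinates;
    }

    // Convenience wrapper for the example; checked exceptions are wrapped for brevity.
    static String parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so localName is populated
            CoordinatesHandler handler = new CoordinatesHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getCoordinates();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The flag is cleared in endElement() rather than in characters(), so every chunk between the start and end tags ends up in the buffer regardless of how the parser splits the text.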

Related

Android: get HTML text from XML

I have implemented XML reading in my app and it works fine, but I want to format the text. I've tried this in the XML:
<monumento>
<horarios><b>L-V:</b> 10 a 20<br/>S-D: 11 a 15</horarios>
<tarifas>4000</tarifas>
</monumento>
But when I put HTML tags in the XML, the text is not displayed in my app at all.
I'll have many XML files, so I won't always know in advance where <b>, <br/>, etc. will appear.
What can I do?
Main
StringBuilder builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getHorarios());
}
horario2.setText(builder.toString());
builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getTarifas());
}
tarifa2.setText(builder.toString());
XMLReader
public void get() {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(URL + monumento + ".xml").openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) {
}
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
currTag = true;
currTagVal = "";
if (localName.equals("monumento")) {
post = new HorariosTarifasObj();
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
currTag = false;
if(localName.equalsIgnoreCase("horarios")) {
post.setHorarios(currTagVal);
} else if(localName.equalsIgnoreCase("tarifas")) {
post.setTarifas(currTagVal);
} else if (localName.equalsIgnoreCase("monumento")) {
posts.add(post);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currTag) {
currTagVal = currTagVal + new String(ch, start, length);
currTag = false;
}
}
Try CDATA:
<monumento>
<horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios>
<tarifas>4000</tarifas>
</monumento>
In XML, the < and > characters are reserved for markup, so to carry them as text you need to escape them with entity references.
You can use &gt; for > and &lt; for <.
(edit) Eomm's answer is right: CDATA does this as well, and more simply.
Also, to render HTML markup in a TextView, you need to use the Html.fromHtml() method.
For instance:
tarifa2.setText(Html.fromHtml(builder.toString()));
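To see why CDATA or entity escaping helps, here is a small plain-Java sketch (no Android classes, so Html.fromHtml() is not shown). The class HorariosTextDemo and its textOf() helper are made up for illustration; the helper simply concatenates everything the parser reports through characters().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects all character data SAX reports for a document.
class HorariosTextDemo {
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length); // append every chunk
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Raw markup: <b> and <br/> are parsed as child elements, so the tags are lost.
        System.out.println(textOf("<horarios><b>L-V:</b> 10 a 20<br/>S-D: 11 a 15</horarios>"));
        // CDATA: the markup survives as plain text that can be handed to an HTML renderer.
        System.out.println(textOf("<horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios>"));
    }
}
```

With the raw markup, the handler from the question also resets currTagVal whenever the nested b or br element starts, which is probably why the text goes missing; with CDATA or escaped entities there are no nested elements to trigger that.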

My Parser Skips Elements

I'm trying to make an RSS reader that uses an XML from the web. For some reason it only reads the last element.
This is pretty much the XML file:
<rss version="2.0">
<channel>
<item>
<mainTitle>...</mainTitle>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
</item>
<item>
<mainTitle>...</mainTitle>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
<headline>
<title>...</title>
<description>...</description>
<subTitle>...</subTitle>
<link>...</link>
</headline>
</item>
</channel>
</rss>
This is the parser:
public class RssHandler extends DefaultHandler {
// Feed and Article objects to use for temporary storage
private Article currentArticle = new Article();
private List<Article> articleList = new ArrayList<Article>();
// Number of articles added so far
private int articlesAdded = 0;
// Number of articles to download
private static final int ARTICLES_LIMIT = 20;
// Current characters being accumulated
StringBuffer chars = new StringBuffer();
// Current characters being accumulated
int cap = new StringBuffer().capacity();
// Basic Booleans
private boolean wantedItem = false;
private boolean wantedHeadline = false;
private boolean wantedTitle = false;
public List<Article> getArticleList() {
return articleList;
}
public Article getParsedData() {
return this.currentArticle;
}
public RssHandler() {
this.currentArticle = new Article();
}
public void startElement(String uri, String localName, String qName,
Attributes atts) {
chars = new StringBuffer();
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (localName.equalsIgnoreCase("title")) {
currentArticle.setTitle(chars.toString());
} else if (localName.equalsIgnoreCase("subtitle")) {
currentArticle.setDescription(chars.toString());
} else if (localName.equalsIgnoreCase("pubdate")) {
currentArticle.setPubDate(chars.toString());
} else if (localName.equalsIgnoreCase("guid")) {
currentArticle.setGuid(chars.toString());
} else if (localName.equalsIgnoreCase("author")) {
currentArticle.setAuthor(chars.toString());
} else if (localName.equalsIgnoreCase("link")) {
currentArticle.setEncodedContent(chars.toString());
} else if (localName.equalsIgnoreCase("item"))
// Check if looking for article, and if article is complete
if (localName.equalsIgnoreCase("item")) {
articleList.add(currentArticle);
currentArticle = new Article();
// Lets check if we've hit our limit on number of articles
articlesAdded++;
if (articlesAdded >= ARTICLES_LIMIT) {
throw new SAXException();
}
}
chars.setLength(0);
}
public void characters(char ch[], int start, int length) {
chars.append(ch, start, length);
}
}
Whenever I debug the application qName is never a direct child of Item.
It reads rss -> channel -> item -> title -> description ...
I'm clueless. Please help!
1) At the end of endElement() method, you are not resetting the chars length i.e
public void endElement(String uri, String localName, String qName)
throws SAXException {
//...
//Reset 'chars' length at the end always.
chars.setLength(0);
}
2) Change your characters(...) method like below:
public void characters(char ch[], int start, int length) {
chars.append(ch, start, length);
}
[EDIT
3) Move initialization of 'chars' from 'startElement' to constructor. i.e:
public RssHandler() {
this.currentArticle = new Article();
//Add here..
chars = new StringBuffer();
}
and,
public void startElement(String uri, String localName, String qName,
Attributes atts) {
//Remove below line..
//chars = new StringBuffer();
}
4) Finally, use qName to check matching tags instead of localName i.e
if (qName.equalsIgnoreCase("title")) {
currentArticle.setTitle(chars.toString().trim());
} else if (qName.equalsIgnoreCase("subtitle")) {
currentArticle.setDescription(chars.toString().trim());
} //...
EDIT]
More info: Using SAXParser in Android

SAXParser giving unexpected random results

I am developing an RSS feed reader for Android, and I am using the SAX APIs to parse the XML files. The problem is that, during parsing, some of the text is truncated seemingly at random, and only in some instances of a given tag (other instances of the same tag come through intact). To be more clear, I have added a screenshot.
Here is my Handler class:
public class RssParseHandler extends DefaultHandler {
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
//StringBuilder temp;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
//temp = new StringBuilder();
}
public List<RssItem> getItems() {
return rssItems;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if ("item".equals(qName)) {
currentItem = new RssItem();
} else if ("title".equals(qName)) {
parsingTitle = true;
} else if ("link".equals(qName)) {
parsingLink = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("item".equals(qName)) {
rssItems.add(currentItem);
currentItem = null;
} else if ("title".equals(qName)) {
//currentItem.setTitle(new String(temp));
//temp = new StringBuilder();
parsingTitle = false;
} else if ("link".equals(qName)) {
//currentItem.setLink(new String(temp));
//temp = new StringBuilder();
parsingLink = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null)
{
//temp.append(ch, start, length);
currentItem.setTitle(new String(ch, start, length));
}
} else if (parsingLink) {
if (currentItem != null) {
//temp.append(ch, start, length);
currentElement.setLink(new String(ch, start, length));
parsingLink = false;
}
}
}
}
The methods setTitle(String str) and setLink(String str) are setter methods of class RSSItem.
I googled this problem and read somewhere to use StringBuilder instead. Hence I tried by using StringBuilder. ( I have commented the code when I used StringBuilder). But then I started receiving NullPointerException.
Any suggestions to get rid of this problem ?
From the doc
The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks; however, all of the
characters in any single event must come from the same external entity
so that the Locator provides useful information.
So you are probably getting only a partial chunk of the data. A possible solution could be:
if (currentItem != null) {
//temp.append(ch, start, length);
String tmpLink = currentElement.getLink();
tmpLink += new String(ch, start, length);
currentElement.setLink(tmpLink);
}
of course currentElement.getLink() should return an empty String and not a null reference.
Your problem is that you assume characters method will handle all characters inside the element, which is not true.
You should save and concatenate new characters with previous characters if any.
Using StringBuilder is good for your cause. You just need to handle the NPE you've got.
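One way the StringBuilder variant might look without the NullPointerException is sketched below. It is an assumption-laden illustration, not the asker's class: BufferedRssHandler, its nested Item type, and parse() are invented names. The buffer is created once as a final field, and values are only stored while an item is open, which covers the channel-level title and link that appear before the first item.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates text per element and guards against a null item.
class BufferedRssHandler extends DefaultHandler {
    static class Item {
        String title;
        String link;
    }

    private final List<Item> items = new ArrayList<>();
    private final StringBuilder text = new StringBuilder(); // created once, never null
    private Item current; // null while the parser is outside <item>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // new element, new buffer contents
        if ("item".equals(qName)) {
            current = new Item();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // e.g. the channel-level <title>, which has no item to attach to
        }
        if ("title".equals(qName)) {
            current.title = text.toString().trim();
        } else if ("link".equals(qName)) {
            current.link = text.toString().trim();
        } else if ("item".equals(qName)) {
            items.add(current);
            current = null;
        }
    }

    // Convenience wrapper for the example; checked exceptions are wrapped for brevity.
    static List<Item> parse(String xml) {
        try {
            BufferedRssHandler handler = new BufferedRssHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The null check in endElement() is the likely fix for the NPE in the commented-out version, where the setter was called for a title that is not inside any item.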

Quotes Issue when Reading from XML File in JAVA

I'm trying to read from XML and store the data in a text file.
My code works very well in reading and storing the data, EXCEPT when the paragraph from the XML file contains double quotes.
For example:
<Agent> "The famous spy" James Bond </Agent>
The output will ignore any data with quotes, and the result would be: James Bond
I'm using SAX, and here is part of my code that might have the issue:
public void characters(char[] ch, int start, int length) throws SAXException
{
tempVal = new String(ch, start, length);
}
I think I should replace the Quotes before storing the string in my tempVal.
Any ideas???
HERE is the complete code just in case:
public class Entailment {
private String Text;
private String Hypothesis;
private String ID;
private String Entailment;
}
//Event Handlers
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//reset
tempVal = "";
if(qName.equalsIgnoreCase("pair")) {
//create a new instance of Entailment
tempEntailment = new Entailment();
tempEntailment.setID(attributes.getValue("id"));
tempEntailment.setEntailment(attributes.getValue("entailment"));
}
}
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal = new String(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equalsIgnoreCase("pair")) {
//add it to the list
Entailments.add(tempEntailment);
}else if (qName.equalsIgnoreCase("t")) {
tempEntailment.setText(tempVal);
}else if (qName.equalsIgnoreCase("h")) {
tempEntailment.setHypothesis(tempVal);
}
}
public static void main(String[] args){
XMLtoTXT spe = new XMLtoTXT();
spe.runExample();
}
Your characters() method is being invoked multiple times because the parser is treating the input as several adjacent text nodes. The way your code is written (which you did not show), you are probably keeping only the last text node.
You need to accumulate the contents of adjacent text nodes yourself.
StringBuilder tempVal = null;
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//reset
tempVal = new StringBuilder();
....
}
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal.append(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
String textValue = tempVal.toString();
....
}
}
Interestingly enough I simulated your situation and my SAX parser works fine. I'm using jdk 1.6.0_20, and this is how I create my parser:
// Obtain a new instance of a SAXParserFactory.
SAXParserFactory factory = SAXParserFactory.newInstance();
// Specifies that the parser produced by this code will provide support for XML namespaces.
factory.setNamespaceAware(true);
// Specifies that the parser produced by this code will validate documents as they are parsed.
factory.setValidating(true);
// Creates a new instance of a SAXParser using the currently configured factory parameters.
saxParser = factory.newSAXParser();
My XML header is:
<?xml version="1.0" encoding="iso-8859-1"?>
What about you?

problem with using SAX XML Parser

I am using the SAX Parser for XML Parsing. The problem is for the following XML code:
<description>
Designer:Paul Smith Color:Plain Black Fabric/Composition:100% cotton Weave/Pattern:pinpoint Sleeve:Long-sleeved Fit:Classic Front style:Placket front Back style:Side pleat back Collar:Classic/straight collar Button:Pearlescent front button Pocket:rounded chest pocket Hem:Rounded hem
</description>
I get this:
Designer:Paul Smith
Color:Plain Black
The other parts are missing. The same thing happens for a few other lines. Can anyone kindly tell me whats the problem with my approach ?
My code is given below:
Parser code:
try {
/** Handling XML */
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
/** Send URL to parse XML Tags */
URL sourceUrl = new URL(
"http://50.19.125.224/Demo/VeryGoodSex_and_the_City_S6E6.xml");
/** Create handler to handle XML Tags ( extends DefaultHandler ) */
MyXMLHandler myXMLHandler = new MyXMLHandler();
xr.setContentHandler((ContentHandler) myXMLHandler);
xr.parse(new InputSource(sourceUrl.openStream()));
} catch (Exception e) {
System.out.println("XML Pasing Excpetion = " + e);
}
Object to hold XML parsed Info:
public class ParserObject {
String name=null;
String description=null;
String bitly=null; //single
String productLink=null;//single
String productPrice=null;//single
Vector<String> price=new Vector<String>();
}
Handler class:
public void endElement(String uri, String localName, String qName)
throws SAXException {
currentElement = false;
if (qName.equalsIgnoreCase("title"))
{
xmlDataObject[index].name=currentValue;
}
else if (qName.equalsIgnoreCase("artist"))
{
xmlDataObject[index].artist=currentValue;
}
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
currentElement = true;
if (qName.equalsIgnoreCase("allinfo"))
{
System.out.println("started");
}
else if (qName.equalsIgnoreCase("tags"))
{
insideTag=1;
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currentElement) {
currentValue = new String(ch, start, length);
currentElement = false;
}
}
You have to concatenate characters which the parser gives to you until it calls endElement.
Try removing currentElement = false; from the characters handler, and append to currentValue instead of overwriting it:
currentValue = currentValue + new String(ch, start, length);
Initialize currentValue with an empty string or handle null value in the expression above.
I think characters() delivers some, but not necessarily all, of an element's text in a single call.
Thus, you only get the first "chunk".
Try printing each character chunk on a separate line, as debugging (before the if).
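A rough sketch of that debugging idea: record every chunk handed to characters() so you can print them one per line. ChunkRecorder and chunksOf() are hypothetical names. How many chunks you see depends on the parser and the input, so the sketch makes no claim about the count, only that the chunks joined together give the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical debugging handler: keeps each characters() call as a separate entry.
class ChunkRecorder extends DefaultHandler {
    private final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        chunks.add(new String(ch, start, length)); // one entry per parser callback
    }

    // Convenience wrapper for the example; checked exceptions are wrapped for brevity.
    static List<String> chunksOf(String xml) {
        try {
            ChunkRecorder recorder = new ChunkRecorder();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
            return recorder.chunks;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<description>Designer:Paul Smith Color:Plain Black Fit:Classic</description>";
        for (String chunk : chunksOf(xml)) {
            System.out.println("[" + chunk + "]"); // brackets make chunk boundaries visible
        }
    }
}
```

If more than one bracketed line appears for a single element, that confirms the handler needs to concatenate the chunks before using the value.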
