Parse value containing special character "/" gives wrong output using SAX parser

I have the XML structure below:
<fs:AsReportedItem>
<fs:BookMark>/BODY[1]/DIV[3135]/DIV[0]/TABLE[0]/TBODY[0]/TR[32]/TD[5]/DIV[0]/FONT[0]/substr(1,2)
</fs:BookMark>
</fs:AsReportedItem>
I am parsing it with SAX and reading the text value in the endElement() method.
Here is my sample code:
private void parseDocument() {
// parse
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(FileName, this);
} catch (ParserConfigurationException e) {
System.out.println("ParserConfig error");
} catch (SAXException e) {
System.out.println("SAXException : xml not well formed");
} catch (IOException e) {
System.out.println("IO error");
}
}
public void startElement(String s, String s1, String elementName, Attributes attributes) throws SAXException {
if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
FinancialStatementLineItemParser.startFinancialStatementLineItemParser(OrgDataPartitonObj,financialStatementLineItemObj, elementName, attributes);
}
}
public void endElement(String s, String s1, String element) throws SAXException {
if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
FinancialStatementLineItemParser.getEndElementFinancialStatementLineItemParser(financialStatementLineItemObj, element, tmpValue);
}
}
public static void getEndElementFinancialStatementLineItemParser(FinancialStatementLineItem financialStatementLineItemObj, String element, String tmpValue) {
if (element.equals("fs:BookMark")) {
financialStatementLineItemObj.setBookMark(tmpValue);
}
}
@Override
public void characters(char[] buffer, int start, int length) {
tmpValue = new String(buffer, start, length);
}
When I debug, I see only the value /substr(1,2); everything before the last "/" is missing.
I don't know why I am not getting the full value /BODY[1]/DIV[3135]/DIV[0]/TABLE[0]/TBODY[0]/TR[32]/TD[5]/DIV[0]/FONT[0]/substr(1,2)
If an escape character is needed, where do I have to use it?

Here is the source code for a DefaultHandler that collects the text:
private static DefaultHandler getHandler() {
return new DefaultHandler() {
String text = "";
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
// qName keeps the "fs:" prefix when the parser is not namespace-aware (the default)
if ("fs:BookMark".equals(qName)) {
text = "";
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
// copy only the reported slice; the rest of ch[] does not belong to this event
text += new String(ch, start, length);
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("fs:BookMark".equals(qName)) {
// trim once, after all chunks have been collected
System.out.println("endElement: " + text.trim());
}
}
};
}

You need to change your characters()-method to
@Override
public void characters(char[] buffer, int start, int length) {
tmpValue += new String(buffer, start, length);
}
and you must reset tmpValue to "" in the startElement() method, because the parser may deliver the text of one element in several characters() calls.
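To make the advice above concrete, here is a minimal, self-contained sketch. The class name BookmarkDemo and the method parseBookmark are made up for illustration and are not part of the question's code. It assumes the default, non-namespace-aware parser, so the element is matched by its qName "fs:BookMark", and it uses a StringBuilder in place of the String concatenation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names are illustrative only.
class BookmarkDemo {

    // Parses the XML string and returns the trimmed text of <fs:BookMark>,
    // or null if the element does not occur.
    static String parseBookmark(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // reset the buffer at every start tag
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("fs:BookMark".equals(qName)) {
                    result[0] = text.toString().trim(); // complete value is available here
                }
            }
        };

        // The default factory is not namespace-aware, so qName keeps the "fs:" prefix.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fs:AsReportedItem>\n"
                + "<fs:BookMark>/BODY[1]/DIV[3135]/DIV[0]/TABLE[0]/TBODY[0]/TR[32]/TD[5]/DIV[0]/FONT[0]/substr(1,2)\n"
                + "</fs:BookMark>\n"
                + "</fs:AsReportedItem>";
        System.out.println(parseBookmark(xml));
    }
}
```

The value is read only in endElement(), which is the first point where all chunks of the element are guaranteed to have been delivered.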

Just change the characters() method:
@Override
public void characters(char[] buffer, int start, int length) {
tmpValue += new String(buffer, start, length);
}
And add tmpValue=""; as the last line of the endElement() method:
public void endElement(String s, String s1, String element) throws SAXException {
if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
FinancialStatementLineItemParser.getEndElementFinancialStatementLineItemParser(financialStatementLineItemObj, element, tmpValue);
}
tmpValue="";
}

Related

Java use sax to parse xml files. Can't get the correct content when coming up &amp [duplicate]

I have some issues parsing XML files with SAX.
The Java ContentHandler code looks like this:
boolean rcontent = false;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("content")) {
rcontent = true;
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (rcontent){
System.out.println("content: " + new String(ch, start, length));
rcontent = false;
}
}
Xml file content is like this:
But the output is:
I want to say
which is not complete.

It's likely that characters(...) is being called multiple times for the single <content> block. Try something like
StringBuilder builder;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("content")) {
builder = new StringBuilder();
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (builder != null) {
// append every chunk; the text of one element may arrive in several calls
builder.append(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (builder != null) {
System.out.println("Content = " + builder);
builder = null;
}
}
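If you want to check whether characters() really is called more than once, a small diagnostic handler can record every chunk. This is only a sketch: the class ChunkLogger and the sample string are invented here (the XML from the question is not shown), and how many chunks you get depends on the parser implementation, so no particular count should be relied on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic helper; not part of the question's code.
class ChunkLogger {

    // Returns every chunk that characters() reported, in document order.
    static List<String> chunks(String xml) throws Exception {
        final List<String> chunks = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        chunks.add(new String(ch, start, length));
                    }
                });
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input containing an entity reference.
        List<String> parts = chunks("<content>I want to say &amp; more</content>");
        System.out.println(parts.size() + " chunk(s): " + parts);
        System.out.println("joined: " + String.join("", parts));
    }
}
```

Whatever the chunk count turns out to be, joining the chunks gives the complete text, which is why appending to a StringBuilder works.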

RSS Reader using Sax Parser losing characters from title

I'm trying to use a SAX parser to return the contents of an RSS feed from a URL (http://pitchfork.com/rss/news/), but characters are often lost when the title is displayed: it shows partial text or just a closing ">".
How can I modify my handler class to prevent this? I think I should probably use a StringBuilder or StringBuffer, but I'm not sure how to implement it.
ParseHandler.java
public class RssParseHandler extends DefaultHandler {
//Parsed items
private List<RssItem> rssItems;
private RssItem currentItem;
private boolean parsingTitle;
private boolean parsingLink;
private boolean parsing_id;
private boolean parsingDescription;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
}
public List<RssItem> getItems() {
return rssItems;
}
//Creates empty RssItem object during the process of an item start tag
//Indicators are set to true when particular tag is being processed
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if ("item".equals(qName)) {
currentItem = new RssItem();
} else if ("title".equals(qName)) {
parsingTitle = true;
} else if ("link".equals(qName)) {
parsingLink = true;
} else if ("_id".equals(qName)) {
parsing_id = true;
} else if ("description".equals(qName)) {
parsingDescription = true;
}
}
//Current RssItem is added to the list following process of end tag
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("item".equals(qName)) {
rssItems.add(currentItem);
currentItem = null;
} else if ("title".equals(qName)) {
parsingTitle = false;
} else if ("link".equals(qName)) {
parsingLink = false;
} else if ("_id".equals(qName)) {
parsing_id = false;
} else if ("description".equals(qName)) {
parsingDescription = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null)
currentItem.setTitle(new String(ch, start, length));
} else if (parsingLink) {
if (currentItem != null) {
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
} else if (parsing_id) {
if (currentItem != null) {
currentItem.set_id(new String(ch, start, length));
parsing_id = false;
}
} else if (parsingDescription) {
if (currentItem != null) {
currentItem.setDescription(new String(ch, start, length));
parsingDescription = false;
}
}
}
} // end of RssParseHandler

Use a StringBuilder to accumulate the text of the tag, rather than creating a new String in each characters() call. As the documentation says:
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
And @CommonWares says exactly this in his post here.
Build the text as it arrives using a StringBuilder, since it comes in chunks rather than as the entire string at once (this explains the incomplete titles!). You may or may not need the isBuilding flag, but I don't know your entire implementation, so I added it just in case.
StringBuilder mSb;
boolean isBuilding;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
mSb = new StringBuilder();
isBuilding = true;
if(qName.equals("title")){
parsingTitle = true;
}
...
...
}
@Override
public void characters (char ch[], int start, int length) {
if (mSb !=null && isBuilding) {
// append the whole reported slice in one call
mSb.append(ch, start, length);
}
}
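@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(parsingTitle){
// the buffer is mSb (declared above); the channel-level <title> has no current item
if (currentItem != null) {
currentItem.setTitle(mSb.toString().trim());
}
parsingTitle = false;
isBuilding = false;
}
}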
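For comparison, here is a sketch of a complete handler that drops the boolean flags and decides what to do with the buffered text in endElement(). It is one possible arrangement, not the asker's actual class: RssItem is reduced to a hypothetical two-field stand-in, and BufferedRssHandler and its parse helper are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the question's RssItem class.
class RssItem {
    String title;
    String link;
}

// Sketch: one shared buffer, no per-tag boolean flags.
class BufferedRssHandler extends DefaultHandler {
    private final List<RssItem> items = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private RssItem current;

    List<RssItem> getItems() {
        return items;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting fresh text for this element
        if ("item".equals(qName)) {
            current = new RssItem();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // chunks are simply concatenated
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // outside <item>, e.g. the channel-level <title>
        }
        String value = text.toString().trim();
        switch (qName) {
            case "title":
                current.title = value;
                break;
            case "link":
                current.link = value;
                break;
            case "item":
                items.add(current);
                current = null;
                break;
            default:
                break;
        }
    }

    // Convenience helper for trying the handler on an XML string.
    static List<RssItem> parse(String xml) throws Exception {
        BufferedRssHandler handler = new BufferedRssHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getItems();
    }
}
```

Because the value is assigned only when the closing tag is seen, a title containing entity references such as &amp; or &gt; is stored whole, however the parser splits it.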

Android: get HTML text from XML

I have implemented XML reading in my app and it works fine, but now I want to format the text. I've tried this in the XML:
<monumento>
<horarios><b>L-V:</b> 10 a 20<br/>S-D: 11 a 15</horarios>
<tarifas>4000</tarifas>
</monumento>
But when I put HTML tags in the XML, the text is not displayed in my app at all.
I'll have many XML files, so I won't always know where the <b>, <br/>, ... tags will be.
How can I do this?
Main
StringBuilder builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getHorarios());
}
horario2.setText(builder.toString());
builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getTarifas());
}
tarifa2.setText(builder.toString());
XMLReader
public void get() {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(URL + monumento + ".xml").openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) {
}
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
currTag = true;
currTagVal = "";
if (localName.equals("monumento")) {
post = new HorariosTarifasObj();
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
currTag = false;
if(localName.equalsIgnoreCase("horarios")) {
post.setHorarios(currTagVal);
} else if(localName.equalsIgnoreCase("tarifas")) {
post.setTarifas(currTagVal);
} else if (localName.equalsIgnoreCase("monumento")) {
posts.add(post);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currTag) {
currTagVal = currTagVal + new String(ch, start, length);
currTag = false;
}
}

Try CDATA:
<monumento>
<horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios>
<tarifas>4000</tarifas>
</monumento>

In XML, the < and > characters are reserved for tags. You need to protect them by replacing them with entity references.
You can use &gt; for > and &lt; for <
(edit) Eomm's answer is right: CDATA does this as well, and is simpler.
Also, to render HTML markup in a TextView, you need to use the Html.fromHtml() method.
For instance:
tarifa2.setText(Html.fromHtml(builder.toString()));
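To see why CDATA and escaping both work while raw tags do not, here is a plain-Java sketch (no Android classes, so Html.fromHtml() is left out). MarkupTextDemo and its horarios method are names invented for this example, and the handler is a simplified one that resets its buffer at every start tag, not the exact handler from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative only.
class MarkupTextDemo {

    // Returns the text that is in the buffer when </horarios> is reached.
    static String horarios(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes attributes) {
                        text.setLength(0); // every start tag, including <b> and <br/>, resets the buffer
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length); // CDATA content also arrives here
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if ("horarios".equals(qName)) {
                            result[0] = text.toString();
                        }
                    }
                });
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        String cdata = "<monumento><horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios></monumento>";
        String escaped = "<monumento><horarios>&lt;b&gt;L-V:&lt;/b&gt; 10 a 20&lt;br/&gt;S-D: 11 a 15</horarios></monumento>";
        String raw = "<monumento><horarios><b>L-V:</b> 10 a 20<br/>S-D: 11 a 15</horarios></monumento>";
        System.out.println("CDATA:   " + horarios(cdata));
        System.out.println("escaped: " + horarios(escaped));
        System.out.println("raw:     " + horarios(raw));
    }
}
```

With CDATA or entity references the markup reaches the handler as ordinary text, so the whole string can later be handed to Html.fromHtml(). With raw tags the parser reports <b> and <br/> as elements of their own, and a handler that resets its buffer on every start tag loses the text collected before them.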

Parsing xml special chars issue

I'm parsing XML received from a web service using SAX.
One of the fields is a link, like the following:
<link_site>
http://www.ownhosting.com/webservice_332.asp?id_user=21395&id_parent=33943
</link_site>
I have to get this link and save it, but it is saved like so: id_parent=33943.
Parser snippet:
//inside method startElement
else if(localName.equals("link_site")){
this.in_link=true;
}
...
//inside method endElement
else if(localName.equals("link_site")){
this.in_link=false;
}
Then, I get the content
else if(this.in_link){
xmlparsing.setOrderLink(count, Html.fromHtml(new String(ch, start, length)).toString());
}//I get it and put in a HashMap<Integer,String>
I know that this issue is due to the special characters encoding.
What can I do?

The & (written as &amp; in well-formed XML) makes the parser split the text and make several calls to the characters() method. You need to concatenate the chunks. Something like this:
SAXParserFactory.newInstance().newSAXParser()
.parse(new File("1.xml"), new DefaultHandler() {
String url;
String element;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
element = qName;
url = "";
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (element.equals("link_site")) {
url += new String(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (element.equals("link_site")) {
System.out.println(url.trim());
element = "";
}
}
});
prints
http://www.ownhosting.com/webservice_332.asp?id_user=21395&id_parent=33943

Unable to read special characters from xml using java

When I try to read XML from Java using a SAX parser, it fails to read the element content that follows a special character.
For example:
<title>It's too difficult</title>
After reading it with the SAX parser, only It is displayed.
How do I handle special characters? My sample code is below:
package com.test.java;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLUsingSAXParser {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
int titleCount;
boolean title = false;
boolean description = false;
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
// System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("title")) {
title = true;
titleCount+=1;
}
if (qName.equalsIgnoreCase("description")) {
description = true;
}
}
public void endElement(String uri, String localName,
String qName)
throws SAXException {
// System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (title&&titleCount>2) {
System.out.println("title : "
+ new String(ch, start, length)+":"+titleCount);
title = false;
}
if (description) {
System.out.println("description : "
+ new String(ch, start, length));
description = false;
}
}
};
saxParser.parse("C:\\Documents and Settings\\sukumar\\Desktop\\sample.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}

The characters(char ch[], int start, int length) method does not necessarily deliver the full text in one call; you should store the characters in a StringBuffer and use it in the endElement method.
E.g.:
private StringBuffer buffer = new StringBuffer();
public void endElement(String uri, String localName,
String qName)
throws SAXException {
if (qName.equalsIgnoreCase("title")) {
System.out.println("title: " + buffer);
}else if (qName.equalsIgnoreCase("description")) {
System.out.println("description: " + buffer);
}
buffer = new StringBuffer();
}
public void characters(char ch[], int start, int length)
throws SAXException {
buffer.append(new String(ch, start, length));
}
