When I try to read an XML file from Java using the SAX parser, it does not read the content of an element that comes after a special character.
For example:
<title>It's too difficult</title>
After parsing with SAX, only "It" is displayed.
How do I handle special characters? My sample code is below:
package com.test.java;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLUsingSAXParser {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
int titleCount;
boolean title = false;
boolean description = false;
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
// System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("title")) {
title = true;
titleCount+=1;
}
if (qName.equalsIgnoreCase("description")) {
description = true;
}
}
public void endElement(String uri, String localName,
String qName)
throws SAXException {
// System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (title&&titleCount>2) {
System.out.println("title : "
+ new String(ch, start, length)+":"+titleCount);
title = false;
}
if (description) {
System.out.println("description : "
+ new String(ch, start, length));
description = false;
}
}
};
saxParser.parse("C:\\Documents and Settings\\sukumar\\Desktop\\sample.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
The characters(char ch[], int start, int length) method is not guaranteed to deliver an element's whole text in one call; the parser may pass it in several chunks. Store the characters in a StringBuffer and use it in the endElement method.
E.g.:
private StringBuffer buffer = new StringBuffer();
public void endElement(String uri, String localName,
String qName)
throws SAXException {
if (qName.equalsIgnoreCase("title")) {
System.out.println("title: " + buffer);
}else if (qName.equalsIgnoreCase("description")) {
System.out.println("description: " + buffer);
}
buffer = new StringBuffer();
}
public void characters(char ch[], int start, int length)
throws SAXException {
buffer.append(new String(ch, start, length));
}
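For completeness, here is a minimal self-contained sketch of the same idea. The class name TitleTextDemo, the readTitles helper, and the inline sample XML are assumptions made for illustration, not part of the original code. The sample writes the apostrophe as &apos; on the assumption that the real file uses an entity there, since parsers commonly start a new characters() chunk at an entity reference.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TitleTextDemo {
    // Hypothetical helper: returns the complete text of every <title> element.
    static List<String> readTitles(String xml) {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder buffer = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                buffer.setLength(0); // drop text that belongs to the enclosing element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buffer.append(ch, start, length); // may run several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equalsIgnoreCase("title")) {
                    titles.add(buffer.toString()); // the text is complete only here
                }
                buffer.setLength(0);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<books><title>It&apos;s too difficult</title></books>";
        System.out.println(readTitles(xml)); // prints [It's too difficult]
    }
}
```

Resetting the buffer in both startElement and endElement keeps whitespace between elements from leaking into the next title.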
I have the XML structure below:
<fs:AsReportedItem>
<fs:BookMark>/BODY[1]/DIV[3135]/DIV[0]/TABLE[0]/TBODY[0]/TR[32]/TD[5]/DIV[0]/FONT[0]/substr(1,2)
</fs:BookMark>
</fs:AsReportedItem>
I am parsing it with SAX and reading the tag value in the endElement() method.
Here is my sample code:
private void parseDocument() {
// parse
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(FileName, this);
} catch (ParserConfigurationException e) {
System.out.println("ParserConfig error");
} catch (SAXException e) {
System.out.println("SAXException : xml not well formed");
} catch (IOException e) {
System.out.println("IO error");
}
}
public void startElement(String s, String s1, String elementName, Attributes attributes) throws SAXException {
if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
FinancialStatementLineItemParser.startFinancialStatementLineItemParser(OrgDataPartitonObj,financialStatementLineItemObj, elementName, attributes);
}
}
public void endElement(String s, String s1, String element) throws SAXException {
if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
FinancialStatementLineItemParser.getEndElementFinancialStatementLineItemParser(financialStatementLineItemObj, element, tmpValue);
}
}
public static void getEndElementFinancialStatementLineItemParser(FinancialStatementLineItem financialStatementLineItemObj, String element, String tmpValue) {
if (element.equals("fs:BookMark")) {
financialStatementLineItemObj.setBookMark(tmpValue);
}
}
@Override
public void characters(char[] buffer, int start, int length) {
tmpValue = new String(buffer, start, length);
}
When I debug, I see only the value /substr(1,2); everything before it, including the other "/"-separated parts, is missing.
I don't know why I am not getting the full value /BODY[1]/DIV[3135]/DIV[0]/TABLE[0]/TBODY[0]/TR[32]/TD[5]/DIV[0]/FONT[0]/substr(1,2)
If an escape character is needed, where do I have to use it?
Here is the source code for a DefaultHandler that collects the text:
private static DefaultHandler getHandler() {
return new DefaultHandler() {
String text;
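@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
// qName includes the prefix, so the element from the question is "fs:BookMark"
if ("fs:BookMark".equals(qName)) {
text = "";
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
// append only the valid slice of the array; characters() may be called several times
text += new String(ch, start, length);
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if ("fs:BookMark".equals(qName)) {
// trim once, after all chunks have been collected
System.out.println("endElement: " + text.trim());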
}
}
};
}
You need to change your characters() method to
@Override
public void characters(char[] buffer, int start, int length) {
tmpValue += new String(buffer, start, length);
}
and you must reset tmpValue to "" in the startElement() method.
Just change the characters() method:
@Override
public void characters(char[] buffer, int start, int length) {
tmpValue += new String(buffer, start, length);
}
And reset tmpValue to "" as the last line of the endElement method:
public void endElement(String s, String s1, String element) throws SAXException {
if (OrgDataPartitonObj != null && "fs:FinancialStatementLineItemDataItem".equals(OrgDataPartitonObj.getType())) {
FinancialStatementLineItemParser.getEndElementFinancialStatementLineItemParser(financialStatementLineItemObj, element, tmpValue);
}
tmpValue="";
}
I have some issues parsing XML files with SAX.
The Java ContentHandler code looks like this:
boolean rcontent = false;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("content")) {
rcontent = true;
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (rcontent){
System.out.println("content: " + new String(ch, start, length));
rcontent = false;
}
}
Xml file content is like this:
But the output is:
I want to say
which is not complete.
It's likely that characters(...) is being called multiple times for the single <content> block. Try something like
StringBuilder builder;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("content")) {
builder = new StringBuilder();
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (builder != null){
builder.append(new String(ch, start, length));
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
// endElement takes no Attributes parameter; print only when <content> itself closes
if (builder != null && qName.equalsIgnoreCase("content")) {
System.out.println("Content = " + builder);
builder = null;
}
}
I have implemented XML reading in my app, and it works fine. But I want to format the text. I've tried this in the XML:
<monumento>
<horarios><b>L-V:</b> 10 a 20<br/>S-D: 11 a 15</horarios>
<tarifas>4000</tarifas>
</monumento>
But when I put HTML tags in the XML, the text is not displayed in my app at all.
I'll have many XML files, so I won't always know where <b>, <br/>, ... will be placed.
What can I do?
Main
StringBuilder builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getHorarios());
}
horario2.setText(builder.toString());
builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getTarifas());
}
tarifa2.setText(builder.toString());
XMLReader
public void get() {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(URL + monumento + ".xml").openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) {
}
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
currTag = true;
currTagVal = "";
if (localName.equals("monumento")) {
post = new HorariosTarifasObj();
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
currTag = false;
if(localName.equalsIgnoreCase("horarios")) {
post.setHorarios(currTagVal);
} else if(localName.equalsIgnoreCase("tarifas")) {
post.setTarifas(currTagVal);
} else if (localName.equalsIgnoreCase("monumento")) {
posts.add(post);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currTag) {
currTagVal = currTagVal + new String(ch, start, length);
currTag = false;
}
}
Try CDATA:
<monumento>
<horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios>
<tarifas>4000</tarifas>
</monumento>
In XML, the < and > characters are reserved for tags. You need to protect them by replacing them with the corresponding entity references.
You can use &gt; for > and &lt; for <
(edit) Eomm's answer is right: CDATA achieves the same thing and is simpler.
Also, to render HTML markup in a TextView, you need to use the Html.fromHtml() method.
For instance:
tarifa2.setText(Html.fromHtml(builder.toString()));
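To illustrate that entity escaping and CDATA hand the same string to the handler, here is a small sketch in plain Java (no Android classes). The class name MarkupAsTextDemo and the horariosText helper are hypothetical, and the inline XML strings are simplified from the question. Note that it appends every chunk passed to characters(), because escaped text in particular tends to arrive in several pieces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class MarkupAsTextDemo {
    // Hypothetical helper: returns the text content of the first <horarios> element.
    static String horariosText(String xml) {
        final StringBuilder result = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inHorarios;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("horarios")) {
                    inHorarios = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inHorarios) {
                    result.append(ch, start, length); // keep appending, never stop after one chunk
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("horarios")) {
                    inHorarios = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String escaped = "<monumento><horarios>&lt;b&gt;L-V:&lt;/b&gt; 10 a 20&lt;br/&gt;S-D: 11 a 15</horarios></monumento>";
        String cdata = "<monumento><horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios></monumento>";
        // Both lines print: <b>L-V:</b> 10 a 20<br/>S-D: 11 a 15
        System.out.println(horariosText(escaped));
        System.out.println(horariosText(cdata));
    }
}
```

The resulting string is what would then be passed on for HTML rendering.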
I am trying to parse the description tag in the xml but it only outputs one line:
description: <img src=http://www.ovations365.com/sites/ovations365.com/images/event/441705771/sparkswebsite_medium.jpg alt="SPARKS: Understanding Energy">
That is only a small part of the text in the CDATA and I'm trying to output the description for multiple items. Why can't I get the whole CDATA?
The XML is located at: http://feeds.feedburner.com/Events-Ovations365
package com.example.ovations_proj;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import com.example.ovations_proj.RssItem;
public class RssParseHandler extends DefaultHandler {
private List<RssItem> rssItems;
// Used to reference item while parsing
private RssItem currentItem;
// Parsing title indicator
private boolean parsingTitle;
// Parsing link indicator
private boolean parsingLink;
private boolean parsingDes;
public RssParseHandler() {
rssItems = new ArrayList<RssItem>();
}
public List<RssItem> getItems() {
return rssItems;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if ("item".equals(qName)) { //item
currentItem = new RssItem();
} else if ("title".equals(qName)) { //title
parsingTitle = true;
} else if ("link".equals(qName)) { //link
parsingLink = true;
}else if ("description".equals(qName) ) { //description
parsingDes = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("End Element :" + qName);
if ("item".equals(qName)) {
rssItems.add(currentItem);//item
currentItem = null;
} else if ("title".equals(qName)) {//title
parsingTitle = false;
} else if ("link".equals(qName)) {//link
parsingLink = false;
} else if ("description".equals(qName) ) { //description
parsingDes = false;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (parsingTitle) {
if (currentItem != null){
currentItem.setTitle(new String(ch, start, length));
}
} else if (parsingLink) {
if (currentItem != null) {
currentItem.setLink(new String(ch, start, length));
parsingLink = false;
}
} else if (parsingDes) {
if (currentItem != null) {
currentItem.setDes(new String(ch, start, length));
System.out.println("description: " + currentItem.getDes());
parsingDes = false;
}
}
}
}
It seems that the character data in the <![CDATA[...]]> sections is being sent in multiple chunks, i.e. in multiple calls to the characters method.
The ContentHandler documentation for the characters method mentions that SAX parsers are free to do this:
SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks[....]
You'll therefore need to adjust your characters method to handle being called multiple times for the same run of contiguous character data, for example by appending each chunk to a buffer and reading the buffer in endElement.
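One way to make that adjustment is sketched below. It is a simplified stand-in for the handler in the question, not a drop-in replacement: the class name RssDescriptionDemo, the readDescriptions helper, and the inline feed are assumptions for illustration, and it collects plain strings instead of RssItem objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RssDescriptionDemo {
    // Hypothetical helper: returns the full <description> text of every <item>.
    static List<String> readDescriptions(String xml) {
        final List<String> descriptions = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inItem;
            private StringBuilder text; // non-null only while inside an item's <description>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("item".equals(qName)) {
                    inItem = true;
                } else if (inItem && "description".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // collect every chunk, CDATA included
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("description".equals(qName) && text != null) {
                    descriptions.add(text.toString()); // store only once the element is closed
                    text = null;
                } else if ("item".equals(qName)) {
                    inItem = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample feed", e);
        }
        return descriptions;
    }

    public static void main(String[] args) {
        String feed = "<rss><channel><item><title>SPARKS</title>"
                + "<description><![CDATA[<img src=\"a.jpg\" alt=\"SPARKS\"><p>Full text</p>]]></description>"
                + "</item></channel></rss>";
        System.out.println(readDescriptions(feed));
    }
}
```

The key change from the question's handler is that the setter-style assignment moves from characters() to endElement(), so nothing is stored until all chunks have arrived.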
I saw an example on mkyong: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
I tried to make it work for the XML file below by making the following modifications to the code on that page:
1 - Keep only two if blocks in the startElement() and characters() methods.
2 - Change the print statements in those methods, i.e.
FIRSTNAME and First Name = passenger id
LASTNAME and Last Name = name
The problem: in the output, I see the word passenger instead of the value of passenger id. How do I fix that?
<?xml version="1.0" encoding="utf-8"?>
<root xmlns:android="www.google.com">
<passenger id="001">
<name>Tom Cruise</name>
</passenger>
<passenger id="002">
<name>Tom Hanks</name>
</passenger>
</root>
Java Code
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFileSAX{
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bfname = false;
boolean blname = false;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("passenger id")) {
bfname = true;
}
if (qName.equalsIgnoreCase("name")) {
blname = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("passenger id : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("name : " + new String(ch, start, length));
blname = false;
}
}
};
saxParser.parse("c:\\flight.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
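id is an attribute of the passenger element, not an element of its own, so qName is never "passenger id". In startElement, when it is called for "passenger", the Attributes argument contains that value: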
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
if (qName.equalsIgnoreCase("passenger") && attributes != null){
System.out.println(attributes.getValue("id"));
}
}
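Putting the attribute lookup together with the element text, a complete sketch might look like the following. The class name PassengerDemo and the readPassengers helper are hypothetical, and the XML is inlined from the question rather than read from c:\flight.xml.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PassengerDemo {
    // Hypothetical helper: maps each passenger id attribute to the text of its <name> child.
    static Map<String, String> readPassengers(String xml) {
        final Map<String, String> passengers = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private StringBuilder name; // non-null only while inside <name>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equalsIgnoreCase("passenger")) {
                    currentId = attributes.getValue("id"); // attributes arrive with the start tag
                } else if (qName.equalsIgnoreCase("name")) {
                    name = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (name != null) {
                    name.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equalsIgnoreCase("name") && name != null) {
                    passengers.put(currentId, name.toString());
                    name = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
        return passengers;
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:android=\"www.google.com\">"
                + "<passenger id=\"001\"><name>Tom Cruise</name></passenger>"
                + "<passenger id=\"002\"><name>Tom Hanks</name></passenger>"
                + "</root>";
        System.out.println(readPassengers(xml)); // prints {001=Tom Cruise, 002=Tom Hanks}
    }
}
```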