RSS Parser returns 403 - java

I'm new to Java, and we were given an assignment about XML parsing. We have done DOM and now we are on SAX, which is why I'm using a SAX parser to parse an RSS feed. It already works on files, but when I try to parse an online RSS feed, it fails with HTTP error 403 (Forbidden). I haven't tried parsing the same site with DOM because my laptop is so slow that it takes me 5 minutes just to open a file.
Thanks for the help.
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NewsHandler extends DefaultHandler {
    private String url = "http://tomasinoweb.org/feed/rss";
    private boolean inDescription = false;
    private String[] descs = new String[11];
    int i = 0;

    public void processFeed() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(this);
            // openStream() sends Java's default User-Agent; this is the call that fails with 403
            InputStream inputStream = new URL(url).openStream();
            reader.parse(new InputSource(inputStream));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if (qName.equals("description")) inDescription = true;
    }

    @Override
    public void characters(char ch[], int start, int length) {
        // Only the slice [start, start + length) of ch belongs to this call
        String chars = new String(ch, start, length);
        if (inDescription && i < descs.length) {
            // characters() may be called several times for one element, so append
            descs[i] = (descs[i] == null) ? chars : descs[i] + chars;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("description")) {
            inDescription = false;
            i++;
        }
    }

    public String getDesc(int index) { return descs[index]; }

    public static void main(String[] args) {
        NewsHandler nh = new NewsHandler();
        nh.processFeed();
        for (int i = 0; i < 10; i++) {
            System.out.println(nh.getDesc(i));
        }
    }
}

Solution:
Instead of opening the stream directly from String url = "url", I created URL url = new URL("url"), opened a connection with URLConnection con = url.openConnection(), and then called con.addRequestProperty("user-agent", user-agent string) before reading from the connection.
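A minimal sketch of the fix described above, assuming the feed server rejects Java's default User-Agent header. The class name FeedFetcher and the USER_AGENT value are hypothetical; whether a particular string is accepted is entirely up to the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class FeedFetcher {
    // Hypothetical browser-like value; which strings a given server accepts is up to that server.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; NewsHandler/1.0)";

    // Creates the connection and sets the header; openConnection() does not contact the server yet.
    static URLConnection prepare(String feedUrl) throws IOException {
        URLConnection con = new URL(feedUrl).openConnection();
        con.addRequestProperty("User-Agent", USER_AGENT);
        return con;
    }

    // getInputStream() is where the request is actually sent (and where a 403 would surface).
    static InputStream open(String feedUrl) throws IOException {
        return prepare(feedUrl).getInputStream();
    }
}
```

In processFeed(), the line InputStream inputStream = new URL(url).openStream(); would then become InputStream inputStream = FeedFetcher.open(url);. If the server still answers 403, getInputStream() throws an IOException, which the existing catch block prints.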

Related

Java SAX parser doesn't get the value after & in my field

Several elements in my XML file contain & and other HTML characters such as >.
When I test my code, it only returns the first part of the field. For example, for:
SERIES & FILMS
it returns only the word SERIES.
Another example:
C>SUDO
returns only C.
Here is my code (the field name is "summary"):
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
chars = new StringBuffer();
DefaultHandler handler = new DefaultHandler() {
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equals(SUMMARY2)) {
bfSummary = true;
}
if (qName.equals(SERVICE_DATA)) {
idServiceData = attributes.getValue("id");
bfServicedata = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName + ""
+ mListBaseLineByEpgId.size());
// maliste.put(listeId, summary);
malisteParThem.add(summary);
if (mListBaseLineByEpgId.get(idServiceData) != null) {
List<String> listeModif = mListBaseLineByEpgId
.get(idServiceData);
for (String chaine : malisteParThem) {
listeModif.add(chaine);
}
mListBaseLineByEpgId.replace(idServiceData, listeModif);
} else {
mListBaseLineByEpgId.put(idServiceData, malisteParThem);
}
malisteParThem = new ArrayList<String>();
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (bfSummary) {
summary = new String(ch, start, length);
summary = summary.replace(BEFORETILESUMMARY, "");
// chars.append(summary);
// summary=chars.toString();
summary = removeHtmlFrom(summary);
System.out.println("Summary : " + summary);
bfSummary = false;
}
if (bfServicedata) {
System.out.println("listeId : " + idServiceData);
bfServicedata = false;
}
}
};
File file = new File(cheminFichier);
InputStream inputStream = new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream);
InputSource is = new InputSource(reader);
//is.setEncoding("ISO-8859-1");
saxParser.parse(is, handler);
} catch (Exception e) {
e.printStackTrace();
}
Thank you.
This problem is probably caused by a SAX behavior that surprises many people: the parser is allowed (per the spec) to split the text content of an element and call characters() several times for the same element. Parsers often split at entity references such as &amp; and &gt;, which matches your symptom: your handler keeps only the first chunk and then sets bfSummary to false.
What you need to do is keep a StringBuffer or StringBuilder instance variable: initialize it in startElement(), append to it in characters(), and read the full text in endElement().
See the question "JAVA SAX parser split calls to characters()" for more information.
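A minimal, self-contained sketch of the StringBuilder approach described above. It assumes the element is literally named summary (the question uses a constant SUMMARY2 whose value isn't shown), and the class name SummaryCollector is hypothetical; the question's other bookkeeping (service data ids, the map) is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SummaryCollector extends DefaultHandler {
    private final List<String> summaries = new ArrayList<>();
    private StringBuilder text;  // non-null only while inside a <summary> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("summary")) {
            text = new StringBuilder();  // fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);  // may run several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("summary")) {
            summaries.add(text.toString());  // the complete text is only known here
            text = null;
        }
    }

    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SummaryCollector handler = new SummaryCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.summaries;
    }
}
```

With input such as <root><summary>SERIES &amp; FILMS</summary></root>, the whole string "SERIES & FILMS" ends up in the list, however the parser chose to split it.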

Issue while fetching images from RSS feed

I'm trying to load an RSS feed into my Android application. I can retrieve the title, description, and link of each item, but I can't get the image for a particular item.
The following is my XmlHandler class (a DefaultHandler subclass). Please take a look and help me out.
public class XmlHandler extends DefaultHandler {
private RssFeedStructure feedStr = new RssFeedStructure();
private List<RssFeedStructure> rssList = new ArrayList<RssFeedStructure>();
private int articlesAdded = 0;
// Number of articles to download
private static final int ARTICLES_LIMIT = 15;
StringBuffer chars = new StringBuffer();
public void startElement(String uri, String localName, String qName, Attributes atts) {
chars = new StringBuffer();
if (qName.equalsIgnoreCase("media:content"))
{
// getValue() returns null (not the string "null") when the attribute is missing
String imgUrl = atts.getValue("url");
feedStr.setImgLink(imgUrl != null ? imgUrl : "");
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (localName.equalsIgnoreCase("title"))
{
feedStr.setTitle(chars.toString());
}
else if (localName.equalsIgnoreCase("description"))
{
feedStr.setDescription(chars.toString());
}
else if (localName.equalsIgnoreCase("pubDate"))
{
feedStr.setPubDate(chars.toString());
}
else if (localName.equalsIgnoreCase("encoded"))
{
feedStr.setEncodedContent(chars.toString());
}
else if (qName.equalsIgnoreCase("media:content"))
{
}
else if (localName.equalsIgnoreCase("link"))
{
try {
feedStr.setUrl(new URL(chars.toString()));
}catch (Exception e){}
}
if (localName.equalsIgnoreCase("item")) {
rssList.add(feedStr);
feedStr = new RssFeedStructure();
articlesAdded++;
if (articlesAdded >= ARTICLES_LIMIT)
{
throw new SAXException();
}
}
}
public void characters(char ch[], int start, int length) {
chars.append(new String(ch, start, length));
}
public List<RssFeedStructure> getLatestArticles(String feedUrl) {
URL url = null;
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
url = new URL(feedUrl);
xr.setContentHandler(this);
xr.parse(new InputSource(url.openStream()));
} catch (IOException e) {
} catch (SAXException e) {
} catch (ParserConfigurationException e) {
}
return rssList;
}
}
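No answer is included here, so the following is only a hedged sketch. It assumes an RSS 2.0 feed (un-namespaced <item> elements) that puts its image in a Media RSS element (media:content or media:thumbnail, namespace http://search.yahoo.com/mrss/) or in an <enclosure> whose type starts with image/; feeds that only embed an <img> tag inside the description are not handled. It turns on namespace awareness so that uri and localName are reported, instead of matching the prefixed qName "media:content", which stops working if a feed binds a different prefix. The class ImageUrlHandler is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ImageUrlHandler extends DefaultHandler {
    // Namespace URI of Media RSS (the "media:" prefix used by many feeds)
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    private final List<String> imageUrls = new ArrayList<>();  // one entry per <item>
    private String currentImage;  // first image URL found in the current <item>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (uri.isEmpty() && localName.equals("item")) {
            currentImage = null;
        }
        boolean mediaImage = MEDIA_NS.equals(uri)
                && (localName.equals("content") || localName.equals("thumbnail"));
        String type = atts.getValue("", "type");
        boolean imageEnclosure = uri.isEmpty() && localName.equals("enclosure")
                && type != null && type.startsWith("image/");
        if ((mediaImage || imageEnclosure) && currentImage == null) {
            String url = atts.getValue("", "url");  // null if the attribute is absent
            if (url != null) currentImage = url;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (uri.isEmpty() && localName.equals("item")) {
            imageUrls.add(currentImage == null ? "" : currentImage);
        }
    }

    public static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // so uri and localName are filled in
        SAXParser parser = factory.newSAXParser();
        ImageUrlHandler handler = new ImageUrlHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.imageUrls;
    }
}
```

Items without a recognizable image get an empty string, mirroring the setImgLink("") fallback in the question.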

Android: get HTML text from XML

My app reads an XML file, and that works fine. But I want to format the text. I tried this in the XML:
<monumento>
<horarios><b>L-V:</b> 10 a 20<br/>S-D: 11 a 15</horarios>
<tarifas>4000</tarifas>
</monumento>
But as soon as I add HTML tags, the text is no longer displayed in my app.
I'll have many XML files, so I won't always know where <b>, <br/>, etc. will be placed.
How can I do this?
Main
StringBuilder builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getHorarios());
}
horario2.setText(builder.toString());
builder = new StringBuilder();
for (HorariosTarifasObj post : helper.posts) {
builder.append(post.getTarifas());
}
tarifa2.setText(builder.toString());
XMLReader
public void get() {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(URL + monumento + ".xml").openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) {
}
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
currTag = true;
currTagVal = "";
if (localName.equals("monumento")) {
post = new HorariosTarifasObj();
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
currTag = false;
if(localName.equalsIgnoreCase("horarios")) {
post.setHorarios(currTagVal);
} else if(localName.equalsIgnoreCase("tarifas")) {
post.setTarifas(currTagVal);
} else if (localName.equalsIgnoreCase("monumento")) {
posts.add(post);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currTag) {
currTagVal = currTagVal + new String(ch, start, length);
currTag = false;
}
}
Try CDATA:
<monumento>
<horarios><![CDATA[<b>L-V:</b> 10 a 20<br/>S-D: 11 a 15]]></horarios>
<tarifas>4000</tarifas>
</monumento>
In XML, the < and > characters delimit tags, so you need to protect them by replacing them with predefined entity references.
You can use &gt; for > and &lt; for <.
(edit) Eomm's answer is right: CDATA does this as well, and more simply.
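Also, to render HTML markup in a TextView, you need to use the Html.fromHtml() method (from API level 24, the single-argument version is deprecated in favor of Html.fromHtml(source, Html.FROM_HTML_MODE_LEGACY)).
For instance: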
tarifa2.setText(Html.fromHtml(builder.toString()));
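A minimal sketch (plain Java, no Android classes) of reading the CDATA version. It assumes the element names from the question; the class HorariosReader is hypothetical. The handler appends every characters() chunk instead of clearing a flag after the first call, as the question's handler does with currTag = false, because the parser may deliver an element's text in several pieces. The same code also returns the HTML string when the markup is escaped with &lt; and &gt; instead of wrapped in CDATA.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HorariosReader extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String horarios;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0);  // reset the buffer at every new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append every chunk: one element's text may arrive in several calls
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("horarios")) {
            horarios = text.toString();  // still contains the raw <b>, <br/> markup
        }
    }

    public static String readHorarios(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        HorariosReader handler = new HorariosReader();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.horarios;
    }
}
```

On Android, the returned string would then go through Html.fromHtml() before setText(), as shown above.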

BlackBerry parsing UTF-8 XML File with SAX Parser

I am trying to parse a UTF-8 XML file using a SAX parser, but it throws an exception with the message "Expecting an element".
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<config>
<filepath>/mnt/sdcard/Audio_Recorder/anonymous22242.3gp</filepath>
<filename>anonymous22242.3gp</filename>
<annotation>
<file>anonymous22242.3gp</file>
<timestamp>0:06</timestamp>
<note>test1</note>
</annotation>
<annotation>
<file>anonymous22242.3gp</file>
<timestamp>0:09</timestamp>
<note>لول</note>
</annotation>
<annotation>
<file>anonymous22242.3gp</file>
<timestamp>0:09</timestamp>
<note>لولو</note>
</annotation>
</config>
private static String fileDirectory;
private final static ArrayList<String> allFileNames = new ArrayList<String>();
private final static ArrayList<String[]> allAnnotations = new ArrayList<String[]>();
private static String[] currentAnnotation = new String[3];
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser playbackParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean audioFullPath = false;
boolean audioName = false;
boolean annotationFile = false;
boolean annotationTimestamp = false;
boolean annotationNote = false;
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("filepath")) {
audioFullPath = true;
}
if (qName.equalsIgnoreCase("filename")) {
audioName = true;
}
if (qName.equalsIgnoreCase("file")) {
annotationFile = true;
}
if (qName.equalsIgnoreCase("timestamp")) {
annotationTimestamp = true;
}
if (qName.equalsIgnoreCase("note")) {
annotationNote = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (audioFullPath) {
String filePath = new String(ch, start, length);
System.out.println("Full Path : " + filePath);
fileDirectory = filePath;
audioFullPath = false;
}
if (audioName) {
String fileName = new String(ch, start, length);
System.out.println("File Name : " + fileName);
allFileNames.add(fileName);
audioName = false;
}
if (annotationFile) {
String fileName = new String(ch, start, length);
currentAnnotation[0] = fileName;
annotationFile = false;
}
if (annotationTimestamp) {
String timestamp = new String(ch, start, length);
currentAnnotation[1] = timestamp;
annotationTimestamp = false;
}
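if (annotationNote) {
String note = new String(ch, start, length);
currentAnnotation[2] = note;
annotationNote = false;
allAnnotations.add(currentAnnotation);
// start a new array, otherwise every list entry shares (and overwrites) the same one
currentAnnotation = new String[3];
}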
}
};
InputStream inputStream = getStream("http://www.example.com/example.xml");
Reader xmlReader = new InputStreamReader(inputStream, "UTF-8");
InputSource xmlSource = new InputSource(xmlReader);
xmlSource.setEncoding("UTF-8");
playbackParser.parse(xmlSource, handler);
System.out.println(fileDirectory);
System.out.println(allFileNames);
System.out.println(allAnnotations);
} catch (Exception e) {
e.printStackTrace();
}
}
public static InputStream getStream(String url)
{
try
{
HttpConnection connection = getConnection(url);
connection.setRequestProperty("User-Agent",System.getProperty("microedition.profiles"));
connection.setRequestProperty("Connection", "Keep-Alive");
connection.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
return connection.openInputStream();
}
catch(Exception e)
{
System.out.println("NNNNNNN "+e.getMessage());
return null;
}
}
public static HttpConnection getConnection(String url)
{
HttpConnection connection = null;
try
{
connection = (HttpConnection) Connector.open(url+getConnectionString());
}
catch(Exception e)
{
}
return connection;
}
}
But when I pass the InputStream to the parse method instead of the InputSource, it parses the file, but I still have a problem with the Arabic characters:
playbackParser.parse(inputStream, handler);
The XML you showed contains Arabic characters. That is only valid if they are actually stored as UTF-8 bytes, as the declaration says; if the file was saved in some other encoding, it violates its declared encoding, which means the XML is malformed. A SAX parser processes data sequentially, piece by piece, triggering events for each piece, so it will not detect such an encoding error until it reaches the piece that contains the offending bytes. There is nothing the parser can do about that. The XML needs to be fixed by its original author.
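One way to check whether the file's bytes really are UTF-8, sketched for a desktop JDK (the BlackBerry parser may behave differently): pass the raw InputStream to the parser and let it read the encoding from the XML declaration, instead of wrapping the stream in a Reader. If the bytes are valid UTF-8, the Arabic notes come back intact; if not, the parser throws when it reaches the bad bytes. The class NoteReader is hypothetical and only collects <note> values.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class NoteReader {
    // Parses raw bytes; the parser takes the encoding from the XML declaration itself.
    public static List<String> readNotes(InputStream in) throws Exception {
        List<String> notes = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            private StringBuilder text;  // non-null only inside <note>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("note")) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Accumulate: a note's text may arrive in several calls
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("note")) {
                    notes.add(text.toString());
                    text = null;
                }
            }
        });
        return notes;
    }
}
```

Accumulating in a StringBuilder also matters here: multi-byte characters are a common place for a parser to split text across characters() calls, and the question's handler keeps only the first call per element.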

parse XML and convert to a Collection

<inputs>
<MAT_NO>123</MAT_NO>
<MAT_NO>323</MAT_NO>
<MAT_NO>4223</MAT_NO>
<FOO_BAR>122</FOO_BAR>
<FOO_BAR>125</FOO_BAR>
</inputs>
I have to parse the XML above. After parsing, I want the values in a Map<String, List<String>> whose keys are the child element names (MAT_NO, FOO_BAR)
and whose values are lists of those child elements' text values (123, 323, etc.).
Here is my attempt. Is there a better way of doing this?
public class UserInputsXmlParser extends DefaultHandler {
private final SaveSubscriptionValues subscriptionValues = null;
private String nodeValue = "";
private final String inputKey = "";
private final List<String> valuesList = null;
private Map<String, List<String>> userInputs;
public Map<String, List<String>> parse(final String strXML) {
try {
final SAXParserFactory parserFactory = SAXParserFactory
.newInstance();
final SAXParser saxParser = parserFactory.newSAXParser();
saxParser.parse(new InputSource(new StringReader(strXML)), this);
return userInputs;
} catch (final SAXException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final IOException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final ParserConfigurationException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final Exception e) {
e.printStackTrace();
throw new MyException("", e);
}
}
@Override
public void startElement(final String uri, final String localName,
final String qName, final Attributes attributes)
throws SAXException {
nodeValue = "";
if ("inputs".equalsIgnoreCase(qName)) {
userInputs = MyUtil.getNewHashMap();
return;
}
}
@Override
public void characters(final char[] ch, final int start, final int length)
throws SAXException {
if (!MyUtil.isEmpty(nodeValue)) {
nodeValue += new String(ch, start, length);
} else {
nodeValue = new String(ch, start, length);
}
}
@Override
public void endElement(final String uri, final String localName,
final String qName) throws SAXException {
if (!"inputs".equalsIgnoreCase(qName)) {
storeUserInputs(qName, nodeValue);
}
}
/**
* @param qName
* @param nodeValue2
*/
private void storeUserInputs(final String qName, final String nodeValue2) {
if (nodeValue2 == null || nodeValue2.trim().equals("")) { return; }
final String trimmedValue = nodeValue2.trim();
final List<String> values = userInputs.get(qName);
if (values != null) {
values.add(trimmedValue);
} else {
final List<String> valueList = new ArrayList<String>();
valueList.add(trimmedValue);
userInputs.put(qName, valueList);
}
}
public static void main(final String[] args) {
final String sample = "<inputs>" + "<MAT_NO>154400-0000</MAT_NO>"
+ "<MAT_NO> </MAT_NO>" + "<MAT_NO>154400-0002</MAT_NO>"
+ "<PAT_NO>123</PAT_NO><PAT_NO>1111</PAT_NO></inputs>";
System.out.println(new UserInputsXmlParser().parse(sample));
}
}
UPDATE: The children of the <inputs> node are dynamic; I only know the name of the root node.
Do you have to provide a solution as part of a SAX event handler? If not, you could use one of the many XML libraries around, such as dom4j, which makes the solution a lot simpler:
public static void main(String[] args) throws Exception
{
String sample = "<inputs>" + "<MAT_NO>154400-0000</MAT_NO>"
+ "<MAT_NO> </MAT_NO>" + "<MAT_NO>154400-0002</MAT_NO>"
+ "<PAT_NO>123</PAT_NO><PAT_NO>1111</PAT_NO></inputs>";
System.out.println(parse(sample));
}
static Map<String,List<String>> parse(String xml) throws Exception
{
Map<String,List<String>> map = new HashMap<String,List<String>>();
SAXReader reader = new SAXReader();
Document doc = reader.read(new StringReader(xml));
for (Iterator i = doc.getRootElement().elements().iterator(); i.hasNext();)
{
Element element = (Element)i.next();
//Maybe handle elements with only whitespace text content
List<String> list = map.get(element.getName());
if (list == null)
{
list = new ArrayList<String>();
map.put(element.getName(), list);
}
list.add(element.getText());
}
return map;
}
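If the solution does have to be a SAX handler, a shorter sketch is possible: track the element depth so that every direct child of the root is collected whatever its name (which covers the update about dynamic children), and let Map.computeIfAbsent create each list. This assumes Java 8 or later; InputsHandler is a hypothetical name, and it drops the question's MyUtil and MyException helpers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InputsHandler extends DefaultHandler {
    private final Map<String, List<String>> userInputs = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;  // 1 = root element, 2 = direct children of the root

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        depth++;
        text.setLength(0);  // new element, new text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 2) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {  // skip whitespace-only values, like the question's code
                userInputs.computeIfAbsent(qName, k -> new ArrayList<>()).add(value);
            }
        }
        depth--;
    }

    public static Map<String, List<String>> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputsHandler handler = new InputsHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.userInputs;
    }
}
```

With the question's sample string, this yields MAT_NO mapped to [154400-0000, 154400-0002] and PAT_NO mapped to [123, 1111]; the blank MAT_NO is skipped.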
I would check out XStream (http://x-stream.github.io/tutorial.html).
XStream is a simple library for serializing objects to XML and back again.
For something this basic, look into XPath.
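A minimal sketch of the XPath suggestion, using only the JDK's javax.xml.xpath API. The expression /*/* selects every child element of the root without knowing the root's name; the class name XPathInputs is hypothetical, and whitespace-only values are skipped to match the question's code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathInputs {
    public static Map<String, List<String>> parse(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "/*/*" = every element child of the root element, whatever the names are
        NodeList children = (NodeList) xpath.evaluate(
                "/*/*", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String value = child.getTextContent().trim();
            if (!value.isEmpty()) {
                map.computeIfAbsent(child.getNodeName(), k -> new ArrayList<>()).add(value);
            }
        }
        return map;
    }
}
```

This builds an in-memory DOM behind the scenes, which is fine for small inputs like this one but not for very large documents, where a SAX handler keeps memory use low.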
