Locating Specific Attributes in Digester

I'm using the Apache Commons Digester and trying to locate a particular tag in the structure to include in the object.
<parent>
<image size="small">some url</image>
<image size="medium">some url</image>
<image size="large">some url</image>
<image size="huge">some url</image>
</parent>
I really only want the medium image to be included in my parent object, but I'm not sure how to do that.
Right now I'm using digester.addBeanPropertySetter(PathToParent+"/image","image"); but this gets updated for every image tag (as it should).
Ideally I would like something like digester.addBeanPropertySetter(PathToParent+"/image/medium","image"); but you can't do that.

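Let Digester call setImage for every <image> element and filter inside the setter, keeping only the medium one. I omitted the generic getters/setters.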
public class Parent {
private Image image;
public void setImage(Image image) {
if ("medium".equals(image.getSize())) {
this.image = image;
}
}
}
public class Image {
private String size;
private String url;
}
public static void main(String[] args) throws IOException, SAXException {
String s = "<parent>"
+ "<image size='small'>some url1</image>"
+ "<image size='medium'>some url2</image>"
+ "<image size='large'>some url3</image>"
+ "<image size='huge'>some url4</image>"
+ "</parent>";
Digester digester = new Digester();
digester.addObjectCreate("parent", Parent.class);
digester.addFactoryCreate("parent/image", new ImageCreationFactory());
digester.addBeanPropertySetter("parent/image", "url");
digester.addSetNext("parent/image", "setImage");
Parent p = (Parent) digester.parse(new StringReader(s));
}
public class ImageCreationFactory implements ObjectCreationFactory {
public Object createObject(Attributes attributes) throws Exception {
Image i = new Image();
i.setSize(attributes.getValue("size"));
return i;
}
}

I actually figured this out using XmlPullParser. Here is the code that picks up only the image whose attribute is "large" and ignores the rest; it's the last "else if" in the START_TAG case.
public class XmlPullFeedParser extends BaseFeedParser {
public XmlPullFeedParser(String feedUrl) {
super(feedUrl);
}
public ArrayList<Message> parse() {
ArrayList<Message> messages = null;
XmlPullParser parser = Xml.newPullParser();
try {
// auto-detect the encoding from the stream
parser.setInput(this.getInputStream(), null);
int eventType = parser.getEventType();
Message currentMessage = null;
boolean done = false;
while (eventType != XmlPullParser.END_DOCUMENT && !done){
String name = null;
String attrib = null;
switch (eventType){
case XmlPullParser.START_DOCUMENT:
messages = new ArrayList<Message>();
break;
case XmlPullParser.START_TAG:
name = parser.getName();
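// Read the first attribute only if the tag has one; getAttributeValue(0) throws otherwise
attrib = parser.getAttributeCount() > 0 ? parser.getAttributeValue(0) : null;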
if (name.equalsIgnoreCase(EVENT)){
currentMessage = new Message();
} else if (currentMessage != null){
if (name.equalsIgnoreCase(WEBSITE)){
currentMessage.setWebsite(parser.nextText());
} else if (name.equalsIgnoreCase(DESCRIPTION)){
currentMessage.setDescription(parser.nextText());
} else if (name.equalsIgnoreCase(START_DATE)){
currentMessage.setDate(parser.nextText());
} else if (name.equalsIgnoreCase(TITLE)){
currentMessage.setTitle(parser.nextText());
} else if (name.equalsIgnoreCase(HEADLINER)){
currentMessage.setHeadliner(parser.nextText());
} else if (name.equalsIgnoreCase(IMAGE) && "large".equalsIgnoreCase(attrib)) {
currentMessage.setImage(parser.nextText());
}
}
break;
case XmlPullParser.END_TAG:
name = parser.getName();
if (name.equalsIgnoreCase(EVENT) && currentMessage != null){
messages.add(currentMessage);
} else if (name.equalsIgnoreCase(EVENTS)){
done = true;
}
break;
}
eventType = parser.next();
}
} catch (Exception e) {
Log.e("AndroidNews::PullFeedParser", e.getMessage(), e);
throw new RuntimeException(e);
}
return messages;
}
}

I do not think that is possible with Digester's rules alone. You have to write your own code to perform this kind of filtering.
But it is very simple. If you want clean code, write a class named ImageAccessor with a method getImage(String size). This method gets the data from Digester and compares each image's size with a predefined size string (or pattern).
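One way to read this suggestion, as a minimal sketch: collect every image while parsing and let a small accessor pick the one you want. The ImageAccessor class and getImage(String size) method follow the names proposed above; everything else (the JDK SAX parser standing in for Digester, the put and parse methods, the map storage) is an assumption made for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ImageAccessor {
    // Maps the "size" attribute of each <image> to its text content (the URL).
    private final Map<String, String> urlsBySize = new LinkedHashMap<>();

    // Hypothetical helper: records one parsed image.
    public void put(String size, String url) {
        urlsBySize.put(size, url);
    }

    /** Returns the URL of the image with the given size, or null if none was parsed. */
    public String getImage(String size) {
        return urlsBySize.get(size);
    }

    /** Parses the XML with the JDK SAX parser (not Digester) and records every <image>. */
    public static ImageAccessor parse(String xml) throws Exception {
        final ImageAccessor accessor = new ImageAccessor();
        DefaultHandler handler = new DefaultHandler() {
            private String currentSize; // size attribute of the open <image>, if any
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("image".equals(qName)) {
                    currentSize = attrs.getValue("size");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("image".equals(qName) && currentSize != null) {
                    accessor.put(currentSize, text.toString().trim());
                    currentSize = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return accessor;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parent>"
                + "<image size='small'>some url1</image>"
                + "<image size='medium'>some url2</image>"
                + "<image size='large'>some url3</image>"
                + "</parent>";
        System.out.println(ImageAccessor.parse(xml).getImage("medium"));
    }
}
```

The parent object can then be populated with accessor.getImage("medium") after parsing, which keeps the filtering rule in one place.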

Related

How to parse xml data in ArrayList [closed]

I need to parse XML data on Android. I have seen a project on GitHub that shows how to parse XML data into a list view. However, I want to read the XML data into separate strings. I have used pretty much the same code as the GitHub project, but I only get an error and the app stops responding.
Code:
public class Main extends Fragment {
android.view.View myview;
EditText number;
@Override
public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
myview = inflater.inflate(R.layout.fragment_screen, container, false);
number = (EditText) myview.findViewById(R.id.number);
XmlParser par = new XmlParser();
number.setText(par.getStackSitesFromFile(getActivity().getBaseContext())
.get(0).getLink()); // error occurs here
return myview;
}
}
XmlParser.java
public class XmlParser {
static final String KEY_SITE = "rate";
static final String KEY_NAME = "Name";
static final String KEY_LINK = "Rate";
static final String KEY_ABOUT = "Date";
static final String KEY_IMAGE_URL = "Time";
public static List<HandleXML> getStackSitesFromFile(Context ctx) {
// List of StackSites that we will return
List<HandleXML> stackSites;
stackSites = new ArrayList<HandleXML>();
// temp holder for current StackSite while parsing
HandleXML curStackSite = null;
// temp holder for current text value while parsing
String curText = "";
try {
// Get our factory and PullParser
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
XmlPullParser xpp = factory.newPullParser();
// Open up InputStream and Reader of our file.
FileInputStream fis = ctx.openFileInput("/sdcard/rates.xml");
BufferedReader reader = new BufferedReader(new InputStreamReader(fis));
// point the parser to our file.
xpp.setInput(reader);
// get initial eventType
int eventType = xpp.getEventType();
// Loop through pull events until we reach END_DOCUMENT
while (eventType != XmlPullParser.END_DOCUMENT) {
// Get the current tag
String tagname = xpp.getName();
// React to different event types appropriately
switch (eventType) {
case XmlPullParser.START_TAG:
if (tagname.equals("test")) {
curStackSite = new HandleXML();
}
break;
case XmlPullParser.TEXT:
//grab the current text so we can use it in END_TAG event
curText = xpp.getText();
break;
case XmlPullParser.END_TAG:
if (tagname.equalsIgnoreCase("test")) {
stackSites.add(curStackSite);
} else if (tagname.equalsIgnoreCase(KEY_NAME)) {
curStackSite.setName(curText);
} else if (tagname.equals("Rate")) {
curStackSite.setLink(curText);
} else if (tagname.equalsIgnoreCase(KEY_ABOUT)) {
curStackSite.setAbout(curText);
} else if (tagname.equalsIgnoreCase(KEY_IMAGE_URL)) {
curStackSite.setImgUrl(curText);
}
break;
default:
break;
}
eventType = xpp.next();
}
} catch (Exception e) {
e.printStackTrace();
}
// return the populated list.
return stackSites;
}
}
And finally, HandleXML.java
public class HandleXML {
private String name;
private String rate;
private String date;
private String time;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getLink() {
return rate;
}
public void setLink(String rate) {
this.rate = rate;
}
public String getAbout() {
return date;
}
public void setAbout(String date) {
this.date = date;
}
public String getImgUrl() {
return time;
}
public void setImgUrl(String time) {
this.time = time;
}
@Override
public String toString() {
return name + rate;
}
}
Xml File:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2016-09-07T05:50:08Z" yahoo:lang="en-US">
<results>
<test>
<Name>EUR/USD</Name>
<Rate>1.1251</Rate>
<Date>9/7/2016</Date>
<Time>0:56am</Time>
</test>
<test>
<Name>EUR/USD</Name>
<Rate>1.1253</Rate>
<Date>9/7/2016</Date>
<Time>0:56am</Time>
</test>
</results>
</query>
The error is in the first code fragment, at: number.setText(par.getStackSitesFromFile(getActivity().getBaseContext()).get(0).getLink());
The list comes back empty because ctx.openFileInput() only accepts a bare file name inside the app's private storage. Passing a path that contains a separator (/) throws
java.lang.IllegalArgumentException: File /sdcard/rates.xml contains a path separator
which the catch block swallows, so nothing is ever added to the list and get(0) then fails. Use
FileInputStream fis = new FileInputStream(new File("/sdcard/rates.xml")); instead, and do not forget to close it with fis.close().
The final code for XmlParser will be:
public class XmlParser {
static final String KEY_SITE = "rate";
static final String KEY_NAME = "Name";
static final String KEY_LINK = "Rate";
static final String KEY_ABOUT = "Date";
static final String KEY_IMAGE_URL = "Time";
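public static List<HandleXML> getStackSitesFromFile() {
// Declared as a local (an instance field is not accessible from a static method)
// and outside the try block so it can be closed afterwards
FileInputStream fis = null;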
// List of StackSites that we will return
List<HandleXML> stackSites;
stackSites = new ArrayList<HandleXML>();
// temp holder for current StackSite while parsing
HandleXML curStackSite = null;
// temp holder for current text value while parsing
String curText = "";
try {
// Get our factory and PullParser
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
XmlPullParser xpp = factory.newPullParser();
// Open up InputStream and Reader of our file.
fis = new FileInputStream(new File("/sdcard/rates.xml"));
BufferedReader reader = new BufferedReader(new InputStreamReader(fis));
// point the parser to our file.
xpp.setInput(reader);
// get initial eventType
int eventType = xpp.getEventType();
// Loop through pull events until we reach END_DOCUMENT
while (eventType != XmlPullParser.END_DOCUMENT) {
// Get the current tag
String tagname = xpp.getName();
// React to different event types appropriately
switch (eventType) {
case XmlPullParser.START_TAG:
if (tagname.equals("test")) {
curStackSite = new HandleXML();
}
break;
case XmlPullParser.TEXT:
//grab the current text so we can use it in END_TAG event
curText = xpp.getText();
break;
case XmlPullParser.END_TAG:
if (tagname.equalsIgnoreCase("test")) {
stackSites.add(curStackSite);
} else if (tagname.equalsIgnoreCase(KEY_NAME)) {
curStackSite.setName(curText);
} else if (tagname.equals("Rate")) {
curStackSite.setLink(curText);
} else if (tagname.equalsIgnoreCase(KEY_ABOUT)) {
curStackSite.setAbout(curText);
} else if (tagname.equalsIgnoreCase(KEY_IMAGE_URL)) {
curStackSite.setImgUrl(curText);
}
break;
default:
break;
}
eventType = xpp.next();
}
} catch (Exception e) {
e.printStackTrace();
}
try {
// fis is still unset if opening the file failed
if (fis != null) {
fis.close();
}
} catch (Exception e) {
Log.i("Problem closing", "Closing fis");
}
// return the populated list.
return stackSites;
}
}
Then set the text like this: number.setText(par.getStackSitesFromFile().get(0).getLink());
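As a side note on closing the stream by hand: where the build supports Java 7 language features, try-with-resources closes the stream automatically, even when reading or parsing throws. A minimal sketch using only the JDK; the StreamClosingDemo class, the readAll helper and the temporary file are made up for the demonstration, and in the app the file would be the /sdcard/rates.xml from the question.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class StreamClosingDemo {
    /** Reads the whole file; the stream and reader are closed automatically. */
    public static String readAll(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileInputStream fis = new FileInputStream(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fis, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } // reader and fis are closed here, in reverse order of declaration
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the real XML file
        File tmp = File.createTempFile("rates", ".xml");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<Rate>1.1251</Rate>\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(tmp));
    }
}
```

With this shape there is no separate close block to forget, and no field or outer variable is needed to hold the stream.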

java.lang.IndexOutOfBoundsException: Invalid index 0, size is 0.. parse file incorrect?

I am trying to build an Android application that scrapes pictures from a website and shows them in a grid/list, but I'm running into runtime errors. Logcat says the parsing is wrong for FAMILY_DOG_BREED. Does anyone know where I am going wrong? I understand why an index can be out of bounds, but I have no idea how to fix it here.
I am trying to parse the data on http://www.dogbreedslist.info/family-dog-breeds/, but I get runtime errors in these sections of my code.
DogsActivity.class
private class RetrieveDogsTask extends AsyncTask<String, Void, Void> {
@Override
protected Void doInBackground(String... urls) {
for (String url : urls) {
Parser parser = new Parser(url, DogsActivity.this);
Breed.Name breedName = breed.getName();
if (breedName == Breed.Name.HERDING_DOG_BREED) {
dogs.add(parser.parseProfile(new Dog(url, breedName)));
} else {
dogs.addAll(parser.parseDogsPage(breedName, DogsActivity.this));
}
}
return null;
}
Parser.class
public class Parser {
Document doc;
Context context;
Elements dogRows;
public Parser(String url, Context context) {
this.context = context;
try {
doc = Jsoup.connect(url).get();
} catch (IOException e) {
Log.e("Page", "Wrong URL or network problems", e);
}
}
public ArrayList<Dog> parseDogsPage(Breed.Name breedName, Context context) {
ArrayList<Dog> dogs = new ArrayList<>();
try {
Element dogContainer;
if (breedName == Breed.Name.FAMILY_DOG_BREED) {
dogContainer = doc.getElementsByClass("familybreed").get(0);
} else {
dogContainer = doc.getElementsByClass("toybreed").get(0);
}
Log.i("Page", "A page has been parsed successfully");
dogRows = dogContainer.getElementsByTag("a");
for (Element dogRow : dogRows) {
String dogName, dogURL;
Dog dog;
dogURL = dogRow.getElementsByTag("a").get(0).absUrl("href");
String dogThumbnailURL = dogRow.
getElementsByTag("img").get(0).absUrl("src");
if (breedName == Breed.Name.FAMILY_DOG_BREED) {
dogName = dogRow.getElementsByTag("span").get(0).text();
dog = new Dog(dogName, dogURL, dogThumbnailURL, breedName);
} else {
dogName = dogRow.getElementsByTag("strong").get(0).text();
Element details = dogContainer.getElementsByClass("details").get(0);
Elements children = details.children();
if (breedName == Breed.Name.TOY_DOG_BREED || breedName == Breed.Name.HOUND_DOG_BREED) {
String origin = children.get(1).text();
String lifespan = children.get(3).text();
dog= new Dog(dogName, origin , lifespan, dogURL, dogThumbnailURL, breedName);
} else {
//for herding
String sizetype = children.get(1).text();
dog = new Dog(dogName, sizetype, dogThumbnailURL, dogURL, breedName);
}
}
dogs.add(dog);
}
} catch (Exception e) {
Log.e("Breed activity", "Wrong parsing for " + breedName, e);
}
return dogs;
}
public Dog parseProfile(Dog dog) {
if (!dog.isDetailDataReady()) {
//coaches already read the data in the coaches page
try {
Element dogContainer = doc.getElementById("dogscontainer");
Element bioContainer = dogContainer.getElementById("biocontainer");
Element bioDetails = bioContainer.getElementById("biodetails");
dog.setOtherNames(bioDetails.getElementsByTag("h1").text());
ArrayList<Dog.Detail> dogDetails = new ArrayList<>();
Elements rows = bioDetails.getElementsByTag("tr");
for (Element row : rows) {
Elements tds = row.getElementsByTag("td");
if (dog.getBreed() == Breed.Name.WORKING_DOG_BREED ||
dog.getBreed() == Breed.Name.TERRIER_DOG_BREED ||
dog.getBreed() == Breed.Name.HERDING_DOG_BREED) {
//coaches, manager and legends use th and td
Elements ths = row.getElementsByTag("th");
dogDetails.add(new Dog.Detail(ths.get(0).text(), tds.get(0).text()));
} else {
//dogs use two tds
dogDetails.add(new Dog.Detail(tds.get(0).text(), tds.get(1).text()));
}
}
dog.setDetails(dogDetails);
Element articleText = dogContainer.getElementsByClass("dogarticletext").get(0);
Elements paragraphs = articleText.getElementsByTag("p");
String text = "";
for (Element p : paragraphs) {
text = text + "\n\n\n" + p.text();
}
dog.setArticleText(dog.getArticleText() + text);
if (dog.getBreed() == Breed.Name.WORKING_DOG_BREED ||
dog.getBreed() == Breed.Name.TERRIER_DOG_BREED ||
dog.getBreed() == Breed.Name.HERDING_DOG_BREED) {
//get main image url
dog.setMainImageURL(bioContainer.getElementsByTag("img").get(0).absUrl("src"));
if (dog.getBreed() == Breed.Name.WORKING_DOG_BREED) {
dog.setThumbnailURL(dog.getMainImageURL());
//only need first name
dog.setName(dog.getOtherNames().split(" ")[1]);
}
} else {
dog.setMainImageURL(bioContainer.getElementsByClass("mainImage").get(0).absUrl("src"));
}
} catch (Exception e) {
Log.e("Profile activity", "Wrong parsing for " + dog.getUrl(), e);
}
if (dog.getBreed() == Breed.Name.WORKING_DOG_BREED) {
dog.setBasicDataReady(true);
}
dog.setDetailDataReady(true);
}
return dog;
}
}
Logcat:
Wrong parsing for FAMILY_DOG_BREED
java.lang.IndexOutOfBoundsException: Invalid index 0, size is 0
at java.util.ArrayList.throwIndexOutOfBoundsException(ArrayList.java:255)
at java.util.ArrayList.get(ArrayList.java:308)
at org.jsoup.select.Elements.get(Elements.java:544)
at com.example.shannon.popular.Parser.parseDogsPage(Parser.java:35)
at com.example.shannon.popular.DogsActivity$RetrieveDogsTask.doInBackground(DogsActivity.java:140)
at com.example.shannon.popular.DogsActivity$RetrieveDogsTask.doInBackground(DogsActivity.java:131)
at android.os.AsyncTask$2.call(AsyncTask.java:288)
at java.util.concurrent.FutureTask.run(FutureTask.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
at java.lang.Thread.run(Thread.java:818)
Breed.class:
public class Breed implements Serializable {
private Name name;
private String url;
Breed(Name name, String url) {
this.name = name;
this.url = url;
}
public Name getName() {
return name;
}
public String getNameString(Context context) {
String nameString = "";
switch (name) {
case FAMILY_DOG_BREED:
nameString = context.getString(R.string.family_breed);
break;
case TOY_DOG_BREED:
nameString = context.getString(R.string.toy_breed);
break;
case HOUND_DOG_BREED:
nameString = context.getString(R.string.hound_breed);
break;
case TERRIER_DOG_BREED:
nameString = context.getString(R.string.terrier_breed);
break;
case WORKING_DOG_BREED:
nameString = context.getString(R.string.working_breed);
break;
case HERDING_DOG_BREED:
nameString = context.getString(R.string.herding_breed);
break;
}
return nameString;
}
public String getURL() {
return url;
}
public enum Name {FAMILY_DOG_BREED, TOY_DOG_BREED, HOUND_DOG_BREED, TERRIER_DOG_BREED, WORKING_DOG_BREED, HERDING_DOG_BREED}
}
You may be using a strict XML parser on a malformed HTML document. I just tried to XML-validate the URL you are parsing, and it fails because the <link> element is never closed (in strict XML it must be ended by a </link> tag, which is missing on that page).
This is very common for HTML pages, since today's browsers tend to auto-correct these kinds of errors.
With a strict XML parser, the parse is very likely to fail.
I suggest switching to a different parser. I'd use a pull parser (e.g. http://www.xmlpull.org). This technique gives you lower-level control over parsing, so you can easily ignore unwanted content in the HTML, such as these link elements or any others.
So you could do something like this:
XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
parser.setInput(new BufferedReader(
new InputStreamReader(
new URL("http://.....").openConnection().getInputStream()
)
)
);
while(XmlPullParser.END_DOCUMENT != parser.next()){
if(XmlPullParser.START_TAG == parser.getEventType()){
String tagName = parser.getName();
if(parser.getAttributeCount() > 0) {
// parse attributes, if needed
}
if(parser.nextToken() == XmlPullParser.TEXT){
String tagValue = parser.getText();
}
// etc.
}
}

Java inner class new instance not being created

I have a Java class that is going to have a number of inner classes. This is done for organization and to keep things in a separate file.
public class PUCObjects
{
public static class PUCNewsItem
{
public String title;
public String summary;
public String body;
public String url;
public String imageUrl;
}
}
I am then trying to create a new instance of that inner class (doing this in another class that parses some remote XML), but for some reason it doesn't seem to get created:
public static ArrayList<PUCObjects.PUCNewsItem> getPUCNews() throws IOException {
String url = "http://api.puc.edu/news/list?key="+API_KEY+"&count=30";
InputStream is = downloadUrl(url);
XmlPullParserFactory pullParserFactory;
try {
pullParserFactory = XmlPullParserFactory.newInstance();
XmlPullParser parser = pullParserFactory.newPullParser();
parser.setInput(is, null);
ArrayList<PUCObjects.PUCNewsItem> items = null;
int eventType = parser.getEventType();
PUCObjects.PUCNewsItem item = null;
Log.d("Debug: ", "Start: "+url);
while (eventType != XmlPullParser.END_DOCUMENT){
String name = null;
switch (eventType){
case XmlPullParser.START_DOCUMENT:
items = new ArrayList<PUCObjects.PUCNewsItem>();
break;
case XmlPullParser.START_TAG:
name = parser.getName();
//Log.d("Start Tag Name: ", parser.getName()+" === "+name);
if (name == "item"){
Log.d("Debug: ", "Item");
item = new PUCObjects.PUCNewsItem();
} else if (item != null){
Log.d("Debug: ", "Item is not NULL 2");
if (name == "title"){
Log.d("Title: ", parser.nextText());
item.title = parser.nextText();
} else if (name == "summary"){
item.summary = parser.nextText();
} else if (name == "body_text"){
item.body = parser.nextText();
}
}
break;
case XmlPullParser.END_TAG:
name = parser.getName();
if (name.equalsIgnoreCase("item") && item != null) {
Log.d("Debug: ", "ADD ITEM");
items.add(item);
}
break;
}//end switch
eventType = parser.next();
}//end while
Log.d("Debug: ", "Done");
return items;
} catch (XmlPullParserException e) {
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}//end
I am trying to create the object with item = new PUCObjects.PUCNewsItem(); but it always seems to be null.
Is there a reason why this object isn't getting created?
The problem is the String comparison. Your if statement never evaluates to true because of the == check:
if (name == "item"){
Use the equals() method instead of == when comparing Strings (or any objects) by value; == only checks whether both references point to the same object.
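A small, self-contained illustration of the difference. The StringCompareDemo class and isItemTag helper are hypothetical names, and the string built with new String(...) simply stands in for a tag name that was produced at runtime rather than written as a literal.

```java
public class StringCompareDemo {
    /** Compares by value; putting the literal first also makes the check null-safe. */
    public static boolean isItemTag(String name) {
        return "item".equals(name);
    }

    public static void main(String[] args) {
        // Built at runtime, so it is a different object from the literal "item"
        String name = new String("item");

        System.out.println(name == "item");      // false: different objects
        System.out.println(name.equals("item")); // true: same characters
        System.out.println(isItemTag(name));     // true
        System.out.println(isItemTag(null));     // false, no NullPointerException
    }
}
```

Applied to the parser above, each name == "..." check would become "...".equals(name).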

XML SAXParser reformat if else Java/Android

I have the following problem:
I am using an XML SAXParser to parse an XML file, create classes dynamically, and set their properties.
The code I have written works for 4 classes and sets their properties, but it is one big conditional (if/else if/else) and is very difficult to read.
I would like to parse the XML into 15 different classes, so the code is going to get very big.
The exact question is: how do I refactor the if/else if/else into more readable code? I've searched around for a while and found some approaches, like using a map or the command pattern, but I don't understand how to use them.
This is the code I'm currently using, and it works:
public class XmlParserSax extends DefaultHandler {
List<Fragment> fragments = null;
String atType = null;
String typeObject;
String currentelement = null;
String atColor = null;
RouteFragment route = null;
ChapterFragment chapter = null;
FirstFragment first = null;
ExecuteFragment execute = null;
StringBuilder textBuilder;
public XmlParserSax() {
fragments = new ArrayList<Fragment>();
try {
/**
* Create a new instance of the SAX parser
**/
SAXParserFactory saxPF = SAXParserFactory.newInstance();
SAXParser sp = saxPF.newSAXParser();
XMLReader xr = sp.getXMLReader();
/**
* Create the Handler to handle each of the XML tags.
**/
String file = "assets/test.xml";
InputStream in = this.getClass().getClassLoader()
.getResourceAsStream(file);
xr.setContentHandler(this);
xr.parse(new InputSource(in));
} catch (Exception e) {
System.out.println(e);
}
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
atColor = attributes.getValue("color");
atType = attributes.getValue("type");
currentelement = localName;
textBuilder = new StringBuilder();
if (localName.equalsIgnoreCase("template")) {
if (atType.equalsIgnoreCase("route")) {
route = new RouteFragment();
typeObject = "route";
} else if (atType.equalsIgnoreCase("chapter")) {
chapter = new ChapterFragment();
typeObject = "chapter";
} else if (atType.equalsIgnoreCase("first")) {
first = new FirstFragment();
typeObject = "first";
} else if (atType.equalsIgnoreCase("execute")) {
execute = new ExecuteFragment();
typeObject = "execute";
}
} else if (localName.equalsIgnoreCase("number")) {
if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setNumberTextcolor("#" + atColor);
}
} else if (localName.equalsIgnoreCase("maxnumber")) {
if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setMaxNumberColor("#" + atColor);
}
} else if (localName.equalsIgnoreCase("title")) {
if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setTitleColor("#" + atColor);
} else if (typeObject.equalsIgnoreCase("first")) {
first.setTitleColor("#" + atColor);
}
} else if (localName.equalsIgnoreCase("subtitle")) {
if (typeObject.equalsIgnoreCase("first")) {
first.setSubtitleColor("#" + atColor);
}
} else if (localName.equalsIgnoreCase("text")) {
if (typeObject.equalsIgnoreCase("execute")) {
execute.setTextColor("#" + atColor);
}
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
String text = textBuilder.toString();
if (localName.equalsIgnoreCase("template")) {
if (typeObject.equalsIgnoreCase("route")) {
fragments.add(route); // add the new route fragment to the list
} else if (typeObject.equalsIgnoreCase("chapter")) {
fragments.add(chapter); // add the new chapter fragment to the list
} else if (typeObject.equalsIgnoreCase("first")) {
fragments.add(first);
} else if (typeObject.equalsIgnoreCase("execute")) {
fragments.add(execute);
}
} else if (localName.equalsIgnoreCase("text")) {
if (typeObject.equalsIgnoreCase("route")) {
// route.setOmschrijving(text);
} else if (typeObject.equalsIgnoreCase("execute")) {
execute.setText(text);
}
} else if (localName.equalsIgnoreCase("background")) {
if (typeObject.equalsIgnoreCase("route")) {
// route.setKleur("#" + text);
} else if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setBackgroundColor("#" + text);
} else if (typeObject.equalsIgnoreCase("first")) {
first.setBackgroundColor("#" + text);
} else if (typeObject.equalsIgnoreCase("execute")) {
execute.setBackgroundColor("#" + text);
}
} else if (localName.equalsIgnoreCase("number")) {
if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setNumber(text);
}
} else if (localName.equalsIgnoreCase("maxnumber")) {
if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setMaxNumber(text);
}
} else if (localName.equalsIgnoreCase("title")) {
if (typeObject.equalsIgnoreCase("chapter")) {
chapter.setTitle(text);
} else if (typeObject.equalsIgnoreCase("first")) {
first.setTitle(text);
}
} else if (localName.equalsIgnoreCase("subtitle")) {
if (typeObject.equalsIgnoreCase("first")) {
first.setSubtitle(text);
}
} else if (localName.equalsIgnoreCase("square")) {
if (typeObject.equalsIgnoreCase("execute")) {
execute.setBorderColor("#" + text);
}
}
}
public List<Fragment> getList() {
return fragments;
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
textBuilder.append(ch, start, length);
}
}
There is another way of doing this: using StartElementListener and EndTextElementListener from the android.sax package.
First define your root element:
RootElement root = new RootElement("root");
Define your child elements
Element nodeA = root.getChild("nodeA");
Element nodeB = root.getChild("nodeB");
Element nodeC = root.getChild("nodeC");
Now set the listeners
root.setStartElementListener(new StartElementListener() {
public void start(Attributes attributes) {
foundElement = true;// tells you that you are parsing the intended xml
}
});
nodeA.setEndTextElementListener(new EndTextElementListener() {
public void end(String body) {
//populate your pojo
}
});
This way you can do away with all those if-else statements and booleans, but you have to live with N listeners.
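The question also mentions using a map. As a rough sketch of that idea (a different technique from the listeners above), each element name can be mapped to a small handler, so supporting a new tag means adding one map entry rather than another else-if branch. This uses the JDK SAX parser; the MapDispatchDemo and Chapter classes, their fields, and the sample XML are invented for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MapDispatchDemo {
    /** Hypothetical target object; stands in for ChapterFragment and friends. */
    public static class Chapter {
        public String title;
        public String number;
    }

    /** One entry per element name replaces one branch of the if/else chain. */
    private static final Map<String, BiConsumer<Chapter, String>> END_HANDLERS = new HashMap<>();
    static {
        END_HANDLERS.put("title", (c, text) -> c.title = text);
        END_HANDLERS.put("number", (c, text) -> c.number = text);
    }

    public static Chapter parse(String xml) throws Exception {
        final Chapter chapter = new Chapter();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Look up the handler by lower-cased tag name; unknown tags are ignored
                BiConsumer<Chapter, String> h = END_HANDLERS.get(qName.toLowerCase(Locale.ROOT));
                if (h != null) {
                    h.accept(chapter, text.toString().trim());
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return chapter;
    }

    public static void main(String[] args) throws Exception {
        Chapter c = parse("<template type='chapter'><number>3</number>"
                + "<title>Intro</title></template>");
        System.out.println(c.number + " " + c.title);
    }
}
```

To cover several template types, the same idea could be applied one level up, for example a map from the type attribute to a per-type handler map, though that is only one possible design.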

How do you pull XHTML out of an ATOM feed using Java?

I am trying to pull some XHTML out of a feed so I can place it in a WebView. The feed in question has a tag called <content>, and the characters inside it are XHTML. (The site I'm parsing is a Blogger feed.)
What is the best way to pull this content out? The < characters are confusing my parser. I have tried both DOM and SAX, but neither handles this very well.
Here is a sample of the XML as requested. In this case, I want basically XHTML inside the content tag to be a string. <content> XHTML </content>
Edit: based on ignyhere's suggestion I have tried XPath, but I am still having the same issue. Here is a pastebin sample of my tests.
It's not pretty, but this is (the essence of) what I use to parse an ATOM feed from Blogger using XmlPullParser. The code is rough, but it comes from a real app, so it should at least give you the general idea.
final String TAG_FEED = "feed";
public int parseXml(Reader reader) {
XmlPullParserFactory factory = null;
StringBuilder out = new StringBuilder();
int entries = 0;
try {
factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(true);
XmlPullParser xpp = factory.newPullParser();
xpp.setInput(reader);
while (true) {
int eventType = xpp.next();
if (eventType == XmlPullParser.END_DOCUMENT) {
break;
} else if (eventType == XmlPullParser.START_DOCUMENT) {
out.append("Start document\n");
} else if (eventType == XmlPullParser.START_TAG) {
String tag = xpp.getName();
// out.append("Start tag " + tag + "\n");
if (TAG_FEED.equalsIgnoreCase(tag)) {
entries = parseFeed(xpp);
}
} else if (eventType == XmlPullParser.END_TAG) {
// out.append("End tag " + xpp.getName() + "\n");
} else if (eventType == XmlPullParser.TEXT) {
// out.append("Text " + xpp.getText() + "\n");
}
}
out.append("End document\n");
} catch (XmlPullParserException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
// return out.toString();
return entries;
}
private int parseFeed(XmlPullParser xpp) throws XmlPullParserException, IOException {
int depth = xpp.getDepth();
assert (depth == 1);
int eventType;
int entries = 0;
xpp.require(XmlPullParser.START_TAG, null, TAG_FEED);
while (((eventType = xpp.next()) != XmlPullParser.END_DOCUMENT) && (xpp.getDepth() > depth)) {
// loop invariant: At this point, the parser is not sitting on
// end-of-document, and is at a level deeper than where it started.
if (eventType == XmlPullParser.START_TAG) {
String tag = xpp.getName();
// Log.d("parseFeed", "Start tag: " + tag); // Uncomment to debug
if (FeedEntry.TAG_ENTRY.equalsIgnoreCase(tag)) {
FeedEntry feedEntry = new FeedEntry(xpp);
feedEntry.persist(this);
entries++;
// Log.d("FeedEntry", feedEntry.title); // Uncomment to debug
// xpp.require(XmlPullParser.END_TAG, null, tag);
}
}
}
assert (depth == 1);
return entries;
}
class FeedEntry {
String id;
String published;
String updated;
// Timestamp lastRead;
String title;
String subtitle;
String authorName;
int contentType;
String content;
String preview;
String origLink;
String thumbnailUri;
// Media media;
static final String TAG_ENTRY = "entry";
static final String TAG_ENTRY_ID = "id";
static final String TAG_TITLE = "title";
static final String TAG_SUBTITLE = "subtitle";
static final String TAG_UPDATED = "updated";
static final String TAG_PUBLISHED = "published";
static final String TAG_AUTHOR = "author";
static final String TAG_CONTENT = "content";
static final String TAG_TYPE = "type";
static final String TAG_ORIG_LINK = "origLink";
static final String TAG_THUMBNAIL = "thumbnail";
static final String ATTRIBUTE_URL = "url";
/**
* Create a FeedEntry by pulling its bits out of an XML Pull Parser. Side effect: Advances
* XmlPullParser.
*
* @param xpp
*/
public FeedEntry(XmlPullParser xpp) {
int eventType;
int depth = xpp.getDepth();
assert (depth == 2);
try {
xpp.require(XmlPullParser.START_TAG, null, TAG_ENTRY);
while (((eventType = xpp.next()) != XmlPullParser.END_DOCUMENT)
&& (xpp.getDepth() > depth)) {
if (eventType == XmlPullParser.START_TAG) {
String tag = xpp.getName();
if (TAG_ENTRY_ID.equalsIgnoreCase(tag)) {
id = Util.XmlPullTag(xpp, TAG_ENTRY_ID);
} else if (TAG_TITLE.equalsIgnoreCase(tag)) {
title = Util.XmlPullTag(xpp, TAG_TITLE);
} else if (TAG_SUBTITLE.equalsIgnoreCase(tag)) {
subtitle = Util.XmlPullTag(xpp, TAG_SUBTITLE);
} else if (TAG_UPDATED.equalsIgnoreCase(tag)) {
updated = Util.XmlPullTag(xpp, TAG_UPDATED);
} else if (TAG_PUBLISHED.equalsIgnoreCase(tag)) {
published = Util.XmlPullTag(xpp, TAG_PUBLISHED);
} else if (TAG_CONTENT.equalsIgnoreCase(tag)) {
int attributeCount = xpp.getAttributeCount();
for (int i = 0; i < attributeCount; i++) {
String attributeName = xpp.getAttributeName(i);
if (attributeName.equalsIgnoreCase(TAG_TYPE)) {
String attributeValue = xpp.getAttributeValue(i);
if (attributeValue
.equalsIgnoreCase(FeedReaderContract.FeedEntry.ATTRIBUTE_NAME_HTML)) {
contentType = FeedReaderContract.FeedEntry.CONTENT_TYPE_HTML;
} else if (attributeValue
.equalsIgnoreCase(FeedReaderContract.FeedEntry.ATTRIBUTE_NAME_XHTML)) {
contentType = FeedReaderContract.FeedEntry.CONTENT_TYPE_XHTML;
} else {
contentType = FeedReaderContract.FeedEntry.CONTENT_TYPE_TEXT;
}
break;
}
}
content = Util.XmlPullTag(xpp, TAG_CONTENT);
extractPreview();
} else if (TAG_AUTHOR.equalsIgnoreCase(tag)) {
// Skip author for now -- it is complicated
int authorDepth = xpp.getDepth();
assert (authorDepth == 3);
xpp.require(XmlPullParser.START_TAG, null, TAG_AUTHOR);
while (((eventType = xpp.next()) != XmlPullParser.END_DOCUMENT)
&& (xpp.getDepth() > authorDepth)) {
}
assert (xpp.getDepth() == 3);
xpp.require(XmlPullParser.END_TAG, null, TAG_AUTHOR);
} else if (TAG_ORIG_LINK.equalsIgnoreCase(tag)) {
origLink = Util.XmlPullTag(xpp, TAG_ORIG_LINK);
} else if (TAG_THUMBNAIL.equalsIgnoreCase(tag)) {
thumbnailUri = Util.XmlPullAttribute(xpp, tag, null, ATTRIBUTE_URL);
} else {
@SuppressWarnings("unused")
String throwAway = Util.XmlPullTag(xpp, tag);
}
}
} // while
} catch (XmlPullParserException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
assert (xpp.getDepth() == 2);
}
}
public static String XmlPullTag(XmlPullParser xpp, String tag)
throws XmlPullParserException, IOException {
xpp.require(XmlPullParser.START_TAG, null, tag);
String itemText = xpp.nextText();
if (xpp.getEventType() != XmlPullParser.END_TAG) {
xpp.nextTag();
}
xpp.require(XmlPullParser.END_TAG, null, tag);
return itemText;
}
public static String XmlPullAttribute(XmlPullParser xpp,
String tag, String namespace, String name)
throws XmlPullParserException, IOException {
assert (!TextUtils.isEmpty(tag));
assert (!TextUtils.isEmpty(name));
xpp.require(XmlPullParser.START_TAG, null, tag);
String itemText = xpp.getAttributeValue(namespace, name);
if (xpp.getEventType() != XmlPullParser.END_TAG) {
xpp.nextTag();
}
xpp.require(XmlPullParser.END_TAG, null, tag);
return itemText;
}
I'll give you a hint: None of the return values matter. The data is saved into a database by a method (not shown) called at this line:
feedEntry.persist(this);
I would attempt to attack it with XPath. Would something like this work?
public static String parseAtom (InputStream atomIS)
throws Exception {
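// Below should yield the second content block
// (single quotes inside the expression keep the Java string literal valid)
String xpathString = "(//*[starts-with(name(),'content')])[2]";
// or, String xpathString = "(//*[name() = 'content'])[2]";
// (the parentheses matter: without them, [2] means "second such child of
// its parent" rather than "second match in the document")
// remove the '[2]' to get all content tags or get the count,
// if needed, and then target specific blocks
//String xpathString = "count(//*[starts-with(name(),'content')])";
// note the evaluate expression below returns a string and not a node set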
XPathFactory xpf = XPathFactory.newInstance ();
XPath xpath = xpf.newXPath ();
XPathExpression xpathCompiled = xpath.compile (xpathString);
// use the first to recast and evaluate as NodeList
//Object atomOut = xpathCompiled.evaluate (
// new InputSource (atomIS), XPathConstants.NODESET);
String atomOut = (String) xpathCompiled.evaluate (
new InputSource (atomIS), XPathConstants.STRING);
System.out.println (atomOut);
return atomOut;
}
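To check the idea without a real feed, here is a minimal, self-contained sketch that evaluates the expression as a node set and collects the text of every content element. The class name ContentXPathSketch, the method contentTexts, and the tiny inline feed are made up for illustration; a real Atom feed declares a default namespace, which is why the expression matches on name() rather than a plain element step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContentXPathSketch {
    // Hypothetical helper: returns the text of every <content> element, in document order.
    static List<String> contentTexts(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET makes evaluate() return a NodeList instead of a single string
        NodeList nodes = (NodeList) xpath.evaluate(
                "//*[name() = 'content']",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, not taken from the question's feed
        String feed = "<feed>"
                + "<entry><content type='text'>first</content></entry>"
                + "<entry><content type='html'>second</content></entry>"
                + "</feed>";
        System.out.println(contentTexts(feed));
    }
}
```

Once you have the list, picking the second block is just an index lookup, which avoids the positional-predicate pitfalls in the XPath itself.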
I can see your problem here. These parsers do not produce the correct result because the contents of your <content> tag are not wrapped in <![CDATA[ ]]>. Until I found a more adequate solution, I would use this quick and dirty trick:
private void parseFile(String fileName) throws IOException {
String line;
StringBuilder sb = new StringBuilder();
// true while we are between an opening <content ...> and its closing tag
boolean inContent = false;
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(new File(fileName)))) {
while ((line = br.readLine()) != null) {
if (line.contains("<content")) {
inContent = true;
}
if (inContent) {
sb.append(line);
sb.append("\n");
}
if (line.contains("</content")) {
inContent = false;
}
}
}
System.out.println(sb.toString());
}
This will give you all the content as a String. You can optionally separate the blocks by slightly modifying this method, or, if you don't need the actual <content> tags, you can filter those out as well.
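One possible way to separate the blocks and drop the tags at the same time is a regular expression over the whole text. This is only a sketch in the same quick-and-dirty spirit: the class name ContentBlocks and the method split are hypothetical, and it assumes content elements are not nested and are never written in self-closing form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentBlocks {
    // DOTALL lets '.' span line breaks; the reluctant '.*?' stops at the first closing tag.
    private static final Pattern CONTENT =
            Pattern.compile("<content\\b[^>]*>(.*?)</content>", Pattern.DOTALL);

    // Hypothetical helper: returns the inner text of each <content> block, tags removed.
    static List<String> split(String xml) {
        List<String> blocks = new ArrayList<>();
        Matcher m = CONTENT.matcher(xml);
        while (m.find()) {
            blocks.add(m.group(1).trim());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Made-up sample input with unescaped markup inside <content>
        String xml = "<entry><content type='html'>\n<p>one</p>\n</content></entry>"
                + "<entry><content type='text'>two</content></entry>";
        for (String block : split(xml)) {
            System.out.println(block);
        }
    }
}
```

A regex is not a real XML parser, so treat this as a stopgap for feeds whose content is not wrapped in CDATA, not as a general solution.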
