Here, I'm using the SAX method to parse an array. The issue is that I'm not able to write generic code to parse an array-type XML. I couldn't find a generic way to identify the repeated elements as an array, iterate over them, and store them in a List:
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Any solution would help. Thanks in advance.
I'm using the code below, which I got from https://github.com/niteshapte/generic-xml-parser:
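import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class GenericXMLParserSAX extends DefaultHandler {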
private ListMultimap<String, String> listMultimap = ArrayListMultimap.create();
String tempCharacter;
private String[] startElements;
private String[] endElements;
public void setStartElements(String[] startElements) {
this.startElements = startElements;
}
public String[] getStartElements() {
return startElements;
}
public void setEndElements(String[] endElements) {
this.endElements = endElements;
}
public String[] getEndElements() {
return endElements;
}
public void parseDocument(String xml, String[] startElements, String[] endElements) {
setStartElements(startElements);
setEndElements(endElements);
SAXParserFactory spf = SAXParserFactory.newInstance();
try {
SAXParser sp = spf.newSAXParser();
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
sp.parse(inputStream, this);
} catch(SAXException se) {
se.printStackTrace();
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch (IOException ie) {
ie.printStackTrace();
}
}
public void parseDocument(String xml, String[] endElements) {
setEndElements(endElements);
SAXParserFactory spf = SAXParserFactory.newInstance();
try {
SAXParser sp = spf.newSAXParser();
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
sp.parse(inputStream, this);
} catch(SAXException se) {
se.printStackTrace();
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch (IOException ie) {
ie.printStackTrace();
}
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
String[] startElements = getStartElements();
if(startElements!= null){
for(int i = 0; i < startElements.length; i++) {
if(qName.startsWith(startElements[i])) {
listMultimap.put(startElements[i], qName);
}
}
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
tempCharacter = new String(ch, start, length);
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
String[] endElements = getEndElements();
for(int i = 0; i < endElements.length; i++) {
if (qName.equalsIgnoreCase(endElements[i])) {
listMultimap.put(endElements[i], tempCharacter);
}
}
}
public ListMultimap<String, String> multiSetResult() {
return listMultimap;
}
}
You can create a custom Handler that extends DefaultHandler and use it to parse your XML and generate the List<Book> for you.
The Handler will maintain a List<Book> and:
every time it encounters the book start tag, it creates a new Book;
every time it encounters the book end tag, it adds this Book to the List.
In the end, it holds the complete list of Books, which you can access via its getBooks() method.
Assuming this Book class:
class Book {
private String category;
private String title;
private String author;
private String year;
private String price;
// GETTERS/SETTERS
}
You can create a custom Handler like this:
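import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
class MyHandler extends DefaultHandler {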
private boolean title = false;
private boolean author = false;
private boolean year = false;
private boolean price = false;
// Holds the list of Books
private List<Book> books = new ArrayList<>();
// Holds the Book we are currently parsing
private Book book;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) {
switch (qName) {
// Create a new Book when finding the start book tag
case "book":
book = new Book();
book.setCategory(attributes.getValue("category"));
break;
// Each case needs a break; without it, control falls through and sets every flag at once
case "title": title = true; break;
case "author": author = true; break;
case "year": year = true; break;
case "price": price = true; break;
default: break;
}
}
public void endElement(String uri, String localName, String qName) {
// Add the current Book to the list when finding the end book tag
if("book".equals(qName)) {
books.add(book);
}
}
public void characters(char[] ch, int start, int length) {
String value = new String(ch, start, length);
if (title) {
book.setTitle(value);
title = false;
} else if (author) {
book.setAuthor(value);
author = false;
} else if (year) {
book.setYear(value);
year = false;
} else if (price) {
book.setPrice(value);
price = false;
}
}
public List<Book> getBooks() {
return books;
}
}
Then parse with this custom Handler and retrieve the list of Books:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
MyHandler myHandler = new MyHandler();
saxParser.parse("/path/to/file.xml", myHandler);
List<Book> books = myHandler.getBooks();
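A self-contained variant of the same handler is sketched below. It assumes the XML arrives as a String (wrapped in an InputSource) rather than a file path, and it buffers text in a StringBuilder because a SAX parser may deliver the text of one element across several characters() calls. The names BookListHandler and parse, and the field-only Book class, are illustrative rather than taken from the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Simplified Book with plain fields instead of getters/setters (illustrative)
class Book {
    String category, title, author, year, price;
}

class BookListHandler extends DefaultHandler {
    private final List<Book> books = new ArrayList<>();
    private final StringBuilder text = new StringBuilder(); // text of the current element
    private Book book; // the <book> currently being parsed, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for this element
        if ("book".equals(qName)) {
            book = new Book();
            book.category = attributes.getValue("category");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (book == null) return; // outside any <book>
        String value = text.toString().trim();
        switch (qName) {
            case "title": book.title = value; break;
            case "author": book.author = value; break;
            case "year": book.year = value; break;
            case "price": book.price = value; break;
            case "book": books.add(book); book = null; break;
            default: break;
        }
    }

    List<Book> getBooks() { return books; }

    // Hypothetical helper: parse an XML string and return the collected books
    static List<Book> parse(String xml) throws Exception {
        BookListHandler handler = new BookListHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getBooks();
    }
}
```

Because each value is read at the matching end tag, the handler needs no per-field boolean flags.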
Related
Previously I was able to display the data of a single tag, but this time, where a tag holds several values, only one of them is displayed.
This is my parser code:
public class Runner {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
MyHandler handler = new MyHandler();
xmlReader.setContentHandler(handler);
xmlReader.parse("src/countries.xml");
Countries branches = handler.getBranches();
try (FileWriter files = new FileWriter("src/diploma/SAX.txt")) {
files.write("Item " + "\n" + String.valueOf(branches.itemList) + "\n");
}
}
private static class MyHandler extends DefaultHandler{
static final String HISTORY_TAG = "history";
static final String ITEM_TAG = "item";
static final String NAME_ATTRIBUTE = "name";
public Countries branches;
public Item currentItem;
private String currencyElement;
Countries getBranches(){
return branches;
}
public void startDocument() throws SAXException {
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
currencyElement = qName;
switch (currencyElement) {
case HISTORY_TAG: {
branches.itemList = new ArrayList<>();
currentItem = new Item();
currentItem.setHistoryName(String.valueOf(attributes.getValue(NAME_ATTRIBUTE)));
} break;
default: {}
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String text = new String(ch, start, length);
if (text.contains("<") || currencyElement == null){
return;
}
switch (currencyElement) {
case ITEM_TAG: {
currentItem.setItem(text);
} break;
default: { }
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException{
switch (qName) {
case HISTORY_TAG: {
branches.itemList.add(currentItem);
currentItem = null;
} break;
default: {
}
}
currencyElement = null;
}
public void endDocument() throws SAXException {
System.out.println("SAX parsing is completed...");
}
}
}
Class Item:
public class Item {
private String historyName;
private String item;
public String getItem() {
return item;
}
public void setItem(String item) {
this.item = item;
}
public String getHistoryName() {
return historyName;
}
public void setHistoryName(String historyName) {
this.historyName = historyName;
}
@Override
public String toString() {
return
"historyName = " + historyName + ", " + "\n" + "item = " + item + ", ";
}
}
And class Countries:
public class Countries {
public List<Item> itemList;
}
I have problems with this part:
<history name="History">
<item>
The history of the Belarusian lands is very rich and distinctive.
</item>
<item>
This country was constantly torn by internal conflicts and contradictions, and it was drawn into wars many times.
</item>
<item>
In 1945, Belarus became one of the founding members of the United Nations.
</item>
</history>
Only the last "item" tag is displayed; the other repeated item tags don't show up. I can't figure out where the error is, but I noticed that in endElement all the values are present, but as one element. Does anyone know what the problem is?
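You are creating a new ArrayList and a single Item every time you encounter the history tag, and each item element then overwrites that one Item's text through setItem(). The Item is only added to the list when the history tag closes.
That is why you only see one item (the last one) displayed after parsing. Create the list once, and create and add a new Item for every item element instead.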
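A sketch of that fix, assuming each <item> should become its own Item: the list is created once, a new Item is created at every <item> start tag with the enclosing history name copied into it, and the Item is added at the matching end tag. Text is buffered in a StringBuilder and trimmed, since the item text spans several lines and may arrive in several characters() calls. HistoryHandler and parse are illustrative names, and Item and Countries are reduced, field-only copies of the question's classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Reduced copies of the question's Item and Countries classes (fields instead of setters)
class Item {
    String historyName;
    String item;
}

class Countries {
    List<Item> itemList = new ArrayList<>(); // created once, not per tag
}

class HistoryHandler extends DefaultHandler {
    private final Countries branches = new Countries();
    private final StringBuilder text = new StringBuilder();
    private String historyName; // name attribute of the enclosing <history>
    private Item currentItem;   // the <item> being read, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("history".equals(qName)) {
            historyName = attributes.getValue("name");
        } else if ("item".equals(qName)) {
            currentItem = new Item(); // a fresh Item for every <item>
            currentItem.historyName = historyName;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentItem != null) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName) && currentItem != null) {
            currentItem.item = text.toString().trim();
            branches.itemList.add(currentItem); // add each item as soon as it closes
            currentItem = null;
        }
    }

    Countries getBranches() { return branches; }

    // Hypothetical helper: parse an XML string with this handler
    static Countries parse(String xml) throws Exception {
        HistoryHandler handler = new HistoryHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getBranches();
    }
}
```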
I'm trying to load an RSS feed into my Android application. I can retrieve the title, description, and link of each feed item, but I'm not able to get the image for a particular item.
The following is my DefaultHandler subclass. Please take a look and help me out.
public class XmlHandler extends DefaultHandler {
private RssFeedStructure feedStr = new RssFeedStructure();
private List<RssFeedStructure> rssList = new ArrayList<RssFeedStructure>();
private int articlesAdded = 0;
// Number of articles to download
private static final int ARTICLES_LIMIT = 15;
StringBuffer chars = new StringBuffer();
public void startElement(String uri, String localName, String qName, Attributes atts) {
chars = new StringBuffer();
if (qName.equalsIgnoreCase("media:content"))
{
// getValue() returns null (not the string "null") when the attribute is absent
String imgUrl = atts.getValue("url");
feedStr.setImgLink(imgUrl != null ? imgUrl : "");
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (localName.equalsIgnoreCase("title"))
{
feedStr.setTitle(chars.toString());
}
else if (localName.equalsIgnoreCase("description"))
{
feedStr.setDescription(chars.toString());
}
else if (localName.equalsIgnoreCase("pubDate"))
{
feedStr.setPubDate(chars.toString());
}
else if (localName.equalsIgnoreCase("encoded"))
{
feedStr.setEncodedContent(chars.toString());
}
else if (qName.equalsIgnoreCase("media:content"))
{
}
else if (localName.equalsIgnoreCase("link"))
{
try {
feedStr.setUrl(new URL(chars.toString()));
}catch (Exception e){}
}
if (localName.equalsIgnoreCase("item")) {
rssList.add(feedStr);
feedStr = new RssFeedStructure();
articlesAdded++;
if (articlesAdded >= ARTICLES_LIMIT)
{
throw new SAXException();
}
}
}
public void characters(char ch[], int start, int length) {
chars.append(new String(ch, start, length));
}
public List<RssFeedStructure> getLatestArticles(String feedUrl) {
URL url = null;
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
url = new URL(feedUrl);
xr.setContentHandler(this);
xr.parse(new InputSource(url.openStream()));
} catch (IOException e) {
} catch (SAXException e) {
} catch (ParserConfigurationException e) {
}
return rssList;
}
}
I'm new to Java, and we were given an assignment on XML parsing. We have done DOM and now we are on SAX, which is why I'm using a SAX parser to parse an RSS feed. It already works on files, but when I try to parse an online RSS feed, it returns an HTTP 403 error. I haven't tried parsing the same site with DOM because my laptop is so slow that it takes me 5 minutes just to open a file.
Thanks for the help.
public class NewsHandler extends DefaultHandler {
private String url = "http://tomasinoweb.org/feed/rss";
private boolean inDescription = false;
private String[] descs = new String[11];
int i = 0;
public void processFeed() {
try {
SAXParserFactory factory =
SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(url).openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) { e.printStackTrace(); }
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if(qName.equals("description")) inDescription = true;
}
public void characters(char ch[], int start, int length) {
String chars = new String(ch, start, length);
if(inDescription) descs[i] = chars;
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equals("description")) {
inDescription = false;
i++;
}
}
public String getDesc(int index) { return descs[index]; }
public static void main(String[] args) {
NewsHandler nh = new NewsHandler();
nh.processFeed();
for(int i=0; i<10; i++) {
System.out.println(nh.getDesc(i));
}
}
}
Solution:
Instead of calling new URL(url).openStream() directly, I created URL url = new URL("url"), opened URLConnection con = url.openConnection(), and then called con.addRequestProperty("user-agent", <user-agent string>) before reading from con.getInputStream().
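Sketched in code, the fix sets a User-Agent header on the URLConnection before the stream is opened, so a server that presumably rejects Java's default agent (the likely cause of the 403) accepts the request. openConnection() does not contact the server yet, so headers can still be added at that point. The agent string, the class name FeedFetcher, and its two methods are placeholders, not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FeedFetcher {
    // Placeholder agent string; substitute whatever identifies your client
    static final String USER_AGENT = "Mozilla/5.0 (compatible; NewsHandler/1.0)";

    // Creates the connection without connecting, so request headers can still be set
    static URLConnection prepare(String feedUrl) throws IOException {
        URLConnection con = new URL(feedUrl).openConnection();
        con.addRequestProperty("User-Agent", USER_AGENT);
        return con;
    }

    // Opens the stream (this is where the request is sent) and parses it with the handler
    static void parse(String feedUrl, DefaultHandler handler) throws Exception {
        URLConnection con = prepare(feedUrl);
        try (InputStream in = con.getInputStream()) {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(in));
        }
    }
}
```

In the question's processFeed(), the two lines that open the URL and call reader.parse(...) would be replaced by a call such as FeedFetcher.parse(url, this).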
I want to parse a very long string from an XML file. You can see the XML file here.
The file has a "description" tag from which I want to parse the string. When the string in the "description" tag is short, say three or four lines, my parser (the Java SAX parser) parses it easily, but when the string is hundreds of lines long, my parser cannot parse it. Below is the code I am using for the parsing; please let me know where I am going wrong. I would be very thankful for your help.
Here is the parser GetterSetter class
public class MyGetterSetter
{
private ArrayList<String> description = new ArrayList<String>();
public ArrayList<String> getDescription()
{
return description;
}
public void setDescription(String description)
{
this.description.add(description);
}
}
Here is the parser Handler class
public class MyHandler extends DefaultHandler
{
String elementValue = null;
Boolean elementOn = false;
Boolean item = false;
public static MyGetterSetter data = null;
public static MyGetterSetter getXMLData()
{
return data;
}
public static void setXMLData(MyGetterSetter data)
{
MyHandler.data = data;
}
public void startDocument() throws SAXException
{
data = new MyGetterSetter();
}
public void endDocument() throws SAXException
{
}
public void startElement(String namespaceURI, String localName,String qName, Attributes atts) throws SAXException
{
elementOn = true;
if (localName.equalsIgnoreCase("item"))
item = true;
}
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
{
elementOn = false;
if(item)
{
if (localName.equalsIgnoreCase("description"))
{
data.setDescription(elementValue);
Log.d("--------DESCRIPTION------", elementValue +" ");
}
else if (localName.equalsIgnoreCase("item")) item = false;
}
}
public void characters(char ch[], int start, int length)
{
if (elementOn)
{
elementValue = new String(ch, start, length);
elementOn = false;
}
}
}
Use the org.w3c.dom package.
public static void main(String[] args) {
try {
URL url = new URL("http://www.aboutsports.co.uk/fixtures/");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(url.openStream());
NodeList list = doc.getElementsByTagName("item"); // get <item> nodes
for (int i = 0; i < list.getLength(); i++) {
Node item = list.item(i);
NodeList descriptions = ((Element)item).getElementsByTagName("description"); // get <description> nodes within an <item>
for (int j = 0; j < descriptions.getLength(); j++) {
Node description = descriptions.item(j); // use j, not 0, to visit every <description>
System.out.println(description.getTextContent()); // print the text content
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
XPath in Java is also great for extracting parts of XML documents.
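You would use an XPathExpression like //item/description (in an RSS feed the <item> elements sit under /rss/channel, so /item/description only matches when <item> is the root element). When you evaluate it on the XML InputStream, it returns a NodeList like the one above, with all the <description> elements within an <item> element.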
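A minimal XPath sketch using the JDK's javax.xml.xpath API. The XML is read from a String here to keep it self-contained; for the feed you would pass new InputSource(url.openStream()) instead. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDescriptions {
    static List<String> descriptions(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Every <description> whose parent is an <item>, wherever the <item> sits
        XPathExpression expr = xpath.compile("//item/description");
        NodeList nodes = (NodeList) expr.evaluate(
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent()); // full text, however long
        }
        return result;
    }
}
```

The channel-level <description> is not matched, because its parent is <channel>, not <item>.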
If you wanted to do it your way, with a DefaultHandler, you would need to set and unset flags so you can check whether you are inside a <description> element, and collect its text across every characters() call. The code above probably does something similar internally, hiding it from you. That code ships with Java, so why not use it?
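For the DefaultHandler route, the likely reason long descriptions fail in the question's handler is that characters() may be called several times for one long text node, while the handler keeps only the first chunk (it sets elementOn to false after the first call). The sketch below uses the flag approach described above and buffers every chunk in a StringBuilder. It assumes the default, non-namespace-aware parser, so names are compared via qName; the class name and parse helper are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DescriptionHandler extends DefaultHandler {
    private final List<String> descriptions = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inItem = false;        // inside an <item>
    private boolean inDescription = false; // inside <item><description>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equalsIgnoreCase(qName)) {
            inItem = true;
        } else if (inItem && "description".equalsIgnoreCase(qName)) {
            inDescription = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inDescription) {
            text.append(ch, start, length); // keep every chunk, not just the first
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inDescription && "description".equalsIgnoreCase(qName)) {
            descriptions.add(text.toString());
            inDescription = false;
        } else if ("item".equalsIgnoreCase(qName)) {
            inItem = false;
        }
    }

    List<String> getDescriptions() { return descriptions; }

    // Hypothetical helper: parse an XML string with this handler
    static List<String> parse(String xml) throws Exception {
        DescriptionHandler handler = new DescriptionHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDescriptions();
    }
}
```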
<inputs>
<MAT_NO>123</MAT_NO>
<MAT_NO>323</MAT_NO>
<MAT_NO>4223</MAT_NO>
<FOO_BAR>122</FOO_BAR>
<FOO_BAR>125</FOO_BAR>
</inputs>
I have to parse the XML above. After parsing, I want the values in a Map<String, List<String>>, with keys corresponding to the child node names (MAT_NO, FOO_BAR)
and values holding the text of those child nodes (123, 323, etc.).
Following is my attempt. Is there a better way of doing this?
public class UserInputsXmlParser extends DefaultHandler {
private final SaveSubscriptionValues subscriptionValues = null;
private String nodeValue = "";
private final String inputKey = "";
private final List<String> valuesList = null;
private Map<String, List<String>> userInputs;
public Map<String, List<String>> parse(final String strXML) {
try {
final SAXParserFactory parserFactory = SAXParserFactory
.newInstance();
final SAXParser saxParser = parserFactory.newSAXParser();
saxParser.parse(new InputSource(new StringReader(strXML)), this);
return userInputs;
} catch (final SAXException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final IOException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final ParserConfigurationException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final Exception e) {
e.printStackTrace();
throw new MyException("", e);
}
}
@Override
public void startElement(final String uri, final String localName,
final String qName, final Attributes attributes)
throws SAXException {
nodeValue = "";
if ("inputs".equalsIgnoreCase(qName)) {
userInputs = MyUtil.getNewHashMap();
return;
}
}
@Override
public void characters(final char[] ch, final int start, final int length)
throws SAXException {
if (!MyUtil.isEmpty(nodeValue)) {
nodeValue += new String(ch, start, length);
} else {
nodeValue = new String(ch, start, length);
}
}
@Override
public void endElement(final String uri, final String localName,
final String qName) throws SAXException {
if (!"inputs".equalsIgnoreCase(qName)) {
storeUserInputs(qName, nodeValue);
}
}
/**
* @param qName
* @param nodeValue2
*/
private void storeUserInputs(final String qName, final String nodeValue2) {
if (nodeValue2 == null || nodeValue2.trim().equals("")) { return; }
final String trimmedValue = nodeValue2.trim();
final List<String> values = userInputs.get(qName);
if (values != null) {
values.add(trimmedValue);
} else {
final List<String> valueList = new ArrayList<String>();
valueList.add(trimmedValue);
userInputs.put(qName, valueList);
}
}
public static void main(final String[] args) {
final String sample = "<inputs>" + "<MAT_NO>154400-0000</MAT_NO>"
+ "<MAT_NO> </MAT_NO>" + "<MAT_NO>154400-0002</MAT_NO>"
+ "<PAT_NO>123</PAT_NO><PAT_NO>1111</PAT_NO></inputs>";
System.out.println(new UserInputsXmlParser().parse(sample));
}
}
UPDATE: The children of the <inputs> node are dynamic; I only know the name of the root node.
Do you have to provide a solution as part of a SAX event handler? If not, you could use one of the many XML libraries around, such as dom4j, which makes the solution a lot simpler:
public static void main(String[] args) throws Exception
{
String sample = "<inputs>" + "<MAT_NO>154400-0000</MAT_NO>"
+ "<MAT_NO> </MAT_NO>" + "<MAT_NO>154400-0002</MAT_NO>"
+ "<PAT_NO>123</PAT_NO><PAT_NO>1111</PAT_NO></inputs>";
System.out.println(parse(sample));
}
static Map<String,List<String>> parse(String xml) throws Exception
{
Map<String,List<String>> map = new HashMap<String,List<String>>();
SAXReader reader = new SAXReader();
Document doc = reader.read(new StringReader(xml));
for (Iterator i = doc.getRootElement().elements().iterator(); i.hasNext();)
{
Element element = (Element)i.next();
//Maybe handle elements with only whitespace text content
List<String> list = map.get(element.getName());
if (list == null)
{
list = new ArrayList<String>();
map.put(element.getName(), list);
}
list.add(element.getText());
}
return map;
}
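If a SAX event handler is required, or no extra library is wanted, the question's handler can be reduced to a StringBuilder and Map.computeIfAbsent using only the JDK. This sketch assumes, as in the UPDATE, that only the root element is known and that every direct child of the root holds plain text; blank values are skipped, as in the question's storeUserInputs, and LinkedHashMap keeps keys in document order. The class name and parse helper are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChildValuesHandler extends DefaultHandler {
    private final Map<String, List<String>> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0; // 1 = root element, 2 = its direct children

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        depth++;
        text.setLength(0); // collect text for this element only
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 2) { // a direct child of the root, whatever its name
            String value = text.toString().trim();
            if (!value.isEmpty()) { // skip blank values such as <MAT_NO> </MAT_NO>
                values.computeIfAbsent(qName, k -> new ArrayList<>()).add(value);
            }
        }
        depth--;
    }

    // Hypothetical helper: parse an XML string into name -> values
    static Map<String, List<String>> parse(String xml) throws Exception {
        ChildValuesHandler handler = new ChildValuesHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }
}
```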
I would check out XStream (http://x-stream.github.io/tutorial.html).
XStream is a simple library to serialize objects to XML and back again.
For something this basic, look into XPath.
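An XPath version of the same grouping, sketched with the JDK's javax.xml.xpath API: the expression /inputs/* selects every element directly under the known root, whatever its name, and the values are grouped by element name. Skipping blank values mirrors the question's handler; the class name is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathChildValues {
    static Map<String, List<String>> parse(String xml) throws Exception {
        // Every element directly under the <inputs> root, whatever its name
        NodeList children = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/inputs/*", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String value = child.getTextContent().trim();
            if (!value.isEmpty()) { // skip blank values
                map.computeIfAbsent(child.getNodeName(), k -> new ArrayList<>()).add(value);
            }
        }
        return map;
    }
}
```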