I'm using SAX (Simple API for XML) to parse an XML document. The document is a huge XML file (dblp.xml, 1.46 GB). I wrote a small parser, tested it on small files, and it works.
Sample.XML and Student.XML are small files with only a few lines of XML, and my parser handles them. But when I change the path to dblp.XML, it throws a FileNotFoundException (the file is in the same folder as the other sample files, but it is huge).
Here is the exception I get:
java.io.FileNotFoundException: E:\Workspaces\Java\SaxParser\xml\dblp.dtd (The system cannot find the file specified)
Here is my code:
package com.teamincredibles.sax;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Parser extends DefaultHandler {
public void getXml() {
try {
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxParserFactory.newSAXParser();
final MySet openingTagList = new MySet();
final MySet closingTagList = new MySet();
DefaultHandler defaultHandler = new DefaultHandler() {
public void startDocument() throws SAXException {
System.out.println("Starting Parsing...\n");
}
public void endDocument() throws SAXException {
System.out.print("\n\nDone Parsing!");
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (!openingTagList.contains(qName)) {
openingTagList.add(qName);
System.out.print("<" + qName + ">\n");
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
/*for(int i=start; i<(start+length);i++){
System.out.print(ch[i]);
}*/
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (!closingTagList.contains(qName)) {
closingTagList.add(qName);
System.out.print("</" + qName + ">");
}
}
};
saxParser.parse("xml/dblp.xml", defaultHandler);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String args[]) {
Parser readXml = new Parser();
readXml.getXml();
}
}
I can't figure out what the problem is.
Does your XML file reference a DTD, in this case "dblp.dtd"? Note that the exception names dblp.dtd, not dblp.xml.
If it does, check whether the DTD is in the location "E:\Workspaces\Java\SaxParser\xml\". If not, place it there and run your code again.
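To see why a parser pointed at dblp.xml ends up complaining about dblp.dtd, it can help to watch the lookup happen. The sketch below is not the asker's code: the class name DtdLookupDemo, the helper requestedDtd, and the inline XML are made up for illustration. It overrides resolveEntity to record which system ID the parser asks for, and hands back an empty DTD so that parsing can continue. For the real dblp.xml you would most likely still want the actual dblp.dtd next to the file, since a document can depend on entities declared in its DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original question.
class DtdLookupDemo {

    /** Parses the XML and returns the system ID of the external entity the parser requested, or null. */
    static String requestedDtd(String xml) {
        final String[] requested = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                requested[0] = systemId;                      // remember what the parser asked for
                return new InputSource(new StringReader("")); // substitute an empty DTD
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return requested[0];
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE dblp SYSTEM \"dblp.dtd\">\n"
                + "<dblp><article/></dblp>";
        System.out.println("Parser asked for: " + requestedDtd(xml));
    }
}
```

The point of the sketch is only that the DOCTYPE line triggers a second file lookup; without a resolver, that lookup goes to the file system and fails if the DTD is missing.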
Related
I'm trying to use SAX to parse an XML document, but the handler's startElement() is never called. I have no clue why it doesn't work.
This is my code:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
public class ChangePasswordXMLParser {
public static void parseXML(InputStream xml) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
ChangePasswordHandler handler = new ChangePasswordHandler();
saxParser.parse(xml, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
private static class ChangePasswordHandler extends DefaultHandler {
boolean bfReturn;
public ChangePasswordHandler() {
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("return")) {
bfReturn = true;
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (bfReturn) {
AuthenticateUserResult user = SessionManager.getInstance().getUser();
String value = new String(ch, start, length);
EventBus.getDefault().post(new AlterarSenhaEvent(value));
bfReturn = false;
}
}
}
}
And this is an example of the XML input:
<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body SOAP-ENC:encodingStyle="http://schemas.xmlsoap.org/soap/envelope/">
<NS1:ChangePasswordResponse xmlns:NS1="urn:exemple">
<return xsi:type="xsd:string">03351-0</return>
</NS1:ChangePasswordResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I saw in another topic here on Stack Overflow that the imports may be the cause, but my imports seem fine to me.
Any ideas? Thanks!
You set AlterarSenhaHandler instead of PasswordHandler.
So PasswordHandler is never called.
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("return")) {
System.out.println("inside return");
}
}
This proves that your handler code itself is valid.
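One way to check this independently is to run the same startElement logic against the sample response in isolation. The sketch below is an illustration under assumptions: the class name ReturnValueDemo and the helper extractReturn are hypothetical, and the EventBus and SessionManager calls from the question are left out. It also collects the text in a StringBuilder, because characters() may be called more than once for a single element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original question.
class ReturnValueDemo {

    /** Returns the text of the <return> element, or null if startElement never saw one. */
    static String extractReturn(String xml) {
        final StringBuilder text = new StringBuilder();
        final boolean[] found = {false};
        DefaultHandler handler = new DefaultHandler() {
            private boolean inReturn;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equalsIgnoreCase("return")) {
                    inReturn = true;
                    found[0] = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inReturn) {
                    text.append(ch, start, length); // may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equalsIgnoreCase("return")) {
                    inReturn = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return found[0] ? text.toString() : null;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
                + "<SOAP-ENV:Body><NS1:ChangePasswordResponse xmlns:NS1=\"urn:exemple\">"
                + "<return xsi:type=\"xsd:string\">03351-0</return>"
                + "</NS1:ChangePasswordResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>";
        System.out.println(extractReturn(xml));
    }
}
```

If this prints the value but the application still sees nothing, the problem is more likely in how the handler is wired up or in the stream being passed in than in the handler methods.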
I am currently just trying to process the elements within the item nodes. I am just focusing on the title at this point for simplicity, but I am finding that when it parses, I am just getting the same element three times.
http://open.live.bbc.co.uk/weather/feeds/en/2643123/3dayforecast.rss
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import android.util.Log;
public class XMLHelper extends DefaultHandler {
private String URL_Main="http://open.live.bbc.co.uk/weather/feeds/en/2643123/3dayforecast.rss";
String TAG = "XMLHelper";
Boolean currTag = false;
String currTagVal = "";
public ItemData item = null;
public ArrayList<ItemData> items = new ArrayList<ItemData>();
public void get() {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(URL_Main).openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) {
Log.e(TAG, "Exception: " + e.getMessage());
}
}
// Receives notification of the start of an element
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
Log.i(TAG, "TAG: " + localName);
currTag = true;
currTagVal = "";
if (localName.equals("channel")) {
item = new ItemData();
}
}
// Receives notification of end of element
public void endElement(String uri, String localName, String qName)
throws SAXException {
currTag = false;
if (localName.equalsIgnoreCase("title"))
item.setTitle(currTagVal);
else if (localName.equalsIgnoreCase("item"))
items.add(item);
}
// Receives notification of character data inside an element
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currTag) {
currTagVal = currTagVal + new String(ch, start, length);
currTag = false;
}
}
}
The reason you are getting the same value three times is that you create the ItemData object when the channel tag is seen in the startElement method, so all three items share one object:
if (localName.equals("channel")) {
item = new ItemData();
}
I guess you should instantiate the object whenever there is an item tag, as below:
if (localName.equals("item")) { // check for item tag
item = new ItemData();
}
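Put together, the per-item idea could look like the following sketch. It is an illustration under assumptions: ItemTitleDemo and itemTitles are hypothetical names, the feed is replaced by a small inline string, and plain strings stand in for ItemData. The factory is made namespace-aware so that localName is populated on a desktop JDK, and the text is accumulated in a StringBuilder that is reset on every start tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original question.
class ItemTitleDemo {

    /** Collects the <title> text of every <item>, ignoring the channel-level title. */
    static List<String> itemTitles(String xml) {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inItem;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0);              // start collecting text for the new element
                if (localName.equals("item")) {
                    inItem = true;              // one logical record per <item>
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (inItem && localName.equals("title")) {
                    titles.add(text.toString().trim());
                } else if (localName.equals("item")) {
                    inItem = false;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title>Mon</title></item>"
                + "<item><title>Tue</title></item>"
                + "<item><title>Wed</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(xml));
    }
}
```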
Restructure your whole project. You need 3 classes:
1. ItemList
2. XMLHandler, which extends DefaultHandler
3. SAXParsing activity
Organize your code first.
Your XMLHandler class should extend DefaultHandler and look like this:
public class MyXMLHandler extends DefaultHandler
{
public static ItemList itemList;
public boolean current = false;
public String currentValue = null;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// TODO Auto-generated method stub
current = true;
if (localName.equals("channel"))
{
/** Start */
itemList = new ItemList();
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
// TODO Auto-generated method stub
current = false;
if(localName.equals("item"))
{
itemList.setItem(currentValue);
}
else if(localName.equals("title"))
{
itemList.setManufacturer(currentValue);
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
// TODO Auto-generated method stub
if(current)
{
currentValue = new String(ch, start, length);
current=false;
}
}
}
The ItemList class provides setter and getter methods that store the values in ArrayLists and let the SAXParsing activity retrieve those lists.
I hope this solution helps. :D
I do this
package file;
import java.io.IOException;
import org.apache.log4j.Logger;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.LocatorImpl;
import org.xml.sax.helpers.XMLReaderFactory;
public class GetNatureSax implements ContentHandler {
private final static Logger _logger = Logger.getLogger(GetNatureSax.class.getName());
private boolean isCurrentElement = false;
private Locator locator;
private String _parameterValue = null;
public GetNatureSax() {
super();
locator = new LocatorImpl();
}
public String getValeurParametre() {
return _parameterValue;
}
public void setDocumentLocator(Locator value) {
locator = value;
}
public void startDocument() throws SAXException {}
public void endDocument() throws SAXException {}
public void startPrefixMapping(String prefix, String URI) throws SAXException {
}
public void endPrefixMapping(String prefix) throws SAXException {}
public void startElement(String nameSpaceURI, String localName, String rawName, Attributes attributs)
throws SAXException {
if (localName.equals("Nature")) {
isCurrentElement = true;
}
}
public void endElement(String nameSpaceURI, String localName, String rawName) throws SAXException {
if (localName.equals("Nature")) {
isCurrentElement = false;
}
}
public void characters(char[] ch, int start, int end) throws SAXException {
if (isCurrentElement) {
_parameterValue = new String(ch, start, end);
}
}
public void ignorableWhitespace(char[] ch, int start, int end) throws SAXException {
System.out.println("espaces inutiles rencontres : ..." + new String(ch, start, end) + "...");
}
public void processingInstruction(String target, String data) throws SAXException {
System.out.println("Instruction de fonctionnement : " + target);
System.out.println(" dont les arguments sont : " + data);
}
public void skippedEntity(String arg0) throws SAXException {}
public void parseFichier(String i_fichierATraiter) {
XMLReader saxReader;
try {
saxReader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
saxReader.setContentHandler(new GetNatureSax());
saxReader.parse(i_fichierATraiter);
System.out.println(_parameterValue);
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
My XML:
...
<Reference>
<Nature>ACHAT</Nature>
<Statut>2</Statut>
<Type-Gest>RE</Type-Gest>
<Gest>RE_ELECTRA</Gest>
<Type-Res>D</Type-Res>
<Nb-h>24</Nb-h>
</Reference>
...
Why is my variable null when this line executes:
System.out.println(_parameterValue);
even though it is not null earlier when I debug?
Because you instantiate a new GetNatureSax and give it to the XMLReader as its ContentHandler.
So when parsing has finished, it is the new GetNatureSax instance that has been modified and has its _parameterValue field set, not the current instance (this).
Just modify your parseFichier method like this:
saxReader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
saxReader.setContentHandler(this); // here
saxReader.parse(i_fichierATraiter);
System.out.println("Found: " + getValeurParametre());
Using my debugger I can see you are using two GetNatureSax
saxReader.setContentHandler(new GetNatureSax());
This creates a second GetNatureSax object, which is where the value is being set.
Changing it to
saxReader.setContentHandler(this);
fixed the problem.
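The same point in a compact, self-contained form: the object that receives the callbacks has to be the object you later read the result from. This is only a sketch under assumptions. NatureReader and read are hypothetical names, and it uses the JAXP SAXParserFactory with DefaultHandler rather than XMLReaderFactory with the Xerces class name from the question, so it runs without extra libraries.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not the asker's GetNatureSax.
class NatureReader extends DefaultHandler {
    private final StringBuilder value = new StringBuilder();
    private boolean inNature;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("Nature")) {
            inNature = true;
            value.setLength(0);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Nature")) {
            inNature = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inNature) {
            value.append(ch, start, length);
        }
    }

    /** Parses with this instance as the handler, so the field read afterwards is the one that was filled. */
    String read(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this); // "this", not a new instance
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return value.toString();
    }

    public static void main(String[] args) {
        String xml = "<Reference><Nature>ACHAT</Nature><Statut>2</Statut></Reference>";
        System.out.println("Found: " + new NatureReader().read(xml));
    }
}
```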
I haven't worked much with XML before so maybe my ignorance on proper terminology is hurting me in my searches on how to do this. I have the code snippet below which I am using to parse an XML file like the one below. The problem is that it only picks up XML values within <Tag>Value</Tag> but not for the one below where I need to get the value of TagValue, which in this case would be "Russell Diamond".
I would appreciate if anyone can offer assistance as to how to get custom values like this. Thanks.
<Tag TagName="#Subject" TagDataType="Text" TagValue="Russell Diamond"/>
The snippet I am using:
public void printElementNames(String fileName) throws IOException {
//test write to file
FileWriter fstream = new FileWriter("/home/user/Desktop/readEDRMtest.txt");
final BufferedWriter out = new BufferedWriter(fstream);
//
try {
SAXParserFactory parserFact = SAXParserFactory.newInstance();
SAXParser parser = parserFact.newSAXParser();
System.out.println("XML Elements: ");
DefaultHandler handler = new DefaultHandler() {
public void startElement(String uri, String lName, String ele,
Attributes attributes) throws SAXException {
// print elements of xml
System.out.println(ele);
try {
out.write(ele);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
System.out.println("Value : "
+ new String(ch, start, length));
try {
out.write("Value : "
+ new String(ch, start, length));
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
};
You want to look into extracting attributes. Search on that and you'll find your answer.
The DefaultHandler class's startElement(...) method passes a parameter called attributes that refers to an Attributes object. The API for the Attributes interface describes how to extract the information you need from this object.
For example:
out.write(attributes.getValue("TagValue"));
This is a stripped down and working version of your code snippet:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAX
{
public static void main(String[] args) throws IOException {
new SAX().printElementNames("Delete.xml");
}
public void printElementNames(String fileName) throws IOException
{
try {
SAXParserFactory parserFact = SAXParserFactory.newInstance();
SAXParser parser = parserFact.newSAXParser();
DefaultHandler handler = new DefaultHandler()
{
public void startElement(String uri, String lName, String ele, Attributes attributes) throws SAXException {
System.out.println(ele);
System.out.println(attributes.getValue("TagValue"));
}
public void characters(char ch[], int start, int length) throws SAXException {
System.out.println("Value : " + new String(ch, start, length));
}
};
parser.parse(new File(fileName), handler);
}catch(Exception e){
e.printStackTrace();
}
}
}
Content of Delete.xml
<Tag TagName="#Subject" TagDataType="Text" TagValue="Russell Diamond"/>
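If you do not know the attribute names in advance, the Attributes interface also lets you walk over all of them by index. The following is a small sketch, with the hypothetical names AttributeDumpDemo and firstElementAttributes, that copies the attributes of the first element it sees into a map.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original answer.
class AttributeDumpDemo {

    /** Returns the attribute names and values of the first element in the document. */
    static Map<String, String> firstElementAttributes(String xml) {
        final Map<String, String> result = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean seen;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (seen) {
                    return; // only the first element is of interest here
                }
                seen = true;
                for (int i = 0; i < attributes.getLength(); i++) {
                    result.put(attributes.getQName(i), attributes.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Tag TagName=\"#Subject\" TagDataType=\"Text\" TagValue=\"Russell Diamond\"/>";
        System.out.println(firstElementAttributes(xml).get("TagValue"));
    }
}
```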
Further reading:
http://www.java-samples.com/showtutorial.php?tutorialid=152
I have parsed an XML file and have gotten a Node that I am interested in. How can I now find the line number in the source XML file where this node occurs?
EDIT:
Currently I am using the SAXParser to parse my XML. However I will be happy with a solution using any parser.
Along with the Node, I also have the XPath expression for the node.
I need to get the line number because I am displaying the XML file in a textbox, and need to highlight the line where the node occurred. Assume that the XML file is nicely formatted with sufficient line breaks.
I have got this working by following this example:
http://eyalsch.wordpress.com/2010/11/30/xml-dom-2/
This solution follows the method suggested by Michael Kay. Here is how you use it:
// XmlTest.java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class XmlTest {
public static void main(final String[] args) throws Exception {
String xmlString = "<foo>\n"
+ " <bar>\n"
+ " <moo>Hello World!</moo>\n"
+ " </bar>\n"
+ "</foo>";
InputStream is = new ByteArrayInputStream(xmlString.getBytes());
Document doc = PositionalXMLReader.readXML(is);
is.close();
Node node = doc.getElementsByTagName("moo").item(0);
System.out.println("Line number: " + node.getUserData("lineNumber"));
}
}
If you run this program, it will output: "Line number: 3"
PositionalXMLReader is a slightly modified version of the example linked above.
// PositionalXMLReader.java
import java.io.IOException;
import java.io.InputStream;
import java.util.Stack;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class PositionalXMLReader {
final static String LINE_NUMBER_KEY_NAME = "lineNumber";
public static Document readXML(final InputStream is) throws IOException, SAXException {
final Document doc;
SAXParser parser;
try {
final SAXParserFactory factory = SAXParserFactory.newInstance();
parser = factory.newSAXParser();
final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.newDocument();
} catch (final ParserConfigurationException e) {
throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
}
final Stack<Element> elementStack = new Stack<Element>();
final StringBuilder textBuffer = new StringBuilder();
final DefaultHandler handler = new DefaultHandler() {
private Locator locator;
@Override
public void setDocumentLocator(final Locator locator) {
this.locator = locator; // Save the locator, so that it can be used later for line tracking when traversing nodes.
}
@Override
public void startElement(final String uri, final String localName, final String qName, final Attributes attributes)
throws SAXException {
addTextIfNeeded();
final Element el = doc.createElement(qName);
for (int i = 0; i < attributes.getLength(); i++) {
el.setAttribute(attributes.getQName(i), attributes.getValue(i));
}
el.setUserData(LINE_NUMBER_KEY_NAME, String.valueOf(this.locator.getLineNumber()), null);
elementStack.push(el);
}
@Override
public void endElement(final String uri, final String localName, final String qName) {
addTextIfNeeded();
final Element closedEl = elementStack.pop();
if (elementStack.isEmpty()) { // Is this the root element?
doc.appendChild(closedEl);
} else {
final Element parentEl = elementStack.peek();
parentEl.appendChild(closedEl);
}
}
@Override
public void characters(final char ch[], final int start, final int length) throws SAXException {
textBuffer.append(ch, start, length);
}
// Outputs text accumulated under the current node
private void addTextIfNeeded() {
if (textBuffer.length() > 0) {
final Element el = elementStack.peek();
final Node textNode = doc.createTextNode(textBuffer.toString());
el.appendChild(textNode);
textBuffer.delete(0, textBuffer.length());
}
}
};
parser.parse(is, handler);
return doc;
}
}
If you are using a SAX parser then the line number of an event can be obtained using the Locator object, which is notified to the ContentHandler via the setDocumentLocator() callback. This is called at the start of parsing, and you need to save the Locator; then after any event (such as startElement()), you can call methods such as getLineNumber() to obtain the current position in the source file. (After startElement(), the callback is defined to give you the line number on which the ">" of the start tag appears.)
Note that according to the spec (of Locator.getLineNumber()) the method returns the line number where the SAX-event ends!
In the case of "startElement()" this means:
Here the line number for Element is 1:
<Element></Element>
Here the line number for Element is 3:
<Element
attribute1="X"
attribute2="Y">
</Element>
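The two cases above can be checked with a plain SAX handler, without building a DOM. This is a sketch under assumptions: StartTagLineDemo and startTagLine are hypothetical names, and the exact numbers depend on the parser honouring the Locator contract. With the parser bundled in the JDK I would expect 1 for the first document and 3 for the second.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original answers.
class StartTagLineDemo {

    /** Returns the line number the Locator reports during startElement of the root element, or -1. */
    static int startTagLine(String xml) {
        final int[] line = {-1};
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator; // keep it; it is updated as parsing proceeds
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (line[0] < 0 && locator != null) {
                    line[0] = locator.getLineNumber(); // position where the start tag ends
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return line[0];
    }

    public static void main(String[] args) {
        System.out.println(startTagLine("<Element></Element>"));
        System.out.println(startTagLine("<Element\n attribute1=\"X\"\n attribute2=\"Y\">\n</Element>"));
    }
}
```

If you need the line where the start tag begins instead, one option is to remember the locator position at the end of the previous event, at the cost of some extra bookkeeping.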
priomsrb's answer is great and works. For my use case I needed to integrate it into an existing framework that also handles, for example, the encoding. I therefore refactored it to have a separate LineNumberHandler class.
Then the code will also work with a Sax InputSource where the encoding can be modified like this:
// read in the xml document
org.xml.sax.InputSource is=new org.xml.sax.InputSource();
is.setByteStream(instream);
if (encoding!=null) {
is.setEncoding(encoding);
if (Debug.CORE)
Debug.log("setting XML encoding to - "+is.getEncoding());
}
Separate LineNumberHandler
/**
* LineNumber Handler
* @author wf
*
*/
public static class LineNumberHandler extends DefaultHandler {
final Stack<Element> elementStack = new Stack<Element>();
final StringBuilder textBuffer = new StringBuilder();
private Locator locator;
private Document doc;
/**
* create a line number Handler for the given document
* @param doc
*/
public LineNumberHandler(Document doc) {
this.doc=doc;
}
@Override
public void setDocumentLocator(final Locator locator) {
this.locator = locator; // Save the locator, so that it can be used
// later for line tracking when traversing
// nodes.
}
@Override
public void startElement(final String uri, final String localName,
final String qName, final Attributes attributes) throws SAXException {
addTextIfNeeded();
final Element el = doc.createElement(qName);
for (int i = 0; i < attributes.getLength(); i++) {
el.setAttribute(attributes.getQName(i), attributes.getValue(i));
}
el.setUserData(LINE_NUMBER_KEY_NAME,
String.valueOf(this.locator.getLineNumber()), null);
elementStack.push(el);
}
@Override
public void endElement(final String uri, final String localName,
final String qName) {
addTextIfNeeded();
final Element closedEl = elementStack.pop();
if (elementStack.isEmpty()) { // Is this the root element?
doc.appendChild(closedEl);
} else {
final Element parentEl = elementStack.peek();
parentEl.appendChild(closedEl);
}
}
@Override
public void characters(final char ch[], final int start, final int length)
throws SAXException {
textBuffer.append(ch, start, length);
}
// Outputs text accumulated under the current node
private void addTextIfNeeded() {
if (textBuffer.length() > 0) {
final Element el = elementStack.peek();
final Node textNode = doc.createTextNode(textBuffer.toString());
el.appendChild(textNode);
textBuffer.delete(0, textBuffer.length());
}
}
};
PositionalXMLReader
public class PositionalXMLReader {
final static String LINE_NUMBER_KEY_NAME = "lineNumber";
/**
* read a document from the given input stream
*
* @param is
* - the input stream
* @return - the Document
* @throws IOException
* @throws SAXException
*/
public static Document readXML(final InputStream is)
throws IOException, SAXException {
final Document doc;
SAXParser parser;
try {
final SAXParserFactory factory = SAXParserFactory.newInstance();
parser = factory.newSAXParser();
final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.newDocument();
} catch (final ParserConfigurationException e) {
throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
}
LineNumberHandler handler = new LineNumberHandler(doc);
parser.parse(is, handler);
return doc;
}
}
JUnit Testcase
package com.bitplan.common.impl;
import static org.junit.Assert.assertEquals;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import com.bitplan.bobase.PositionalXMLReader;
public class TestXMLWithLineNumbers {
/**
* get an Example XML Stream
* @return the example stream
*/
public InputStream getExampleXMLStream() {
String xmlString = "<foo>\n" + " <bar>\n"
+ " <moo>Hello World!</moo>\n" + " </bar>\n" + "</foo>";
InputStream is = new ByteArrayInputStream(xmlString.getBytes());
return is;
}
@Test
public void testXMLWithLineNumbers() throws Exception {
InputStream is = this.getExampleXMLStream();
Document doc = PositionalXMLReader.readXML(is);
is.close();
Node node = doc.getElementsByTagName("moo").item(0);
assertEquals("3", node.getUserData("lineNumber"));
}
}