Quotes issue when reading from an XML file in Java

I'm trying to read from XML and store the data in a text file.
My code reads and stores the data correctly, EXCEPT when the text in the XML element contains double quotes.
For example:
<Agent> "The famous spy" James Bond </Agent>
The output drops the quoted part, and the result is just: James Bond
I'm using SAX, and here is part of my code that might have the issue:
public void characters(char[] ch, int start, int length) throws SAXException
{
tempVal = new String(ch, start, length);
}
I think I should replace the quotes before storing the string in tempVal.
Any ideas?
Here is the complete code, just in case:
public class Entailment {
private String Text;
private String Hypothesis;
private String ID;
private String Entailment;
}
//Event Handlers
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//reset
tempVal = "";
if(qName.equalsIgnoreCase("pair")) {
//create a new instance of Entailment
tempEntailment = new Entailment();
tempEntailment.setID(attributes.getValue("id"));
tempEntailment.setEntailment(attributes.getValue("entailment"));
}
}
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal = new String(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equalsIgnoreCase("pair")) {
//add it to the list
Entailments.add(tempEntailment);
}else if (qName.equalsIgnoreCase("t")) {
tempEntailment.setText(tempVal);
}else if (qName.equalsIgnoreCase("h")) {
tempEntailment.setHypothesis(tempVal);
}
}
public static void main(String[] args){
XMLtoTXT spe = new XMLtoTXT();
spe.runExample();
}

Your characters() method is being invoked multiple times because the parser is delivering the element's text as several adjacent chunks. Given the way your code is written, you keep only the last chunk, because each call overwrites tempVal.
You need to accumulate the chunks yourself, for example in a StringBuilder:
StringBuilder tempVal = null;
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//reset
tempVal = new StringBuilder();
....
}
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal.append(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
String textValue = tempVal.toString();
....
}
}
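Putting the pieces of this answer together, here is a minimal, self-contained sketch that parses the question's <Agent> example with the JDK's built-in SAX parser. The wrapping <Agents> root element and the class names are hypothetical. The sample writes the quotes as &quot; entity references, which is one situation where a parser may report the text in several characters() calls; the same handler works for literal quotes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AgentTextDemo {

    /** Hypothetical handler: collects the complete text of every <Agent> element. */
    static class AgentHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        final List<String> agents = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // reset the buffer at every start tag
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // append each chunk instead of overwriting
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The text of a leaf element is complete only once its end tag arrives.
            if (qName.equalsIgnoreCase("Agent")) {
                agents.add(text.toString().trim());
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        AgentHandler handler = new AgentHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.agents;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Agents><Agent> &quot;The famous spy&quot; James Bond </Agent></Agents>";
        System.out.println(parse(xml)); // prints ["The famous spy" James Bond]
    }
}
```

Resetting the buffer in startElement() assumes the element you read has no child elements; for mixed content you would need one buffer per nesting level.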

Interestingly enough, I simulated your situation and my SAX parser works fine. I'm using JDK 1.6.0_20, and this is how I create my parser:
// Obtain a new instance of a SAXParserFactory.
SAXParserFactory factory = SAXParserFactory.newInstance();
// Specifies that the parser produced by this code will provide support for XML namespaces.
factory.setNamespaceAware(true);
// Specifies that the parser produced by this code will validate documents as they are parsed.
factory.setValidating(true);
// Creates a new instance of a SAXParser using the currently configured factory parameters.
SAXParser saxParser = factory.newSAXParser();
My XML header is:
<?xml version="1.0" encoding="iso-8859-1"?>
What about you?

Related

Using SAX in Java to parse XML files: can't get the correct content when the text contains &amp; [duplicate]

This question already has answers here:
SAX parsing and special characters
(2 answers)
I have some issues parsing XML files with SAX.
The Java ContentHandler code looks like this:
boolean rcontent = false;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("content")) {
rcontent = true;
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (rcontent){
System.out.println("content: " + new String(ch, start, length));
rcontent = false;
}
}
The XML file content is like this:
But the output is:
I want to say
which is not complete.
It's likely that characters(...) is being called multiple times for the single <content> block. Try something like this:
StringBuilder builder; // non-null only while inside <content>
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("content")) {
builder = new StringBuilder();
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
if (builder != null){
// append each chunk; the text may arrive in several calls
builder.append(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("content") && builder != null) {
System.out.println("Content = " + builder);
builder = null;
}
}
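A runnable version of the builder approach from this answer might look like the sketch below. The question's actual XML is not shown, so the sample input "I want to say Tom &amp; Jerry" and the <doc> root element are hypothetical, as are the class names. The handler also checks the element name in endElement(), so the end tag of a nested child element cannot flush the buffer early.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ContentTextDemo {

    static class ContentTextHandler extends DefaultHandler {
        private StringBuilder builder; // non-null only while inside <content>
        final List<String> contents = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (qName.equalsIgnoreCase("content")) {
                builder = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (builder != null) {
                // The text before and after &amp; may be reported in separate calls.
                builder.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("content") && builder != null) {
                contents.add(builder.toString());
                builder = null;
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        ContentTextHandler handler = new ContentTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.contents;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><content>I want to say Tom &amp; Jerry</content></doc>";
        System.out.println(parse(xml)); // prints [I want to say Tom & Jerry]
    }
}
```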

How does the SAX parsing method characters() work?

I am trying to read data from an XML file and store it in a list using SAX. My problem is storing the values of my data in the characters() method. For each element I create an object in which I store its value and some extra information for later use, but when I try to store the value, it stores whitespace instead. I tried printing inside the method, and while it seems to go through my whole XML file, it prints only a couple of the elements, and not even in the right order. Can someone explain what I am missing?
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<CarModel>
<Audi model = "TT" year = "2006" starting_price = "33.000$">
<type>sport</type>
<horse_power>222hp</horse_power>
<drivetrain>quattro</drivetrain>
<transmission>6_Manual</transmission>
</Audi>
<Mercedes model = "W222_S400" year = "2013" starting_price = "63.000$">
<type>luxury</type>
<horse_power>302hp</horse_power>
<drivetrain>front_wheel_drive</drivetrain>
<transmission>7_Automatic</transmission>
</Mercedes>
</CarModel>
Java code:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
lvl_cnt++;
System.out.println(lvl_cnt);
obj = new xml_obj();
obj.setLvl(lvl_cnt);
System.out.println("LVL "+obj.getLvl());
if (lvl_cnt == 0) {
obj.setValue(qName);
obj.setParent("root");
System.out.println(obj.getParent());
}
else {
xml_obj tmp = objListofLists.get(objListofLists.size()-1);
int lvl_before = tmp.getLvl();
System.out.println("AAA" + lvl_before);
if (lvl_cnt > lvl_before) {
obj.setParent(tmp.getValue());
}
else if (lvl_cnt < lvl_before) {
int j = 0;
while (objListofLists.get(j).getLvl() != lvl_cnt) j++;
tmp = objListofLists.get(j);
obj.setParent(tmp.getParent());
}
else {
obj.setParent(tmp.getParent());
}
System.out.println(obj.getParent());
}
obj.attributes = attributes;
objListofLists.add(obj);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
lvl_cnt--;
}
public void characters (char ch[], int start, int length) throws SAXException {
String help = new String(ch, start, length);
System.out.println(help);
objListofLists.get(objListofLists.size()-1).setValue(help);
}
}
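The characters() method reports only the text inside elements (not attributes), but that text also includes all the whitespace (spaces, tabs and newlines) between the elements below the root element. Your code stores every chunk on the most recently created object, so the whitespace that follows a closing tag overwrites the value you wanted.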
If you want to read text that is inside a particular element, you should set a flag when you enter that element (in startElement()), then unset it in endElement(), and inside characters() you test if you are currently in the element from which you wish to extract text.
private boolean inTypeTag = false;
public void startElement(String uri, String localName, String qName, ...) ...{
if (qName.equals("type")) {
inTypeTag = true;
}
...
}
public void endElement(String uri, String localName, String qName, ...) ...{
if (qName.equals("type")) {
inTypeTag = false;
}
...
}
public void characters(char ch[], int start, int length) ... {
if (inTypeTag) {
// do something with the text ("sport") which was found in here
}
...
}
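Combining the flag from this answer with a buffer gives a runnable sketch for the question's CarModel document. The class name and the trimmed-down sample XML are hypothetical. Without a DTD the parser cannot tell that the whitespace between elements is ignorable, so it reports it through characters() as well; the flag keeps that whitespace out of the result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CarTypeDemo {

    /** Returns the text of every <type> element, in document order. */
    static List<String> readTypes(String xml) throws Exception {
        List<String> types = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTypeTag = false;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("type")) {
                    inTypeTag = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace between elements also arrives here; the flag filters it out.
                if (inTypeTag) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("type")) {
                    inTypeTag = false;
                    types.add(text.toString());
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return types;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CarModel>\n"
                + "  <Audi model=\"TT\" year=\"2006\">\n"
                + "    <type>sport</type>\n"
                + "  </Audi>\n"
                + "  <Mercedes model=\"W222_S400\" year=\"2013\">\n"
                + "    <type>luxury</type>\n"
                + "  </Mercedes>\n"
                + "</CarModel>";
        System.out.println(readTypes(xml)); // prints [sport, luxury]
    }
}
```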

Parsing XML special characters issue

I'm parsing XML retrieved from a web service using SAX.
One of the fields is a link, like the following:
<link_site>
http://www.ownhosting.com/webservice_332.asp?id_user=21395&amp;id_parent=33943
</link_site>
I have to get this link and save it, but only this part is saved: id_parent=33943.
Parser snippet:
//inside method startElement
else if(localName.equals("link_site")){
this.in_link=true;
}
...
//inside method endElement
else if(localName.equals("link_site")){
this.in_link=false;
}
Then I get the content in characters():
else if(this.in_link){
xmlparsing.setOrderLink(count, Html.fromHtml(new String(ch, start, length)).toString());
}//I get it and put in a HashMap<Integer,String>
I know that this issue is caused by the encoding of special characters.
What can I do?
The & makes the parser split the text and call characters() several times. You need to concatenate the chunks. Something like this:
SAXParserFactory.newInstance().newSAXParser()
.parse(new File("1.xml"), new DefaultHandler() {
String url;
String element;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
element = qName;
url = "";
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (element.equals("link_site")) {
url += new String(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (element.equals("link_site")) {
System.out.println(url.trim());
element = "";
}
}
});
This prints:
http://www.ownhosting.com/webservice_332.asp?id_user=21395&id_parent=33943
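If switching APIs is an option, StAX (javax.xml.stream, bundled with the JDK since Java 6) sidesteps the chunking problem for text-only elements: XMLStreamReader.getElementText() returns the whole text of the current element, with entity references such as &amp; resolved by the default factory settings. This is a swapped-in technique, not the SAX approach from the answer; the <response> root element and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LinkSiteStaxDemo {

    /** Returns the trimmed text of every <link_site> element. */
    static List<String> readLinks(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> links = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "link_site".equals(reader.getLocalName())) {
                    // getElementText() joins all text events of a text-only element
                    // and leaves the reader positioned on its END_ELEMENT.
                    links.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return links;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<response><link_site>\n"
                + "http://www.ownhosting.com/webservice_332.asp?id_user=21395&amp;id_parent=33943\n"
                + "</link_site></response>";
        // prints http://www.ownhosting.com/webservice_332.asp?id_user=21395&id_parent=33943
        System.out.println(readLinks(xml).get(0));
    }
}
```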

Reading multiple lines from XML

I am trying to fetch data from an XML file in Java using a SAX parser. I can read small amounts of data, but when the data is large and spans multiple lines, I get only two lines of it, not all the lines. I am using the following code:
InputStreamReader isr = new InputStreamReader(is);
InputSource source = new InputSource(isr);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
XMLReader xr = parser.getXMLReader();
GeofenceParametersXMLHandler handler = new GeofenceParametersXMLHandler();
xr.setContentHandler(handler);
xr.parse(source);
And my GeofenceParametersXMLHandler is:
private boolean inTimeZone = false;
private boolean inCoordinate = false;
private boolean outerBoundaryIs = false;
private boolean innerBoundaryIs = false;
private String timeZone;
private List<String> innerCoordinates = new ArrayList<String>();
private String outerCoordinates;
public String getTimeZone() {
return timeZone;
}
public List<String> getInnerCoordinates() {
return innerCoordinates;
}
public String getOuterCoordinates() {
return outerCoordinates;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
if (this.inTimeZone) {
this.timeZone = new String(ch, start, length);
this.inTimeZone = false;
}
if (this.inCoordinate && this.innerBoundaryIs) {
this.innerCoordinates.add(new String(ch, start, length));
this.inCoordinate = false;
this.innerBoundaryIs = false;
}
if (this.inCoordinate && this.outerBoundaryIs) {
this.outerCoordinates = new String(ch, start, length);
this.inCoordinate = false;
this.outerBoundaryIs = false;
}
}
@Override
public void endElement(String uri, String localName, String name) throws SAXException {
super.endElement(uri, localName, name);
}
@Override
public void startDocument() throws SAXException {
super.startDocument();
}
@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
super.startElement(uri, localName, name, attributes);
if (localName.equalsIgnoreCase("timezone")) {
this.inTimeZone = true;
}
if (localName.equalsIgnoreCase("outerBoundaryIs")) {
this.outerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("innerBoundaryIs")) {
this.innerBoundaryIs = true;
}
if (localName.equalsIgnoreCase("coordinates")) {
this.inCoordinate = true;
}
}
And the XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2">
<Placemark>
<name>gx:altitudeMode Example</name>
<timezone>EASTERN</timezone>
<Polygon>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-77.05788457660967,38.87253259892824,100
-77.05465973756702,38.87291016281703,100
-77.05315536854791,38.87053267794386,100
-77.05552622493516,38.868757801256,100
-77.05844056290393,38.86996206506943,100
-77.05788457660967,38.87253259892824,100
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
I always get only two lines of coordinate data, but when the coordinates are on a single line, I get the complete data. How can I fetch the complete data when it spans multiple lines?
Thanks in advance.
The characters() method won't necessarily give you all the text data in one go (this is a very common misconception, btw).
The proper approach is to concatenate all the data returned by successive calls to characters() (using a StringBuilder or similar). Once your endElement() method is called, you can then treat that text buffer as complete and process it as such.
From the doc:
The Parser will call this method to report each chunk of character
data. SAX parsers may return all contiguous character data in a single
chunk, or they may split it into several chunks
Often you see that for a small XML doc one call to characters() will suffice. However as your XML doc increases in size, you'll find that due to buffering etc. you'll start getting multiple calls. Consequently each call treated on its own appears to be incomplete.
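Applied to the KML from the question, the buffer-then-process approach could look like the sketch below. The class name and the shortened, properly closed sample document are hypothetical, and for brevity it does not distinguish outer from inner boundaries; the question's outerBoundaryIs/innerBoundaryIs flags could be layered on top. Note that the flag is cleared in endElement(), not in characters(), so later chunks are not lost.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CoordinatesDemo {

    /** Returns one list of "lon,lat,alt" tuples per <coordinates> element. */
    static List<List<String>> readCoordinates(String xml) throws Exception {
        List<List<String>> rings = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder buffer = new StringBuilder();
            private boolean inCoordinates = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (localName.equals("coordinates")) {
                    inCoordinates = true;
                    buffer.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inCoordinates) {
                    buffer.append(ch, start, length); // keep the flag set: more chunks may follow
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (localName.equals("coordinates")) {
                    inCoordinates = false;
                    // The text is complete now; split it on any whitespace, including newlines.
                    String text = buffer.toString().trim();
                    List<String> tuples = new ArrayList<>();
                    if (!text.isEmpty()) {
                        tuples.addAll(Arrays.asList(text.split("\\s+")));
                    }
                    rings.add(tuples);
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // as in the question, so localName is filled in
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return rings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Placemark><Polygon>"
                + "<outerBoundaryIs><LinearRing><coordinates>\n"
                + "  -77.05788457660967,38.87253259892824,100\n"
                + "  -77.05465973756702,38.87291016281703,100\n"
                + "  -77.05315536854791,38.87053267794386,100\n"
                + "</coordinates></LinearRing></outerBoundaryIs>"
                + "</Polygon></Placemark></kml>";
        System.out.println(readCoordinates(xml).get(0).size()); // prints 3
    }
}
```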

Problem with using the SAX XML parser

I am using the SAX parser for XML parsing. The problem occurs with the following XML:
<description>
Designer:Paul Smith Color:Plain Black Fabric/Composition:100% cotton Weave/Pattern:pinpoint Sleeve:Long-sleeved Fit:Classic Front style:Placket front Back style:Side pleat back Collar:Classic/straight collar Button:Pearlescent front button Pocket:rounded chest pocket Hem:Rounded hem
</description>
I get this:
Designer:Paul Smith
Color:Plain Black
The other parts are missing. The same thing happens for a few other lines. Can anyone tell me what the problem with my approach is?
My code is given below:
Parser code:
try {
/** Handling XML */
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLReader xr = sp.getXMLReader();
/** Send URL to parse XML Tags */
URL sourceUrl = new URL(
"http://50.19.125.224/Demo/VeryGoodSex_and_the_City_S6E6.xml");
/** Create handler to handle XML Tags ( extends DefaultHandler ) */
MyXMLHandler myXMLHandler = new MyXMLHandler();
xr.setContentHandler((ContentHandler) myXMLHandler);
xr.parse(new InputSource(sourceUrl.openStream()));
} catch (Exception e) {
System.out.println("XML Parsing Exception = " + e);
}
Object to hold XML parsed Info:
public class ParserObject {
String name=null;
String description=null;
String bitly=null; //single
String productLink=null;//single
String productPrice=null;//single
Vector<String> price=new Vector<String>();
}
Handler class:
public void endElement(String uri, String localName, String qName)
throws SAXException {
currentElement = false;
if (qName.equalsIgnoreCase("title"))
{
xmlDataObject[index].name=currentValue;
}
else if (qName.equalsIgnoreCase("artist"))
{
xmlDataObject[index].artist=currentValue;
}
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
currentElement = true;
if (qName.equalsIgnoreCase("allinfo"))
{
System.out.println("started");
}
else if (qName.equalsIgnoreCase("tags"))
{
insideTag=1;
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
if (currentElement) {
currentValue = new String(ch, start, length);
currentElement = false;
}
}
You have to concatenate characters which the parser gives to you until it calls endElement.
Try removing currentElement = false; from the characters() handler, and use
currentValue = currentValue + new String(ch, start, length);
Initialize currentValue with an empty string (for example in startElement()) or handle the null value in the expression above.
I think characters() receives some, but not all, of the characters in each call.
Thus, you only get the first "chunk".
For debugging, try printing each character chunk on a separate line (before the if).
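The debugging suggestion can be sketched as a small handler that prints and records every chunk reported inside one element. The class name, the <item> root element and the sample description text are hypothetical. How many chunks you see depends on the parser and its buffering, but joining them always reproduces the full text, which is why concatenating until endElement() is the fix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDebugDemo {

    /** Prints and records every characters() chunk reported inside the named element. */
    static List<String> chunksOf(String xml, String element) throws Exception {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equalsIgnoreCase(element)) inside = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equalsIgnoreCase(element)) inside = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) {
                    String chunk = new String(ch, start, length);
                    System.out.println("chunk: [" + chunk + "]"); // one line per call
                    chunks.add(chunk);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><description>\nDesigner:Paul Smith\nColor:Plain Black\n"
                + "Fabric/Composition:100% cotton\n</description></item>";
        List<String> chunks = chunksOf(xml, "description");
        System.out.println(String.join("", chunks).trim());
    }
}
```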
