I am trying to connect to a remote server via a socket, and I get big XML responses back from the socket, delimited by a '\n' character:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
.......
.......
</data>
</Response>\n <---- \n acts as delimiter
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
....
....
</data>
</Response>\n
..
I am trying to parse these XML responses using a SAX parser. Ideally I would read one full response into a string by searching for '\n' and hand that response to the parser. But since a single response is very large, I get an OutOfMemoryError when holding such a large XML document in a string. So the only remaining option was to stream the XML to SAX:
SAXParserFactory spfactory = SAXParserFactory.newInstance();
SAXParser saxParser = spfactory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(new MyDefaultHandler(context));
InputSource xmlInputSource = new InputSource(
        new CloseShieldInputStream(mySocket.getInputStream()));
xmlReader.parse(xmlInputSource);
I am using CloseShieldInputStream to prevent SAX from closing my socket stream when it throws an exception at the '\n'. I asked a previous question about that.
Sometimes I get this parse error:
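A different way to handle the delimiter, sketched here under the assumption that '\n' never occurs inside a response (the question implies this but does not state it), is to give each response its own stream that reports end-of-stream at the next '\n' and ignores close(). Each parse then sees exactly one complete document, never reaches the delimiter, and cannot close the socket. DelimitedResponses, DelimitedInputStream and parseAll are hypothetical names; the handler only records each root element name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DelimitedResponses {

    /** Exposes the shared stream up to (not including) the next '\n', then reports end-of-stream. */
    static final class DelimitedInputStream extends InputStream {
        private final InputStream in;
        private boolean done = false; // set once '\n' or the real end of stream is reached

        DelimitedInputStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException {
            if (done) return -1;
            int b = in.read();
            if (b == -1 || b == '\n') { done = true; return -1; }
            return b;
        }

        // The parser closes its input when it finishes; keep the socket stream open instead.
        @Override public void close() { }
    }

    /** Parses each '\n'-delimited response as a separate document; returns the root element names. */
    static List<String> parseAll(InputStream socketStream) throws Exception {
        List<String> roots = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            private int depth = 0;
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                if (depth++ == 0) roots.add(qName); // first element of each document is its root
            }
            @Override public void endElement(String uri, String local, String qName) { depth--; }
        });
        PushbackInputStream in = new PushbackInputStream(socketStream);
        int next;
        while ((next = in.read()) != -1) {  // blocks until the next response starts or the stream ends
            if (next == '\n') continue;     // skip empty segments
            in.unread(next);
            DelimitedInputStream one = new DelimitedInputStream(in);
            try {
                reader.parse(new InputSource(one));
            } finally {
                while (one.read() != -1) { } // discard whatever is left of a response that failed to parse
            }
        }
        return roots;
    }
}
```

Because the wrapper reads one byte at a time, in real code you would pass something like `new BufferedInputStream(mySocket.getInputStream())` rather than the raw socket stream.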
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 8: not well-formed (invalid token)
I searched for it and found that this error usually occurs when the actual encoding of the XML differs from what SAX expects. I wrote a C program to print out the XML, and all of it is UTF-8 encoded.
My questions:
Is there any reason for the above error other than an encoding issue?
Is there any way to print (or write to a file) the input to SAX as it streams from the socket?
After trying Hemal Pandya's answer:
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt"));
InputSource xmlInputSource = new InputSource(
        new CloseShieldInputStream(new TeeInputStream(mReadStream, log)));
xmlReader.parse(xmlInputSource);
A new file named log.txt gets created (I can see it when I mount the SD card), but it is empty. Am I using this right?
Finally, here is how I did it.
I worked it out with TeeInputStream itself; thanks to Hemal Pandya for suggesting it.
// open a log file in append mode
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt", true));
InputSource xmlInputSource = new InputSource(
        new CloseShieldInputStream(new TeeInputStream(mReadStream, log)));
try {
    xmlReader.parse(xmlInputSource);
} catch (SAXException e) {
    // parsing failed (for example at the '\n' delimiter); we still want the log
} finally {
    // flush the buffered bytes to the file whether or not parsing succeeded
    log.flush();
}
Is there any way to print (or write to any file) the input to SAX as it streams from socket?
Apache Commons IO has a TeeInputStream that should be useful.
OutputStream log = new BufferedOutputStream(new FileOutputStream("response.xml"));
InputSource xmlInputSource = new InputSource(
        new CloseShieldInputStream(new TeeInputStream(mySocket.getInputStream(), log)));
I haven't used it, so you might want to try it first in a standalone program to figure out its close semantics. Looking at the docs and your requirements, though, it seems you would want to close the log stream separately at the end.
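The close and flush behaviour can be checked in a standalone program, as suggested. This sketch uses SimpleTee, a hypothetical stdlib-only stand-in for Commons IO's TeeInputStream (same idea: every byte read is also written to a branch stream), and a ByteArrayOutputStream standing in for the log file. It illustrates that bytes written through a BufferedOutputStream stay in its buffer until flush() or close() is called, which is one likely reason a log file ends up empty.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TeeDemo {

    /** Hypothetical tee: every byte read from `in` is also written to `branch`. */
    static final class SimpleTee extends FilterInputStream {
        private final OutputStream branch;

        SimpleTee(InputStream in, OutputStream branch) { super(in); this.branch = branch; }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b != -1) branch.write(b);
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) branch.write(buf, off, n);
            return n;
        }
        // close() is inherited from FilterInputStream: it closes `in` but not `branch`.
    }

    /** Reads all of `data` through the tee; returns the bytes in the "file" before and after flush(). */
    static int[] sinkSizes(byte[] data) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for log.txt
        OutputStream log = new BufferedOutputStream(file);         // default 8192-byte buffer
        try (InputStream tee = new SimpleTee(new ByteArrayInputStream(data), log)) {
            byte[] buf = new byte[256];
            while (tee.read(buf) != -1) { } // a parser would consume these bytes
        }
        int beforeFlush = file.size(); // small inputs are still sitting in the buffer
        log.flush();
        int afterFlush = file.size();
        log.close();
        return new int[] { beforeFlush, afterFlush };
    }
}
```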
I am not familiar with Expat, but to accomplish what you are describing in general, you need a SAX parser that supports pushing data into the parser instead of having the parser pull data from a source. Check whether Expat supports a push model. If it does, you can simply read a chunk of data from the socket, push it to the parser, and it will parse whatever it can from the chunk, caching any remaining data for use during the next push. Repeat as needed until you are ready to close the socket connection. In this model, the \n separator would be treated as miscellaneous whitespace between nodes, so you have to use the SAX events to detect when a new <Response> node opens and closes. Also, because you are receiving multiple <Response> nodes in the data, and XML does not allow more than one top-level document element, you would need to push a custom opening tag into the parser before you start pushing the socket data. The custom opening tag then becomes the top-level document element, and the <Response> nodes become its children.
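Java's standard SAX API (XMLReader.parse) pulls data rather than accepting pushed chunks, but the wrapper-root part of this idea can be sketched with a SequenceInputStream that yields a synthetic opening tag, then the socket data, then the closing tag. This assumes the responses carry no <?xml ...?> declarations (a declaration is only legal at the very start of a document, so they would have to be stripped first) and that it is acceptable for the parse to finish only when the socket stream ends. WrappedResponses and the <responses> root name are hypothetical; the handler only counts the characters inside each <Response>, so nothing large is buffered.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class WrappedResponses {

    /** Parses a stream of <Response> elements as children of a synthetic <responses> root. */
    static List<Integer> parse(InputStream socketStream) throws Exception {
        InputStream openTag = new ByteArrayInputStream("<responses>".getBytes(StandardCharsets.UTF_8));
        InputStream closeTag = new ByteArrayInputStream("</responses>".getBytes(StandardCharsets.UTF_8));
        // Note: closing a SequenceInputStream closes every stream in it, including the socket's.
        InputStream wrapped = new SequenceInputStream(
                Collections.enumeration(Arrays.asList(openTag, socketStream, closeTag)));

        List<Integer> charsPerResponse = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int chars = -1; // -1 means "not inside a <Response>"

            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("Response")) chars = 0;
            }
            @Override public void characters(char[] ch, int start, int length) {
                if (chars >= 0) chars += length; // the '\n' between responses is ignored
            }
            @Override public void endElement(String uri, String local, String qName) {
                if (qName.equals("Response")) { // one complete response: process it here
                    charsPerResponse.add(chars);
                    chars = -1;
                }
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(wrapped));
        return charsPerResponse;
    }
}
```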
Related
My app receives this XML data:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><response><status>FAIL</status><time>2012-12-11 22:35</time></response>
I can read this data with:
String si = "";
while (is.available() > 0) {
    si += (char) is.read();
}
But parsing it with DocumentBuilder:
Document doc = db.parse(sock.getInputStream());
hangs my application at this line.
Can somebody explain this to me?
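My guess is that the socket input stream never signals end of file, so the parse method just hangs waiting for the next character. I suggest using your example loop to collect the XML and then passing the string to the parse method by wrapping it in a ByteArrayInputStream or a StringBufferInputStream (the latter is deprecated). Be aware that available() only reports bytes that can be read without blocking, so it can return 0 before the whole message has arrived.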
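If the connection stays open, another option is to keep reading until the end of the message instead of relying on available(), and only then parse the complete bytes. This sketch assumes, hypothetically, that each message ends with the literal bytes </response> and that this sequence does not appear earlier in the message (for example inside a comment or CDATA section); ReadUntilClosingTag and its methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ReadUntilClosingTag {

    /** Reads bytes until they end with `closingTag` (or the stream ends), without using available(). */
    static byte[] readMessage(InputStream in, String closingTag) throws IOException {
        byte[] tag = closingTag.getBytes(StandardCharsets.UTF_8);
        byte[] window = new byte[tag.length]; // the last tag.length bytes seen
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        long seen = 0;
        int b;
        while ((b = in.read()) != -1) { // blocks until the next byte arrives
            message.write(b);
            System.arraycopy(window, 1, window, 0, window.length - 1);
            window[window.length - 1] = (byte) b;
            if (++seen >= tag.length && Arrays.equals(window, tag)) break; // message complete
        }
        return message.toByteArray();
    }

    /** Collects one message, then parses the complete bytes, so parse() sees a real end of input. */
    static Document parseMessage(InputStream socketIn) throws Exception {
        byte[] xml = readMessage(socketIn, "</response>");
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }
}
```

Bytes after the closing tag stay unread in the socket stream, so the next call picks up the next message.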
I have the following bit of code, which parses an XML string returned from a database:
XMLReader xReader = XMLReaderFactory.createXMLReader();
xReader.setContentHandler(parser);
xReader.parse(new InputSource(new StringReader(theResponseStringFromTheDatabase)));
Whenever theResponseStringFromTheDatabase is too large, the parsing fails. Is there a way to modify the code so it will parse large strings?
best wishes,
ck
I would suggest that you need to get a Reader accessing that column.
InputSource can take a Reader object, and that reader should be one capable of pulling the XML incrementally (suitable for the SAX parser underlying the XMLReader).
ResultSet.getCharacterStream() may do the trick.
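A minimal sketch of that suggestion, assuming the XML lives in a text column whose name is passed in (the column name and ColumnParser are hypothetical). It obtains the XMLReader through SAXParserFactory rather than XMLReaderFactory.createXMLReader(), which is deprecated since Java 9. Whether the data is really streamed, or loaded fully into memory behind the Reader, depends on the JDBC driver.

```java
import java.io.Reader;
import java.sql.ResultSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class ColumnParser {

    /** Feeds the XML in `column` of the current row to `handler` through the driver's Reader. */
    static void parseColumn(ResultSet rs, String column, ContentHandler handler) throws Exception {
        XMLReader xReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xReader.setContentHandler(handler);
        // getCharacterStream returns null for SQL NULL; a real caller should check for that.
        try (Reader xml = rs.getCharacterStream(column)) {
            xReader.parse(new InputSource(xml)); // the parser pulls characters as it needs them
        }
    }
}
```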
I'm doing XML validation in Java using SAX, and I'd like to detect the following kind of error:
"An invalid character was found in text content".
At the moment I validate with SAX, and in some documents corrupted characters are not detected as errors. When I try to open the resulting XML file in Internet Explorer, for example, I get the error message "an invalid character was found in text content".
This is an example of XML data:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<!DOCTYPE blabla SYSTEM 'blabla.dtd'>
<blabla type='type' num='num'>
<...>... corrupted character </...>
</blabla>
And this is an example of how the parser is instantiated:
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
parser = factory.newSAXParser();
parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
parser.setProperty(JAXP_SCHEMA_SOURCE, new File(theConfig.getRoot()
.concat(File.separator).concat(theConfig.getXsdFileName())
.concat("-v").concat(theConfig.getXsdFileVersion()).concat(
XSD_EXTENSION)));
reader = parser.getXMLReader();
reader.setErrorHandler(getHandler());
reader.setEntityResolver(new MyEntityResolver(theConfig.getRoot(),
theConfig));
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(theDataToParse));
reader.parse(is);
The error handler implements methods 'warning', 'error' and 'fatalError', but nothing is detected.
The entity resolver loads a custom entity file stored in a configuration directory.
Does anyone have an idea why such a malformed-character error is not detected? Is it because my stream comes from a String and not a file?
Thanks in advance for your help.
Regards.
Yes. Apparently you have already done the byte-to-character conversion, since you are already holding the string. If you want to detect the invalid character, you need to parse the bytes. In general, it's not a good idea to hold XML data as string data, because you risk corrupting it through incorrect character encoding. The best way to treat XML is as binary data.
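The difference can be reproduced with a tiny document containing the byte 0xFF, which is never valid in UTF-8. Parsed from bytes, the parser should reject it; decoded to a String first, the bad byte becomes U+FFFD (the Unicode replacement character), which is a legal XML character, so the same parser accepts it. BytesVersusString is an illustrative name, not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BytesVersusString {

    /** True if the parser accepts the input; false if it reports a well-formedness or decoding error. */
    static boolean accepts(InputSource source) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(source, new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            // Depending on the JDK, a malformed UTF-8 byte surfaces as a SAXParseException
            // or as an IOException subclass (CharConversionException).
            return false;
        }
    }

    /** Parses the same corrupted bytes twice: directly, and after decoding them to a String. */
    static boolean[] compare() throws Exception {
        byte[] corrupted = { '<', 'a', '>', (byte) 0xFF, '<', '/', 'a', '>' }; // 0xFF is never valid UTF-8
        boolean fromBytes = accepts(new InputSource(new ByteArrayInputStream(corrupted)));
        // Decoding replaces the bad byte with U+FFFD, which XML allows, so the damage is hidden.
        String decoded = new String(corrupted, StandardCharsets.UTF_8);
        boolean fromString = accepts(new InputSource(new StringReader(decoded)));
        return new boolean[] { fromBytes, fromString };
    }
}
```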
I am working on an Android application that parses one or more XML feeds based on user preferences. Is it possible to parse (using SAX Parser) more than one XML feed at once by providing the parser with an array of URLs of my XML feeds?
If not, what would be an alternative way to list the parsed items from different XML feeds in one list? An intuitive approach is to use java.io.SequenceInputStream to merge the two input streams. However, this throws a NullPointerException:
try {
URL urlOne = new URL("http://example.com/feedone.xml");
URL urlTwo = new URL("http://example.com/feedtwo.xml");
InputStream streamOne = urlOne.openStream();
InputStream streamTwo = urlTwo.openStream();
InputStream streamBoth = new SequenceInputStream(streamOne, streamTwo);
InputSource sourceBoth = new InputSource(streamBoth);
//Parsing
stream = xmlHandler.getStream();
}
catch (Exception error) {
error.printStackTrace();
}
List<Item> content = stream.getList();
return content;
The tactic of appending the streams before parsing is not likely to work well, as the appended XML will not be valid XML. As each XML input has its own root element, the appended XML will have multiple roots, which is not permitted in XML. Additionally, it's likely to contain multiple XML declarations like
<?xml version="1.0" encoding="UTF-8"?>
which is also invalid.
While it's possible to preprocess the input to work around these issues, you're likely better off parsing them separately and dealing with getting the results combined later.
It's possible to make a SAX parser add the parsed elements to an existing list of elements. If you post code in your question showing how you're parsing a single file, we might be able to help figure out how to adjust it to your need for multiple inputs.
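A minimal sketch of parsing the feeds separately while collecting everything into one list, assuming (hypothetically) that the items of interest are the text of <title> elements. Each feed is parsed as its own document with its own handler instance, and all handlers append to the same list. MultiFeed and TitleHandler are illustrative names; in the Android code the streams would come from urlOne.openStream() and urlTwo.openStream().

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class MultiFeed {

    /** Hypothetical handler that appends the text of every <title> element to a shared list. */
    static final class TitleHandler extends DefaultHandler {
        private final List<String> out;
        private StringBuilder text; // non-null while inside <title>

        TitleHandler(List<String> out) { this.out = out; }

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("title")) text = new StringBuilder();
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if (qName.equals("title")) { out.add(text.toString()); text = null; }
        }
    }

    /** Parses each feed as a separate document, accumulating the results in one list. */
    static List<String> parseAll(List<InputStream> feeds) throws Exception {
        List<String> items = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        for (InputStream feed : feeds) {
            try (InputStream in = feed) {
                parser.parse(in, new TitleHandler(items)); // one root element per parse
            }
        }
        return items;
    }
}
```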
I have XML data as a string which has to be parsed. I am converting the XML string to an InputSource using the following code:
StringReader reader1 = new StringReader(xmlstring);
InputSource inputSource1 = new InputSource(reader1);
and passing the InputSource to
Document doc = builder.build(inputSource1);
I want to use the same inputSource1 in another parser class as well, but I get a "stream closed" error.
How should I handle this? Or is there any other way to pass XML data to a parser other than a file?
Looking at the JavaDoc, it seems that InputSource is not designed to be shared and reused by multiple parsers:
"Standard processing of both byte and character streams is to close them on as part of end-of-parse cleanup, so applications should not attempt to re-use such streams after they have been handed to a parser."
So you will have to create a new InputSource. If you are really reading from a String, there would be no difference in I/O or memory cost anyway.
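A minimal sketch of creating a fresh InputSource (and StringReader) for each consumer while reusing the same String. It uses plain JAXP parsers rather than the builder from the question; ReparseString and freshSource are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReparseString {

    /** A new InputSource per parse; only the String is shared, and Strings are immutable. */
    static InputSource freshSource(String xml) {
        return new InputSource(new StringReader(xml));
    }

    /** Parses the same XML twice: once into a DOM tree, once with SAX to count elements. */
    static String parseTwice(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(freshSource(xml));
        final int[] elements = { 0 };
        SAXParserFactory.newInstance().newSAXParser().parse(freshSource(xml), new DefaultHandler() {
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                elements[0]++;
            }
        });
        return doc.getDocumentElement().getTagName() + ":" + elements[0];
    }
}
```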