parsing a large xml string in java using XMLReader - java

I have the following bit of code which parses an XML string returned from a database:
XMLReader xReader = XMLReaderFactory.createXMLReader();
xReader.setContentHandler(parser);
xReader.parse(new InputSource(new StringReader(theResponseStringFromTheDatabase)));
Whenever theResponseStringFromTheDatabase is too large, the parsing fails. Is there a way to modify the code so it will parse large strings?
best wishes,
ck

I would suggest that you need to get a Reader accessing that column.
InputSource can take a Reader object, and that reader should be one capable of pulling the XML incrementally (suitable for the SAX reader underlying the XMLReader)
ResultSet.getCharacterStream() may do the trick.
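A minimal sketch of that suggestion, assuming the XML sits in a character column of the current ResultSet row. The class and method names are hypothetical, and SAXParserFactory is used here in place of the question's XMLReaderFactory, which is deprecated in recent JDKs:

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class StreamingColumnParser {

    /** Parses XML from any Reader without first collecting it into a String. */
    static void parse(Reader xml, ContentHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        XMLReader xReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xReader.setContentHandler(handler);
        xReader.parse(new InputSource(xml));
    }

    /** Streams the named column of the current row straight into the parser. */
    static void parseColumn(ResultSet rs, String column, ContentHandler handler)
            throws SQLException, IOException, SAXException, ParserConfigurationException {
        try (Reader xml = rs.getCharacterStream(column)) {
            if (xml == null) {
                return; // the column was SQL NULL, so there is nothing to parse
            }
            parse(xml, handler);
        }
    }
}
```

Whether this actually avoids holding the whole value in memory depends on the JDBC driver; some drivers materialize the column before handing out the Reader.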

Related

getting stackoverflowerror while converting org.w3c.dom.Document to org.dom4j.Document

I am getting a StackOverflowError while converting an org.w3c.dom.Document to an org.dom4j.Document.
Code :
public static org.dom4j.Document getDom4jDocument(Document w3cDocument)
{
    //System.out.println("XMLUtility : Inside getDom4jDocument()");
    org.dom4j.Document dom4jDocument = null;
    DOMReader xmlReader = null;
    try {
        //System.out.println("Before conversion of w3cdoc to dom4jdoc");
        xmlReader = new DOMReader();
        dom4jDocument = xmlReader.read(w3cDocument);
        //System.out.println("Conversion complete");
    } catch (Exception e) {
        System.out.println("General Exception :- " + e.getMessage());
    }
    //System.out.println("XMLUtility : getDom4jDocument() Finished");
    return dom4jDocument;
}
log :
java.lang.StackOverflowError
at java.lang.String.indexOf(String.java:1564)
at java.lang.String.indexOf(String.java:1546)
at org.dom4j.tree.NamespaceStack.getQName(NamespaceStack.java:158)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:184)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:93)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:226)
at org.dom4j.io.DOMReader.readTree(DOMReader.java:93)
at org.dom4j.io.DOMReader.readElement(DOMReader.java:226)
Actually, I want to convert the XML to a string using org.dom4j.Document's asXML method. Is this conversion possible without converting the org.w3c.dom.Document to an org.dom4j.Document? How?
When handling a heavy file, you shouldn't use a DOM reader but a SAX one. I assume your goal is to output your document to a string.
Here are some differences between SAX and DOM (source):
SAX
Parses node by node
Doesn't store the XML in memory
We can't insert or delete a node
SAX is an event-based parser
SAX is a Simple API for XML
Doesn't preserve comments
SAX generally runs a little faster than DOM
DOM
Stores the entire XML document in memory before processing
Occupies more memory
We can insert or delete nodes
Can traverse in any direction
DOM is a tree model parser
Document Object Model (DOM) API
Preserves comments
DOM generally runs a little slower than SAX
You don't need to build a model that takes up a lot of memory. You only need to walk through the nodes and output them one by one.
Here you will find some code to start with; then you should implement a tree traversal algorithm.
Regards
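For the narrower goal in the question (getting a string from an org.w3c.dom.Document without going through dom4j), one option is JAXP's identity Transformer. This is a sketch rather than the SAX traversal suggested above; the class name DomToString is hypothetical, and a very deeply nested document may still strain the serializer:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomToString {

    /** Serializes a W3C DOM document to a String using an identity transform. */
    static String toXml(Document w3cDocument) throws TransformerException {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(w3cDocument), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document in memory just to exercise toXml().
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        root.appendChild(doc.createElement("child"));
        doc.appendChild(root);
        System.out.println(toXml(doc));
    }
}
```

If the result is large, passing a FileWriter or an OutputStream to StreamResult instead of a StringWriter avoids holding the whole text in memory.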
Take a look at java.lang.StackOverflowError in dom parser. Apparently, trying to load a huge XML file into a String can result in a StackOverflowError. I think it's because the parser uses regexes to find the start and end of tags, which involves recursive calls for long Strings, as described in java.lang.StackOverflowError while using a RegEx to Parse big strings.
You can try and split up the XML document and parse the sections separately and see if that helps.

Stream xml input to sax Parser, How to print the xml streamed?

I am trying to connect to a remote server via a socket, and I get big XML responses back from the socket, delimited by a '\n' character.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
.......
.......
</data>
</Response>\n <---- \n acts as delimiter
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
....
....
</data>
</Response>\n
..
I am trying to parse these XML responses using a SAX parser. Ideally I would read one full response into a string by searching for '\n' and give that string to the parser. But since a single response is very large, I get an OutOfMemoryError when holding such a large XML document in a string. So the only remaining option was to stream the XML to SAX.
SAXParserFactory spfactory = SAXParserFactory.newInstance();
SAXParser saxParser = spfactory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(new MyDefaultHandler(context));
InputSource xmlInputSource = new InputSource(new
CloseShieldInputStream(mySocket.getInputStream()));
xmlReader.parse(xmlInputSource);
I am using CloseShieldInputStream to prevent SAX from closing my socket stream when it throws an exception because of the '\n'. I asked a previous question about that.
Now I sometimes get a parse error:
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 8: not well-formed (invalid token)
I searched for it and found that this error normally occurs when the encoding of the actual XML is not the one SAX expects. I wrote a C program to print out the XML, and all of it is encoded as UTF-8.
Now my questions:
Is there any other reason for the above error in XML parsing, other than an encoding issue?
Is there any way to print (or write to a file) the input to SAX as it streams from the socket?
After trying Hemal Pandya's answer:
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt"));
InputSource xmlInputSource = new InputSource(new CloseShieldInputStream(new
TeeInputStream(mReadStream, log)));
xmlReader.parse(xmlInputSource);
A new file named log.txt is created when I mount the SD card, but it is empty. Am I using this right?
Here is how I finally did it.
I worked it out with TeeInputStream itself; thanks to Hemal Pandya for suggesting it.
// Open the log file in append mode.
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt", true));
InputSource xmlInputSource = new InputSource(
        new CloseShieldInputStream(new TeeInputStream(mReadStream, log)));
try {
    xmlReader.parse(xmlInputSource);
} catch (SAXException e) {
    // Parsing failed; the log is still flushed in the finally block.
} finally {
    // Flush the buffered log to the file whether or not parsing succeeded.
    log.flush();
}
Is there any way to print (or write to any file) the input to SAX as
it streams from socket?
Apache Commons has a TeeInputStream that should be useful.
OutputStream log = new BufferedOutputStream(new FileOutputStream("response.xml"));
InputSource xmlInputSource = new InputSource(
        new CloseShieldInputStream(new TeeInputStream(mySocket.getInputStream(), log)));
I haven't used it; you might want to try it first in a standalone program to figure out the close semantics. Looking at the docs and your requirements, it seems you would want to close the log stream separately at the end.
I am not familiar with Expat, but to accomplish what you are describing in general, you need a SAX parser that supports pushing data into the parser instead of having the parser pull data from a source. Check whether Expat supports a push model. If it does, you can simply read a chunk of data from the socket and push it to the parser; it will parse whatever it can from the chunk and cache any remaining data for the next push. Repeat as needed until you are ready to close the socket connection.
In this model, the \n separator would be treated as miscellaneous whitespace between nodes, so you have to use the SAX events to detect when a new <Response> node opens and closes. Also, because you are receiving multiple <Response> nodes in the data, and XML does not allow more than one top-level document node, you would need to push a custom opening tag into the parser before you start pushing the socket data. The custom opening tag then becomes the top-level document node, and the <Response> nodes become its children.
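If a push-style parser is not available, a different approach (not the push model described above) is to give the pull parser a stream that ends at each '\n', so that every response is parsed as its own document. The sketch below assumes '\n' occurs only between documents and never inside one; UntilNewlineInputStream and DelimitedXmlDemo are hypothetical names, and the demo reads from any InputStream in place of the socket:

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Reports end-of-stream at the next '\n' and never closes the wrapped stream. */
final class UntilNewlineInputStream extends FilterInputStream {
    private boolean ended = false;

    UntilNewlineInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (ended) return -1;
        int b = in.read();
        if (b == -1 || b == '\n') {
            ended = true; // the delimiter is consumed but not passed on
            return -1;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        // Go through read() so the parser's buffer can never run past the delimiter.
        while (n < len) {
            int b = read();
            if (b == -1) break;
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override
    public int available() {
        return 0;
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void close() {
        // Intentionally empty: the underlying stream stays open for the next document.
    }
}

final class DelimitedXmlDemo {

    /** Parses each newline-delimited document and returns its character data. */
    static List<String> parseAll(InputStream raw) throws Exception {
        BufferedInputStream in = new BufferedInputStream(raw);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        List<String> texts = new ArrayList<>();
        while (true) {
            // Peek one byte to find out whether another document follows.
            in.mark(1);
            if (in.read() == -1) break;
            in.reset();
            StringBuilder text = new StringBuilder();
            factory.newSAXParser().parse(new UntilNewlineInputStream(in), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            texts.add(text.toString());
        }
        return texts;
    }
}
```

Reading one byte at a time keeps the sketch short; on a real socket the BufferedInputStream underneath absorbs most of that cost. If a parse fails midway, the rest of that response is still unread and would have to be skipped up to the next '\n' before continuing.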

How to parse multiple XML feeds at once from an array of URLs with SAX Parser for Java?

I am working on an Android application that parses one or more XML feeds based on user preferences. Is it possible to parse (using SAX Parser) more than one XML feed at once by providing the parser with an array of URLs of my XML feeds?
If no, what would be an alternative way of listing the parsed items from different XML feeds in one list? An intuitive approach is to use java.io.SequenceInputStream to merge the two input streams. However, this throws a NullPointerException:
try {
URL urlOne = new URL("http://example.com/feedone.xml");
URL urlTwo = new URL("http://example.com/feedtwo.xml");
InputStream streamOne = urlOne.openStream();
InputStream streamTwo = urlTwo.openStream();
InputStream streamBoth = new SequenceInputStream(streamOne, streamTwo);
InputSource sourceBoth = new InputSource(streamBoth);
//Parsing
stream = xmlHandler.getStream();
}
catch (Exception error) {
error.printStackTrace();
}
List<Item> content = stream.getList();
return content;
The tactic of appending the streams before parsing is not likely to work well, as the appended XML will not be valid XML. As each XML input has its own root element, the appended XML will have multiple roots, which is not permitted in XML. Additionally it's likely to have multiple XML headers like
<?xml version="1.0" encoding="UTF-8"?>
which is also invalid.
While it's possible to preprocess the input to work around these issues, you're likely better off parsing them separately and dealing with getting the results combined later.
It's possible to make a SAX parser add the parsed elements to an existing list of elements. If you post code in your question showing how you're parsing a single file, we might be able to help figure out how to adjust it to your need for multiple inputs.
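As a rough illustration of parsing the feeds separately while filling one list, here is a sketch. The feed layout is an assumption (items are taken to be the text of <title> elements), and MultiFeedParser and TitleHandler are hypothetical names, since the question's own handler is not shown:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class MultiFeedParser {

    /** Appends the text of every <title> element to a list shared across feeds. */
    static final class TitleHandler extends DefaultHandler {
        private final List<String> titles;
        private StringBuilder current; // non-null only while inside a <title>

        TitleHandler(List<String> titles) {
            this.titles = titles;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    /** Parses each feed as its own document; all results land in one list. */
    static List<String> parseAll(List<InputStream> feeds) throws Exception {
        List<String> titles = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        for (InputStream feed : feeds) {
            try (InputStream in = feed) {
                factory.newSAXParser().parse(in, new TitleHandler(titles));
            }
        }
        return titles;
    }
}
```

With real feeds, each InputStream would come from url.openStream() for one entry of the URL array, so every feed keeps its own root element and XML declaration.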

Problem in using stream reader

I have XML data as a string which has to be parsed. I am converting the XML string to an InputSource using the following code:
StringReader reader1 = new StringReader(xmlstring);
InputSource inputSource1 = new InputSource(reader1);
And I am passing the input source to
Document doc = builder.build(inputSource1);
I want to use the same inputSource1 in another parser class as well, but I get a "stream closed" error.
How should I handle this, or is there any other way to pass XML data to a parser other than a file?
Looking at the JavaDoc, it seems that InputSource is not designed to be shared and reused by multiple parsers.
standard processing of both byte and character streams is to close them on as part of end-of-parse cleanup, so applications should not attempt to re-use such streams after they have been handed to a parser.
So you will have to create a new InputSource. If you are really reading from a String, there would be no difference in I/O or memory cost anyway.
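A small sketch of that advice: keep the String and build a fresh InputSource for every parse. The question's builder.build(...) suggests a different builder class; JAXP's DocumentBuilder is used here only as a stand-in, and the names FreshInputSource, sourceFor and rootName are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class FreshInputSource {

    /** Creates a new InputSource and Reader on each call; only the String is shared. */
    static InputSource sourceFor(String xml) {
        return new InputSource(new StringReader(xml));
    }

    /** Parses the XML with its own InputSource and returns the root element's name. */
    static String rootName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(sourceFor(xml));
        return doc.getDocumentElement().getTagName();
    }
}
```

Each parser class can call sourceFor(xmlstring) itself, so no parser ever sees a Reader that another one has already consumed and closed.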

Converting a raw file (binary data ) into XML file

I'm working on a project in which I have to take a raw file from the server and convert it into an XML file.
Is there any tool available in Java that can help me accomplish this task, the way JAXP can be used to parse an XML document?
I guess you will need your objects for later use, so create a MyObject bean, load the values from your raw file into it, and write it to someFile.xml:
FileOutputStream os = new FileOutputStream("someFile.xml");
XMLEncoder encoder = new XMLEncoder(os);
MyObject p = new MyObject();
p.setFirstName("Mite");
encoder.writeObject(p);
encoder.close();
Or you can go with TransformerFactory if you don't need the objects for later use.
Yes. This assumes that the text in the raw file is already XML.
You start with the DocumentBuilderFactory to get a DocumentBuilder, and then you can use its parse() method to turn an input stream into a Document, which is an internal XML representation.
If the raw file contains something other than XML, you'll want to scan it somehow (your own code here) and use the stuff you find to build up from an empty Document.
I then usually use a Transformer from a TransformerFactory to convert the Document into XML text in a file, but there may be a simpler way.
JAXP can also be used to create a new, empty document:
Document dom = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.newDocument();
Then you can use that Document to create elements, and append them as needed:
Element root = dom.createElement("root");
dom.appendChild(root);
But, as Jørn noted in a comment to your question, it all depends on what you want to do with this "raw" file: how should it be turned into XML. And only you know that.
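To show how these pieces fit together, here is a sketch that builds a Document from raw records and serializes it with a Transformer. The raw format (one key=value pair per line) and the element names are pure assumptions for illustration, as is the class name RawToXml; the real mapping depends on what the raw file contains:

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class RawToXml {

    /** Turns "key=value" lines (an assumed raw format) into XML text. */
    static String convert(List<String> rawLines) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = dom.createElement("root");
        dom.appendChild(root);

        for (String line : rawLines) {
            int eq = line.indexOf('=');
            if (eq <= 0) continue; // skip lines that do not fit the assumed format
            Element field = dom.createElement("field");
            field.setAttribute("name", line.substring(0, eq));
            field.setTextContent(line.substring(eq + 1));
            root.appendChild(field);
        }

        // A Transformer created without a stylesheet writes the Document out unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }
}
```

To write a file instead of a String, a StreamResult wrapping a File or an OutputStream can be passed to transform().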
I think if you try to load it in an XmlDocument this will be fine
