android and parsing documents in xml - java

My app receives data as XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><response><status>FAIL</status><time>2012-12-11 22:35</time></response>
I can read this data with:
String si = "";
while (is.available() > 0) {
    si += (char) is.read();
}
But parsing the same stream with DocumentBuilder:
Document doc = db.parse(sock.getInputStream());
hangs my application at this point.
Can somebody explain why?

My guess is that the socket input stream never signals end of file, so the parse method blocks waiting for the next character. I suggest using your example loop to collect the XML and then passing the result to the parse method by wrapping its bytes in a ByteArrayInputStream (StringBufferInputStream would also work, but it is deprecated).
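A minimal sketch of that suggestion, assuming every message ends with the closing tag `</response>` as in the question. The class and method names (`SocketXmlReader`, `readUntil`, `parseOne`) are made up for illustration. Instead of relying on `available()`, which only reports bytes that have already arrived, it reads until the closing tag has been seen and then parses the buffered bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class SocketXmlReader {

    // Copies bytes from 'in' until 'endTag' has been seen or the stream ends.
    // The simple matcher assumes the first byte of endTag ('<') does not occur
    // again inside endTag, which holds for a closing tag.
    static byte[] readUntil(InputStream in, String endTag) throws IOException {
        byte[] marker = endTag.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int matched = 0;
        int b;
        while (matched < marker.length && (b = in.read()) != -1) {
            buf.write(b);
            if ((byte) b == marker[matched]) {
                matched++;
            } else {
                matched = ((byte) b == marker[0]) ? 1 : 0;
            }
        }
        return buf.toByteArray();
    }

    // Parses exactly one document without waiting for the peer to close the socket.
    static Document parseOne(InputStream in, String endTag) throws Exception {
        byte[] xml = readUntil(in, endTag);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws Exception {
        // A ByteArrayInputStream stands in for sock.getInputStream() here.
        String msg = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>"
                + "<response><status>FAIL</status><time>2012-12-11 22:35</time></response>"
                + "bytes that belong to the next message";
        InputStream in = new ByteArrayInputStream(msg.getBytes(StandardCharsets.UTF_8));
        Document doc = parseOne(in, "</response>");
        System.out.println(doc.getElementsByTagName("status").item(0).getTextContent());
    }
}
```

Reading one byte at a time is slow on a raw socket stream. Wrapping the socket stream once in a BufferedInputStream and reusing that wrapper for every message would probably help.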

Related

Java How to read a TEXT log file that contains separate XML and extract the XML Blocks

Using Java
I have an application that writes all the XML messages it receives into Log files. Each XML message is appended to the log file by log4j.
The solution should work for any type of text file whose blocks have a unique start and end marker; it does not need to be XML-specific.
There can be thousands of XML messages in each log file and new log files are created each day.
I have no need to parse the XML in the log files into its elements; I only need to pull each XML message (block of XML) from the log file into a Java variable in memory. (The entire block of XML, from the first tag to the last, needs to be in that variable.)
I will be sending this XML to a Web Service to be processed similar to the way it is being sent over from our Middleware today.
I already have the part that sends the XML working, and I can process files as long as each XML message is on one row in the log file. However, the XML writer in the application sometimes writes the XML in an indented, multi-line format, hence the need to pluck the blocks of XML messages from the log file.
So I need to read the XML in the log file from the first tag:
<?xml version='1.0' encoding='UTF-8'?>
until the closing tag of the root element:
</MyXML>
This must work regardless of carriage returns or blank lines in the file, and it has to skip invalid rows.
Each XML message can be small to very large - easily over 20k.
The XML log file will look something like the following and may have blank rows between each XML message or other text as shown below:
<?xml version='1.0' encoding='UTF-8'?>
<MyXML>
<Envelope documentType="SetProfile" trader="BEA" dtdRev="2.0" xid="03-JUL-17 14:38:49" traderLogin="middleware" traderPassword="abc123"/>
<Payload><SetProfile allowInvalidProfile="F">
<Partner publisherID="52725" act="Update">
<Contact languageCode="EN" firstName="Luis" lastName="Dini" email="Dini#email.com" act="Update" publisherID="ldini" securityRoleCode="6"/>
</Partner></SetProfile>
</Payload>
</MyXML>
<?xml version='1.0' encoding='UTF-8'?><MyXML><Envelope documentType="SetProfile" trader="BEA" dtdRev="2.0" xid="03-JUL-17 14:38:49" traderLogin="middleware" traderPassword="abc123"/><Payload><SetProfile allowInvalidProfile="F"><Partner publisherID="9857684" act="Update"><Contact languageCode="EN" firstName="Bill" lastName="Jones" email="Jones#email.com" act="Update" publisherID="BJones" securityRoleCode="3"/></Partner></SetProfile></Payload></MyXML>
======================
#]
<?xml version='1.0' encoding='UTF-8'?><MyXML><Envelope documentType="SetProfile" trader="BEA" dtdRev="2.0" xid="03-JUL-17 14:38:49" traderLogin="middleware" traderPassword="abc123"/>
<Payload><SetProfile allowInvalidProfile="F"><Partner publisherID="7465737" act="Update">
<Contact languageCode="EN" firstName="John" lastName="Smith" email="Smith#email.com" act="Update" publisherID="JSmith" securityRoleCode="3"/></Partner></SetProfile></Payload></MyXML>
In short this will be a tool that will read a log file of XML messages and extract each individual XML message to be forwarded to a Web Service similar to the way the middleware is creating and sending each XML message today.
This will be used for volume testing and other development needs.
Any suggestions are appreciated.
A simple way to do it would be to load the log file content into a String and then use a regular expression, something like this (Pattern.DOTALL is needed so that ".*?" also matches the line breaks inside a message):
Pattern p = Pattern.compile(Pattern.quote("<?xml version='1.0' encoding='UTF-8'?>") + ".*?" + Pattern.quote("</MyXML>"), Pattern.DOTALL);
Matcher m = p.matcher(allText);
while (m.find()) {
    System.out.println(m.group());
}
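For log files too large to hold comfortably in a single String, the same start/end-marker idea can be applied line by line. The following is only a sketch: the class name `XmlBlockExtractor` is hypothetical, and it assumes the markers never span a line break. It scans each line, starts collecting at the XML declaration, and emits a block when the closing tag is reached:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class XmlBlockExtractor {
    static final String START = "<?xml version='1.0' encoding='UTF-8'?>";
    static final String END = "</MyXML>";

    // Returns every START...END block found in the source, in order.
    // Text outside a block is skipped; an unfinished block at end of input is dropped.
    static List<String> extract(Reader source) throws IOException {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a block
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int pos = 0;
            while (pos < line.length()) {
                if (current == null) {
                    int start = line.indexOf(START, pos);
                    if (start < 0) {
                        break; // nothing of interest on the rest of this line
                    }
                    current = new StringBuilder();
                    pos = start;
                }
                int end = line.indexOf(END, pos);
                if (end < 0) {
                    // Block continues on the next line.
                    current.append(line, pos, line.length()).append('\n');
                    pos = line.length();
                } else {
                    int stop = end + END.length();
                    current.append(line, pos, stop);
                    blocks.add(current.toString());
                    current = null;
                    pos = stop; // keep scanning: another block may start on this line
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        String log = START + "\n<MyXML>\n<Payload/>\n</MyXML>\n======\n#]\n"
                + START + "<MyXML><Payload/></MyXML>\n";
        for (String block : extract(new StringReader(log))) {
            System.out.println("--- block ---");
            System.out.println(block);
        }
    }
}
```

In real use the Reader would come from the log file (for example via `Files.newBufferedReader`), and each block could be forwarded as soon as it is found instead of being collected in a list.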

Using Aalto-xml forces to create new buffer for each XML

I'm trying to set up Aalto-xml (Woodstox) in an async environment, but I'm having trouble figuring out how to do it correctly!
My flow is:
1. Pre-allocate an 'AsyncStreamReader':
   xmlInputFactory.createAsyncForByteArray();
2. Receive the XML stream from the socket.
3. Feed the streamReader with data:
   streamReader.getInputFeeder().feedInput(bytes, 0, bufSize);
4. Parse the different XML tags in a loop:
   while (streamReader.hasNext()) {...}
5. When the last XMLEvent.END_ELEMENT is parsed, invoke:
   streamReader.getInputFeeder().endOfInput();
6. Upon XMLEvent.END_DOCUMENT, do:
   streamReader = xmlInputFactory.createAsyncForByteArray();
7. Upon AsyncXMLStreamReader.EVENT_INCOMPLETE, break the loop and wait for more data to feed the parser.
I do it like this, although I know it's wrong, because if I don't call endOfInput(), then feeding the reader a new buffer throws an exception saying that I'm not done with the previous XML.
What trips me up is that in order to get END_DOCUMENT you must call endOfInput(), which closes the buffer, so you cannot feed it more input from the socket... I'm stuck in a loop here!
How can I fix my flow?
Here is a Gist with the parser code:
https://gist.github.com/shvalb/ca9cd526aea31ccf280adf289e0991d7
Wait for the end of the stream, that is, until either
* InputStream.read() returns -1, or
* the stream is closed,
before calling endOfInput().
See this example; it also optionally stops feeding input if EVENT_INCOMPLETE is not returned.

Parsing string to extract XML

Let's say I have the following String:
abc: def
qxy
<?xml version='1.0'><xyz>
...
</xyz>
other text
<?xml version='1.0'><www>
...
</www>
more text
Is there a way to parse this? I am currently trying with an XMLStreamReader and it throws a parsing error: javax.xml.stream.XMLStreamException: ParseError. If I remove all the text and just try to parse one of the XML sections (like only xyz) then it works perfectly.
You have to filter out the XML parts yourself. No general-purpose XMLStreamReader will do it for you, since it has no idea where each document starts or ends. You could craft your own specialized reader that filters the input, but standard implementations expect a complete XML document and nothing else.
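One way to do that filtering, sketched under some assumptions: each embedded document starts with a well-formed XML declaration (ending in `?>`), the root element name consists of word characters, and no nested element has the same name as the root. The class name `EmbeddedXmlParser` is hypothetical. A regular expression cuts out each document, and each piece is then handed to its own XMLStreamReader:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class EmbeddedXmlParser {
    // An XML declaration, then a root element up to the first matching close tag.
    private static final Pattern DOC = Pattern.compile(
            "<\\?xml[^>]*\\?>\\s*<(\\w+)[\\s>].*?</\\1>", Pattern.DOTALL);

    // Parses every embedded document completely and returns the root element names.
    // A malformed document makes the StAX reader throw XMLStreamException.
    static List<String> rootNames(String mixed) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        List<String> roots = new ArrayList<>();
        Matcher m = DOC.matcher(mixed);
        while (m.find()) {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(m.group()));
            try {
                String root = null;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT && root == null) {
                        root = reader.getLocalName();
                    }
                }
                roots.add(root);
            } finally {
                reader.close();
            }
        }
        return roots;
    }

    public static void main(String[] args) throws Exception {
        String mixed = "abc: def\nqxy\n"
                + "<?xml version='1.0'?><xyz>\n<item/>\n</xyz>\n"
                + "other text\n"
                + "<?xml version='1.0'?><www>\n<item/>\n</www>\n"
                + "more text";
        System.out.println(rootNames(mixed));
    }
}
```

Inside the loop, the event handling would be replaced by whatever processing each document needs; collecting the root names just keeps the sketch short.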

Reading lots of data returned by API

I am reading from an API provided by a company. One of the accounts I pull data from has around 22,000 JSON objects. Reading works fine with small amounts of data, I would say up to 8,000 records, but beyond that I get errors such as the JSON not being well formed, on top of the problem of reading the response at all.
The response comes this way:
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://ywers.com">
[{"Name":"Edward", "LastName":"Jones", "Address":"{accepted}"}
,{"Name":"Carlos", "LastName":"Ramirez", "Address":"{Rejected}"}, ....... 22k more records here]</string>
I asked here earlier about the best way to do this, and I got a response suggesting that I read it with an XML parser and then a JSON parser. I am using GSON.
String XML = "<Your XML Response>";
XPathExpression xpath = XPathFactory.newInstance()
.newXPath().compile("/*[local-name()='string']");
String json = xpath.evaluate(new InputSource(new StringReader(XML)));
and then
JSONArray jsonRoot = new JSONArray(json.trim());
System.out.println(jsonRoot.getJSONObject(0).getString("Address")); // {accepted}
The problem with this approach is that it throws errors while reading the XML. It starts reading, but after a while it stops with errors like:
java.lang.OutOfMemoryError
at java.lang.AbstractStringBuilder.enlargeBuffer(AbstractBuilder.java:94)
at java.lang.StringBuffer.append(StringBuffer.java:219)
at org.apache.harmony.xml.dom.CharacterDataImpl.appendData(CharacterDataImpl.java:43)
......
I would appreciate any advice on how to proceed with this. I am kind of new to Android.
I don't know who would wrap 22k objects inside an XML string, but apparently someone is doing that. From my experience, you run out of memory because you try to convert the whole response to a String, and the response is too big for that. I recommend streaming the JSON data instead. You can stream it from the input stream of your HTTP response, but you first need to skip the XML part, for example by wrapping the original response stream in another input stream that skips the XML wrapper.
Before I used the streaming API of Google GSON, I also got OOM errors because my JSON data was very large (many images and sounds in Base64 encoding). With GSON streaming I overcame that error because it reads the data token by token, not all at once. As an alternative, you can use the Jackson JSON library; I think it also has a streaming API, and using it is almost the same as my implementation with GSON. I hope my answer helps, and if you have another question about it, feel free to ask in the comments :)
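The "skip the XML part" step could look roughly like the sketch below. It is dependency-free, the class and method names (`JsonInXmlStream`, `openJson`) are hypothetical, and it assumes the first '[' in the response is the start of the JSON array, which holds for the sample response in the question. It does not decode XML entities such as `&amp;` that may appear inside the JSON text. The returned Reader could then be handed to a token-based JSON parser such as Gson's JsonReader, which is not shown here:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class JsonInXmlStream {

    // Discards the XML declaration and the <string ...> start tag by reading
    // up to the first '[', then returns a Reader positioned on that '['.
    static Reader openJson(InputStream response) throws IOException {
        PushbackInputStream in = new PushbackInputStream(response);
        int b;
        while ((b = in.read()) != -1) {
            if (b == '[') {
                in.unread(b); // put the '[' back so the JSON parser sees it
                return new InputStreamReader(in, StandardCharsets.UTF_8);
            }
        }
        throw new IOException("no JSON array found in response");
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayInputStream stands in for the HTTP response stream.
        String response = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<string xmlns=\"http://ywers.com\">\n"
                + "[{\"Name\":\"Edward\", \"LastName\":\"Jones\", \"Address\":\"{accepted}\"}]"
                + "</string>";
        Reader json = openJson(
                new ByteArrayInputStream(response.getBytes(StandardCharsets.UTF_8)));
        char[] head = new char[16];
        int n = json.read(head);
        System.out.println(new String(head, 0, n));
    }
}
```

The closing `</string>` tag is still in the stream after the array. A parser that reads the array token by token and stops at its end should never reach it.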

Stream xml input to sax Parser, How to print the xml streamed?

I am trying to connect to a remote server via a socket, and I get big XML responses back from the socket, delimited by a '\n' character.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
.......
.......
</data>
</Response>\n <---- \n acts as delimiter
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<data>
....
....
</data>
</Response>\n
..
I am trying to parse these XML responses using a SAX parser. Ideally I would read one full response into a String by searching for '\n' and give that String to the parser. But since a single response is very large, I get an OutOfMemoryError when holding it in a String. So the only remaining option was to stream the XML to SAX.
SAXParserFactory spfactory = SAXParserFactory.newInstance();
SAXParser saxParser = spfactory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(new MyDefaultHandler(context));
InputSource xmlInputSource = new InputSource(new
CloseShieldInputStream(mySocket.getInputStream()));
xmlReader.parse(xmlInputSource);
I am using CloseShieldInputStream to prevent SAX from closing my socket stream on an exception caused by the '\n'. I asked a previous question about that.
Now sometimes I get a parse error:
org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 8: not well-formed (invalid token)
I searched for it and found that this error normally occurs when the encoding of the actual XML is not what SAX expects. I wrote a C program to print out the XML, and all of it is encoded in UTF-8.
Now my questions:
1. Is there any other reason for the parse error above, other than an encoding issue?
2. Is there any way to print (or write to a file) the input to SAX as it streams from the socket?
After trying Hemal Pandya's answer:
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt"));
InputSource xmlInputSource = new InputSource(new CloseShieldInputStream(new
TeeInputStream(mReadStream, log)));
xmlReader.parse(xmlInputSource);
A new file named log.txt is created when I mount the SD card, but it is empty. Am I using this right?
Here is how I finally did it.
I worked it out with TeeInputStream itself. Thanks to Hemal Pandya for suggesting it.
// open a log file in append mode
OutputStream log = new BufferedOutputStream(new FileOutputStream("log.txt", true));
InputSource xmlInputSource = new InputSource(new CloseShieldInputStream(new
TeeInputStream(mReadStream, log)));
try {
    xmlReader.parse(xmlInputSource);
} catch (SAXException e) {
    // parsing failed; the log is still flushed in the finally block
} finally {
    // flush the buffered log content to the file whether or not parsing succeeded
    log.flush();
}
Is there any way to print (or write to any file) the input to SAX as
it streams from socket?
Apache Commons has a TeeInputStream that should be useful.
OutputStream log = new BufferedOutputStream(new FileOutputStream("response.xml"));
InputSource xmlInputSource = new InputSource(new
CloseShieldInputStream(new TeeInputStream(mySocket.getInputStream(), log)));
I haven't used it myself, so you might want to try it first in a standalone program to figure out the close semantics. Looking at the docs and your requirements, it looks like you would want to close the log stream separately at the end.
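To experiment with the idea in a standalone program without adding Commons IO, a tee-style stream can be sketched with the JDK alone. This is not the Commons IO implementation, only an illustration of the same technique; the class name `LoggingInputStream` is made up. Every byte the SAX parser reads is also copied to a log stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LoggingInputStream extends FilterInputStream {
    private final OutputStream log;

    LoggingInputStream(InputStream in, OutputStream log) {
        super(in);
        this.log = log;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            log.write(b); // copy the single byte that was just read
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            log.write(buf, off, n); // copy exactly the bytes that were read
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // avoid logging the same bytes twice after a reset()
    }

    public static void main(String[] args) throws Exception {
        // In-memory streams stand in for the socket and the log file.
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Response><data>x</data></Response>").getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        InputStream in = new LoggingInputStream(new ByteArrayInputStream(xml), log);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(in), new DefaultHandler());
        } finally {
            log.flush(); // matters when the log is a BufferedOutputStream over a file
        }
        System.out.println(log.toString("UTF-8"));
    }
}
```

With a file-backed, buffered log stream, the flush in the finally block is what makes the captured input actually reach the file, including when parsing fails.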
I am not familiar with Expat, but to accomplish what you are describing in general, you need a SAX parser that supports pushing data into the parser instead of having the parser pull data from a source. Check if Expat supports a push model. If it does, then you can simply read a chunk of data from the socket, push it to the parser, and it will parse whatever it can from the chunk, caching any remaining data for use during the next push. Repeat as needed until you are ready to close the socket connection. In this model, the \n separator would get treated as miscellaneous whitespace between nodes, so you have to use the SAX events to detect when a new <Response> node opens and closes. Also, because you are receiving multiple <Response> nodes in the data, and XML does not allow more than one top-level document node, you would need to push a custom opening tag into the parser before you start pushing the socket data into it. The custom opening tag then becomes the top-level document node, and the <Response> nodes become its children.
