I get this error while parsing an XML file using JDOM.
I receive a stream of data that is an XML document combined with a PDF attachment. When I try to create a document from it, this error is thrown.
I printed the stream. On the console it is full of junk characters (the PDF contents), but in WordPad it looks like this:
------=_Part_2_23286828.1296553488632
Content-Type: text/xml; charset=utf-8
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
....
....
....
<Attachment>
<URI>Filename.pdf</URI>
</Attachment>
</SOAP-ENV:Envelope>
------=_Part_2_23286828.1296553488632
Content-Type: application/pdf; name="Filename.pdf"
Content-Transfer-Encoding: binary
Content-ID: </Attachment[1]/URI[1]>
Content-Disposition: attachment; filename="Filename.pdf"
%PDF-1.4
%âãÏÓ
4 0 obj <</Type/XObject/ColorSpace/DeviceRGB/Subtype/Image/BitsPerComponent 8/Width 579/Length 52722/Height 480/Filter/DCTDecode>>stream
ÿØÿà
Please note that the XML between <SOAP-ENV:Envelope> and </SOAP-ENV:Envelope> is well-formed.
How can I create a JDOM document out of it? I guess by removing the content before and after the XML start/end tags, but how can I do that cleanly?
I read that BOMInputStream from Apache Commons IO is helpful, but I believe it is in version 2.* and I am using version 1.3.1.
I hope this explains my problem; if not, please let me know.
Thank you.
UPDATE
At first I didn't realize it would be this cumbersome.
I am making a call from one servlet to another (doPost) using HttpURLConnection. The response comes back in the form of this stream.
I am also exploring whether I can extract the XML part using any of the methods provided by HttpURLConnection/URLConnection.
I would appreciate it if anyone could shed more light on this.
This message conforms to the SOAP with Attachments specification (http://www.w3.org/TR/SOAP-attachments). In Java, the way to parse these messages is to use an implementation of SAAJ (SOAP with Attachments API for Java: http://download.oracle.com/javaee/5/tutorial/doc/bnbhf.html). There are a couple of different implementations of SAAJ out there. My personal favorite is the Spring-WS implementation; another option is Apache Axiom.
My suggestion would be to use either Spring-WS or Apache Axis to process this message rather than trying to do it manually from an input stream. Are you trying to do this on the server side or on the client side?
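If adding a SAAJ implementation is not an option, the manual approach the question hints at can be sketched with the JDK alone. This is only a minimal sketch, not a MIME parser: `SoapPartExtractor` and `extractEnvelope` are hypothetical names, and it assumes the root element name is known in advance and that the XML part is UTF-8, as in the message shown in the question.

```java
import java.nio.charset.StandardCharsets;

final class SoapPartExtractor {

    /**
     * Hypothetical helper: cuts the XML envelope out of a raw multipart body.
     * rootName is the qualified root element name, e.g. "SOAP-ENV:Envelope".
     */
    static String extractEnvelope(byte[] body, String rootName) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices
        // equal byte offsets even though the PDF part is not valid text.
        String raw = new String(body, StandardCharsets.ISO_8859_1);
        String endTag = "</" + rootName + ">";
        int start = raw.indexOf("<" + rootName);
        int end = raw.indexOf(endTag, Math.max(start, 0));
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("No <" + rootName + "> element found");
        }
        // Decode only the XML slice, using the charset assumed for the XML part.
        int length = end + endTag.length() - start;
        return new String(body, start, length, StandardCharsets.UTF_8);
    }
}
```

The returned string can then be handed to JDOM, for example with `new SAXBuilder().build(new StringReader(xml))`. A real multipart parser remains more robust, because it honors the boundary announced in the HTTP Content-Type header instead of searching for tags.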
The previous ConceptNet 5.4 API returns plain-text JSON (http://conceptnet5.media.mit.edu/data/5.4/c/en/library).
Is anyone familiar with the new ConceptNet 5.5 API? Why does the query return HTML source rather than plain text like the previous version (http://api.conceptnet.io/related/c/en/library)? I didn't find an option to select the plain-text JSON format.
As a result, this Java code does not work with the new version:
JsonReader jsonReader = Json.createReader(new URL("http://conceptnet5.media.mit.edu/data/5.4/c/en/library").openStream());
The format of the API response depends on the Accept: header that your client sends. (I found this behavior really convenient when Django REST Framework does it, so I implemented it in ConceptNet's API code.) The default response format is JSON.
If you run curl http://api.conceptnet.io/related/c/en/library at the command line, for example, you will see plain, un-indented JSON. If you go there in a Web browser, it's indented and wrapped in HTML so it can be syntax-highlighted and linked. The difference is that the Web browser sends the header Accept: text/html.
I think it's a bug in the JsonReader you're using that it's explicitly sending Accept: text/html and yet it's expecting a JSON response, not HTML. See if you can work around it by configuring the headers it sends.
(If getting the HTML is unavoidable, note that the plain JSON is also present in the HTML, within the <script type="application/ld+json"> tag.)
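One way to configure the header from Java is to open the connection yourself and set `Accept` before reading. The following is a minimal sketch using only `java.net`; `ConceptNetClient` and `openJson` are hypothetical names, and `application/json` is assumed to be a media type the API accepts for its JSON format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class ConceptNetClient {

    /** Hypothetical helper: prepares (but does not yet send) a request that asks for JSON. */
    static HttpURLConnection openJson(String url) {
        try {
            // openConnection() only creates the object; nothing is sent until the stream is read.
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Replace the default Accept header with an explicit request for JSON.
            conn.setRequestProperty("Accept", "application/json");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream from `conn.getInputStream()` can then be passed to `Json.createReader(...)` in place of `new URL(...).openStream()`.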
Is any special handling required to parse an EPS attachment in an EML file using JavaMail?
I don't have any issues with other types of attachments; only the EPS attachment causes problems.
When I send the EPS attachment using Outlook 2010, it changes the encoding from base64 to quoted-printable.
Below is the header of the EPS attachment I am trying to parse.
Content-Type: application/postscript;
name="LOGO.eps"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename="LOGO.eps"
The attachment is extracted, but when I open it with a document viewer it shows "loading" forever.
JavaMail tries to guess the best Content-Transfer-Encoding for the data by looking at the actual data. If JavaMail guesses that the data is text, it's likely to use quoted-printable. In some cases, the guess can be wrong, in which case you'll need to specify the encoding yourself. If you're using JavaMail 1.5, there's a new attachFile method that makes this easier.
I have been investigating how to handle binary data with SOAP messages. I am developing both the client and the service, so I am free to choose the framework. The only constraint is that the service end has already been designed and is based on Spring-WS.
From what I found on Google, there appear to be three options:
Sending the attachment inline as base64 in the SOAP message (Base64Binary).
Sending the attachment outside of the SOAP message, i.e. with a reference to the attachment (SWA).
Sending the attachment outside the message but making it appear as though it is embedded in the message (MTOM).
Questions
What exactly does it mean when they say that the attachment is outside of the SOAP message? I assume that maybe the attachment is sent as a different TCP packet, but I think I am wrong?
Which of the above options is recommended and specifically, which one works best with Spring's Spring-WS framework?
It is unclear to me which of the above options encode the binary content during transmission. What is Binary MIME as described here - http://www.crosschecknet.com/intro_to_mtom.php ? Is the binary data still converted to text during transmission?
What is the format of the data when using SWA?
What exactly does it mean when they say that the attachment is outside of the SOAP message? I assume that maybe the attachment is sent as a different TCP packet, but I think I am wrong?
In contrast to the first option, the attachment is not part of the actual SOAP message payload but is instead referenced from within the SOAP document. The difference between MTOM and SWA is where the referenced file is located. For MTOM it's embedded in the response, while in SWA you could, for example, get a link to the resource on the web. Three minimal examples follow:
MTOM (all in one response of type xop+xml)
Content-type: multipart/related;
type="application/xop+xml";
start-info="text/xml"
--uuid:c73c9ce8-6e02-40ce-9f68-064e18843428
Content-Type: application/xop+xml;charset=utf-8;type="text/xml"
Content-Transfer-Encoding: binary
<?xml version="1.0" ?>
<S:Envelope xmlns:S="...">
<S:Body>
<ns2:downloadImageResponse xmlns:ns2="...">
<return>
<xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include"
href="cid:012eb00e-9460-407c-b622-1be987fdb2cf#example.jaxws.sun.com">
</xop:Include>
</return>
</ns2:downloadImageResponse>
</S:Body>
</S:Envelope>
--uuid:c73c9ce8-6e02-40ce-9f68-064e18843428
Content-Id: <012eb00e-9460-407c-b622-1be987fdb2cf#example.jaxws.sun.com>
Content-Type: image/png
Content-Transfer-Encoding: binary
SWA (only reference)
Content-Type: application/xml;charset=utf-8;
<?xml version="1.0" ?>
<S:Envelope xmlns:S="...">
<S:Body>
<ns2:downloadImageResponse xmlns:ns2="...">
<return>
https://server.com/downloadImagehere.png
</return>
</ns2:downloadImageResponse>
</S:Body>
</S:Envelope>
Inline
Content-Type: application/xml;charset=utf-8;
<?xml version="1.0" ?>
<S:Envelope xmlns:S="...">
<S:Body>
<ns2:downloadImageResponse xmlns:ns2="...">
<return>
YTM0NZomIz...potentiallyLargeBase64encodedFileGoesInHere...I2OTsmIzM0NTueYQ==
</return>
</ns2:downloadImageResponse>
</S:Body>
</S:Envelope>
Which of the above options is recommended and specifically, which one works best with Spring's Spring-WS framework?
They are all supported, and the one to use depends a bit on your use case. MTOM seems to be the de facto standard according to my research. In my opinion it's particularly useful if you have either large or multiple file attachments. Since it splits the message into its logical components, it might give the parser more options to handle the binary data efficiently.
However, for smaller data I would probably embed the resource, since that is part of the standard SOAP protocol and only uses an encoded byte array that is embedded directly in the message. If portability/compatibility is important, this might be the approach to choose.
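For the inline option, the encoding step itself is small. Here is a minimal sketch using the JDK's `java.util.Base64`; `InlineAttachment` and its method names are hypothetical, and in practice a SOAP framework would normally do this mapping for you. Note that Base64 produces four characters for every three bytes, which is the main cost of inlining large files.

```java
import java.util.Base64;

final class InlineAttachment {

    /** Encodes raw file bytes as text suitable for an xsd:base64Binary element. */
    static String toBase64Binary(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes element text back to bytes; the MIME decoder ignores surrounding whitespace and line breaks. */
    static byte[] fromBase64Binary(String elementText) {
        return Base64.getMimeDecoder().decode(elementText);
    }
}
```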
The last approach obviously requires you to handle the reference by yourself, which might or might not be what you want.
It is unclear to me which of the above options encode the binary content during transmission. What is Binary MIME as described here - http://www.crosschecknet.com/intro_to_mtom.php ? Is the binary data still converted to text during transmission?
MTOM and inline both usually encode the file as a Base64-encoded string. For external links it's not relevant.
What is the format of the data when using SWA?
Base64encoded byte array
I want to return some HTML code in my XML response so that it will be highlighted on the client side. For example, I have an XML response like this:
<Response id="1234"> <Document> <text>&lt;span style="color:blue;font:18pt"&gt;fevers&lt;/span&gt;</text></Document>
</Response>
The problem is that the client treats this as the literal text "<span>fever</span>" instead of interpreting the span and highlighting the word "fever". And of course I can't send "<>" in XML, as the library translates them directly to their character codes. Any help would be appreciated. Thank you.
Enclose it with Character Data (CDATA):
<![CDATA[<hello>World!</hello>]]>
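When the markup is wrapped programmatically, the one pitfall is that the sequence `]]>` inside the content would end the CDATA section early. Below is a minimal sketch of a helper that handles this by splitting the content across two sections; `CdataWrapper` and `wrap` are hypothetical names, not part of any XML library.

```java
final class CdataWrapper {

    /** Wraps arbitrary markup in a CDATA section so an XML parser treats it as plain text. */
    static String wrap(String markup) {
        // "]]>" cannot appear inside CDATA; close the section after "]]"
        // and reopen a new one before ">".
        String safe = markup.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }
}
```

For example, `wrap("<hello>World!</hello>")` yields exactly the CDATA section shown above.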
Base 64 encode it and you'll be able to send that XML - without having to deal with a CDATA section on the client side.
Personally, I don't think it's a good idea for servers to know anything about how data is rendered on the client side. Why should the server enforce what amounts to CSS style? That's a client issue.
If it must be enforced, make it a governance or communication issue; don't try to do it in code.
I'm working with a server that always sends me XML responses. But sometimes, when the server is lagging or something similar, it reports this by sending back an HTML page (just a page describing the error). I didn't expect that, and my XML parser crashed.
I'm using DefaultHttpClient() and I do send a header like mHttpRequest.setHeader("Accept", "text/xml");
So what is the proper way to ensure I got an XML (or other specific format) response?
As Kristian suggested, see if it provides a different Content-Type when HTML is emitted. Failing that, I would check for a <?xml... line, as opposed to a doctype or whatever is on the HTML page.
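A Content-Type check is a little more forgiving if it ignores parameters such as `; charset=utf-8` and tolerates a missing header. Here is a minimal sketch that works on the raw header value; `ContentTypes` and `isXml` are hypothetical names, and the set of media types treated as XML is an assumption to adjust for the server in question.

```java
import java.util.Locale;

final class ContentTypes {

    /** Returns true if a Content-Type header value denotes some form of XML. */
    static boolean isXml(String contentType) {
        if (contentType == null) {
            return false; // header missing
        }
        // Drop parameters such as "; charset=utf-8" and normalize case.
        int semi = contentType.indexOf(';');
        String mime = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return mime.equals("text/xml")
                || mime.equals("application/xml")
                || mime.endsWith("+xml");
    }
}
```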
Can you check the content type header on the response?
Something like (if I understand the Android documentation correctly):
"text/xml".equals(httpResponse.getEntity().getContentType().getValue());
Many XML documents start with an XML declaration such as <?xml version="1.0" encoding="utf-8"?>. You can check for this before parsing, but keep in mind that the declaration is optional, so its absence does not prove the response is not XML.
However, I think you need some error handling and result validation in your parser. The network is not a safe environment: you can easily get half-broken, malformed, or even forged XML from it. A good parser should detect that and report the corresponding errors, not just crash.
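The error handling described above can be sketched with the JDK's built-in DOM parser. This is a minimal sketch under stated assumptions: `SafeXml` and `tryParse` are hypothetical names, the body is assumed to be already read into a non-null string, and the DOCTYPE-rejecting feature is the Xerces feature that the JDK's default parser supports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SafeXml {

    /** Parses the body, returning empty instead of throwing when it is not acceptable XML. */
    static Optional<Document> tryParse(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations: this filters typical HTML error pages
            // and also blocks entity-based attacks from forged input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(body)));
            return Optional.of(doc);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Malformed, truncated, or otherwise rejected input ends up here.
            return Optional.empty();
        }
    }
}
```

A well-formed page without a DOCTYPE would still parse, so the caller should also check that the root element is the one it expects, for example via `doc.getDocumentElement().getTagName()`.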