How to convert javax.mail.BodyPart content to UTF-8 clean text?


This is what I'm getting with javax.mail.BodyPart.writeTo(..):
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
some text *again*
=97
Bobby
On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex@example.com> wrot=
e:
> let's try again
> and again
I want to clean this text and convert it to UTF-8, in order to receive exactly this:
some text *again*
--
Bobby
I'm sure I'm not the first who is facing this problem/task. Do you know any Java libraries that can help?

Removing the quoted message is just a string-manipulation problem; regular expressions or simple line filtering will handle it.
You can access the content of the body part, without the headers and with the transfer
and charset encodings already handled for you, simply by calling the getContent method. For a text/plain part it returns a String.
Is that what you're looking for?
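If only the raw writeTo output is available, the two steps above can be sketched with the standard library alone. This is a minimal illustration, not JavaMail's implementation: BodyTextSketch and both method names are hypothetical, the headers are assumed to be stripped already, and the reply detection is a naive heuristic. In real code, getContent is the simpler route.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of JavaMail.
class BodyTextSketch {

    // Decodes a quoted-printable body (headers already removed) using the declared charset.
    static String decodeQuotedPrintable(String body, Charset charset) {
        String text = body.replace("\r\n", "\n");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '=' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                i += 2; // soft line break: the line continues on the next one
            } else if (c == '=' && i + 2 < text.length()) {
                // "=97" style escape: two hex digits give one raw byte
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                bytes.write(c); // literal ASCII character
                i++;
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    // Drops "> quoted" lines and the "On ... wrote:" attribution line (naive heuristic).
    static String stripQuotedReply(String text) {
        StringBuilder kept = new StringBuilder();
        for (String line : text.split("\n")) {
            if (line.startsWith(">")) {
                continue;
            }
            if (line.startsWith("On ") && line.endsWith("wrote:")) {
                continue;
            }
            kept.append(line).append('\n');
        }
        return kept.toString().trim();
    }
}
```

The resulting String is charset-neutral inside Java; calling getBytes(StandardCharsets.UTF_8) on it yields the UTF-8 bytes when they are needed for output.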

Related

french accents giving "<?>" in http responses with correct charset (Java)

I'm calling an API that returns French sentences, and all the accented characters are displayed as <?> in my Java code, even though the charset is declared correctly (application/json;charset=iso-8859-1).
Using Postman or my web browser, I don't face any problem.
I also tried calling the API with a Content-Type header set to application/json;charset=UTF-8 or application/json;charset=iso-8859-1, but the problem remains the same.
Any ideas?
response.getBody() gives:
{"sentences":[{"fr_value":"il �tait loin","dz_value":"kaan b3id","additional_information":{"personal_prounoun":"HE","verb":"�tre","adjective":"loin","tense":"pass�"}}],"count":1}
new String(response.getBody().getBytes(StandardCharsets.UTF_8)) gives exactly the same.
I'm using scribejava.
Edit: even when I save the response to a file and open it in Notepad++, the result is similar.

You need to read it as ISO-8859-1. I'm not sure what should happen after that, since I don't know what you're doing with the data. My transcoding utility at https://technojeeves.com/index.php/aliasjava1/51-transcode-in-java may be helpful. With wget:
wget -O - us-central1-dz-dialect-api.cl… | xcode -ie Latin1
('xcode' is a wrapper I made to invoke that Java app.)
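A minimal sketch of why reading the bytes as ISO-8859-1 matters, assuming the raw response bytes can be obtained from the HTTP client (how to get them depends on the client and is not shown). Latin1DecodeSketch and its methods are hypothetical names. Once Latin-1 bytes have been decoded as UTF-8, the accented characters are already replaced by U+FFFD, so re-encoding the String afterwards, as in new String(body.getBytes(UTF_8)), cannot bring them back.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class Latin1DecodeSketch {

    // Decodes raw response bytes with the charset the server declared.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The failure mode: the same bytes decoded as UTF-8 lose the accented characters.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```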

Problem solved using the following code:
httpResponse.setContentType("application/json;charset=UTF-8");
mapper.getFactory().configure(JsonGenerator.Feature.ESCAPE_NON_ASCII, true);

MIME-encoded file creation in Java

Just for learning purposes, I want to create a MIME file. Is there a Java API available to create such a file? I want to embed a PKCS#7 response in it.
Content-Type: multipart/mixed; boundary="=_1_1440399813_23993"
Content-Transfer-Encoding: 8bit
This is a MIME-formatted message. If you see this text it means that your
E-mail software does not support MIME-formatted messages.
--=_1_1440399813_23993
Content-Disposition: attachment; filename=hello.pdf
Content-Type: application/pdf
Content-Transfer-Encoding: base64
MMDAwMCBuIAowMDAwMDA1MzUwIDAwMDAwIG4g
CnRyYWlsZXIKPDwvU2l6ZSAxNC9Sb290IDEyIDAgUgovSW5mbyAxMyAwIFIKL0lEIFsgPDZCQjND
QzhCOTkyNDMxMkIxMEZDOUI1MjM3NEQzODYyPgo8NkJCM0NDOEI5OTI0MzEyQjEwRkM5QjUyMzc0
RDM4NjI+IF0KL0RvY0NoZWNrc3VtIC9FNUJGNDQ1NENFRUY4RDk0RjNGRDM4NUU5NTFEOERBNAo+
PgpzdGFydHhyZWYKNTUyNQolJUVPRgo=
--=_1_1440399813_23993
Content-Disposition: attachment; filename=hello.pdf.p7s
Content-Type: application/pkcs7-mime
Content-Transfer-Encoding: base64
Lxbtj3FFg7Ss7fxsq/pSBHYBTLlDZ9K1
mmMmCygSi6+JMfbqj/vu50L79qtq3TgfMXh8GIOWqpiIh6c4/U29T6yZIBFktqiS1vRWiONM9OCx
WlrP70y9XePc/L+HyBeuUjFNCnSsxVO1atgYgLI4GdXu2jc1wTonuQVmNbsTtQ2r27SiM9eTkJYW
jrURoe5lBMTQ6rxGZWdB8bnQXjvQhmOqoDmXJ161MZ/jxbTEwHclDuKggOG29nEfZe/ybq25Fyu8
k01tvUXJJ1EanVZQwFVaoqOmR6W1/HGSxB3S7A==
--=_1_1440399813_23993--
Is there an API available to create this type of file? Can I give the file any extension? I am very new to this area.

You should be able to use the Java Mail API for this. (http://www.oracle.com/technetwork/java/javamail/index-141777.html)
But in my experience this API is no fun to use, and as long as you just want to encode several files into one MIME message, it might be easier to encode it yourself.
Some frameworks, such as Spring (org.springframework.mail.*), build wrappers around Java Mail that make it easier to use.

To the best of my knowledge there is no ready API for this.
Keep in mind that MIME multipart encoding is just plain text, so you can build it with an ordinary StringBuilder.
The only somewhat difficult part may be the Base64 encoding, but there is a ready-made class for that: Base64.
You could easily write a function that takes an array of byte arrays and returns the formatted string, or write a class to which you keep adding data and from which you finally get the result.
It really depends upon your design what you want to do.
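The StringBuilder approach described above might look roughly like this. It is a sketch under stated assumptions: MimeMultipartSketch and its method names are hypothetical, the boundary is supplied by the caller and must not occur in the data, and only base64 attachments are handled. It uses java.util.Base64 (Java 8+), whose MIME encoder wraps lines at 76 characters.

```java
import java.util.Base64;

// Hypothetical helper for illustration; real messages may need more headers.
class MimeMultipartSketch {

    // Formats one attachment part, including its leading boundary line.
    static String attachmentPart(String boundary, String fileName, String contentType, byte[] data) {
        String encoded = Base64.getMimeEncoder().encodeToString(data);
        return "--" + boundary + "\r\n"
             + "Content-Disposition: attachment; filename=" + fileName + "\r\n"
             + "Content-Type: " + contentType + "\r\n"
             + "Content-Transfer-Encoding: base64\r\n"
             + "\r\n"
             + encoded + "\r\n";
    }

    // Builds a multipart/mixed document from parallel arrays of names, types and payloads.
    static String build(String boundary, String[] fileNames, String[] contentTypes, byte[][] payloads) {
        StringBuilder sb = new StringBuilder();
        sb.append("MIME-Version: 1.0\r\n");
        sb.append("Content-Type: multipart/mixed; boundary=\"").append(boundary).append("\"\r\n");
        sb.append("\r\n");
        for (int i = 0; i < payloads.length; i++) {
            sb.append(attachmentPart(boundary, fileNames[i], contentTypes[i], payloads[i]));
        }
        sb.append("--").append(boundary).append("--\r\n"); // closing boundary
        return sb.toString();
    }
}
```

Passing the PDF bytes and the PKCS#7 bytes as two payloads would produce a file with the same shape as the example in the question.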

Freemarker - lack of special character in email subject template cause email content crash

I'm fighting a strange error. I'm using separate FreeMarker templates for the mail subject and body. The mail is sent using org.springframework.mail.javamail.JavaMailSender. Only templates that contain some special Swedish character work in my application (yes, you read that right, not the other way round). If I delete the character, my email content breaks. The email body then shows:
MIME-Version: 1.0
Content-Type: text/html;charset=UTF-8
Content-Transfer-Encoding: 7bit
..
html code here
..
My freemarker.properties file
locale=sv_SE
classic_compatible=false
number_format=
date_format=yyyy-MM-dd
time_format=HH:mm
datetime_format=yyyy-MM-dd HH:mm
output_encoding=UTF-8
url_escaping_charset=UTF-8
auto_import=spring.ftl as spring
auto_include=
default_encoding=UTF-8
localized_lookup=true
strict_syntax=true
whitespace_stripping=true
template_update_delay=10
I've tried converting the subject file with the dos2unix tool. Running 'file -bi subject.ftl' shows that the encoding is us-ascii; with the special character added, it is utf-8.
This is surprisingly strange to me...
SOLUTION:
Run :set bomb in vim and save the file.

At the request of @freakman, I'm posting my comment as an answer, since it turned out to solve the problem:
Prepend the BOM (Byte Order Mark) to the template file to ensure it is always parsed as a UTF-8 template.
For reference, the BOM for UTF-8 is:
Hexadecimal: EF BB BF
Displayed as ISO-8859-1: ï»¿
Java: byte[] bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
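For anyone who prefers to add the BOM programmatically rather than in an editor, a minimal sketch follows. Utf8BomSketch and its method names are hypothetical; the only facts taken from the answer are the three BOM bytes.

```java
// Hypothetical helper for illustration only.
class Utf8BomSketch {

    static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the content already starts with the UTF-8 BOM.
    static boolean hasBom(byte[] content) {
        return content.length >= BOM.length
            && content[0] == BOM[0]
            && content[1] == BOM[1]
            && content[2] == BOM[2];
    }

    // Returns the content with a UTF-8 BOM in front, leaving it unchanged if one is present.
    static byte[] withBom(byte[] content) {
        if (hasBom(content)) {
            return content;
        }
        byte[] out = new byte[BOM.length + content.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(content, 0, out, BOM.length, content.length);
        return out;
    }
}
```

Applied to a template file, this would be something like Files.write(path, Utf8BomSketch.withBom(Files.readAllBytes(path))).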

GZip a string for output from Coldfusion results in "Content Encoding Error" in browsers

I am trying to GZip content in a variable to output to the browser. To start I am making this very simple and not worrying about browsers that do not support gzip. Also I have put this together from researching several methods that I could find on the web. Some of them from people that may be reading this question.
<cfsavecontent variable="toGZIP"><html><head><title>Test</title></head><body><h1>Fear my test</h1></body></html></cfsavecontent>
<cfscript>
ioOutput = CreateObject("java","java.io.ByteArrayOutputStream");
gzOutput = CreateObject("java","java.util.zip.GZIPOutputStream");
ioOutput.init();
gzOutput.init(ioOutput);
gzOutput.write(toGZIP.getBytes("UTF-8"), 0, Len(toGZIP.getBytes()));
gzOutput.finish();
gzOutput.close();
ioOutput.flush();
ioOutput.close();
toOutput=ioOutput.toString("UTF-8");
</cfscript>
<cfcontent reset="yes" /><cfheader name="Content-Encoding" value="gzip"><cfheader name="Content-Length" value="#ArrayLen( toOuptut.getBytes() )#" ><cfoutput>#toOuptut#</cfoutput><cfabort />
But I get an error in Firefox (and chrome and Safari)
Content Encoding Error
The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression.
Anybody have any ideas?
OS: Mac OS X Snow Leopard
CF: 9-Dev
Webserver: Apache
SOLUTION
<cfsavecontent variable="toGZIP"><html><head><title>Test</title></head><body><h1>Fear my test</h1></body></html></cfsavecontent>
<cfscript>
ioOutput = CreateObject("java","java.io.ByteArrayOutputStream");
gzOutput = CreateObject("java","java.util.zip.GZIPOutputStream");
ioOutput.init();
gzOutput.init(ioOutput);
gzOutput.write(toGZIP.getBytes(), 0, Len(toGZIP.getBytes()));
gzOutput.finish();
gzOutput.close();
ioOutput.flush();
ioOutput.close();
toOutput=ioOutput.toByteArray();
</cfscript>
<cfheader name="Content-Encoding" value="gzip"><cfheader name="Content-Length" value="#ArrayLen(toOutput)#" ><cfcontent reset="yes" variable="#toOutput#" /><cfabort />

The following line looks completely wrong:
toOutput=ioOutput.toString("UTF-8");
You are decoding the GZip stream as UTF-8, and the result is garbage data. It is best to output the GZip data as binary, if ColdFusion has that option. If you can only output a string, then you need an encoding that does not change any bytes, for example ISO-8859-1.
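The effect can be checked in plain Java. This is a small sketch with hypothetical names (GzipBytesSketch, gzip, survivesRoundTrip): it compresses a string and tests whether the compressed bytes survive a trip through a String in a given charset. ISO-8859-1 maps every byte to exactly one character, so nothing is lost; UTF-8 replaces invalid sequences, and the second gzip magic byte (0x8B) is already invalid on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
class GzipBytesSketch {

    // Compresses the UTF-8 bytes of the text and returns the raw gzip bytes.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // True if bytes -> String -> bytes in the given charset returns the same bytes.
    static boolean survivesRoundTrip(byte[] data, Charset charset) {
        return Arrays.equals(data, new String(data, charset).getBytes(charset));
    }
}
```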

Is there a reason you're doing it manually over letting the web server (IIS or Apache) handle this? Both of them support GZip encoding, and will probably do so faster and better than your manual process.
Enabling GZip in IIS6
Enabling GZip in IIS7
Enabling GZip in Apache2

Please note that you have a typo in the code: toOuptut instead of toOutput.
Unfortunately, I'm not a Java expert and can't say exactly what is wrong. But when I save the response to a file using wget, it contains the source HTML rather than gzipped binary. That could mean the gzOutput-related processing does not produce correct output.
BTW, verifying browser support for GZip is pretty simple. You can check the Accept-Encoding header, like this:
<cfif FindNoCase("gzip", cgi.HTTP_ACCEPT_ENCODING)>
<!--- prepare the gzipped text --->
</cfif>

How do you embed binary data in XML?

I have two applications written in Java that communicate with each other using XML messages over the network. I'm using a SAX parser at the receiving end to get the data back out of the messages. One of the requirements is to embed binary data in an XML message, but SAX doesn't like this. Does anyone know how to do this?
UPDATE: I got this working with the Base64 class from the Apache Commons Codec library, in case anyone else is trying something similar.

You could encode the binary data using Base64 and put it into a Base64 element; the article below is a pretty good one on the subject.
Handling Binary Data in XML Documents
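A minimal sketch of the Base64 approach with a SAX parser on the receiving end, using only the JDK (java.util.Base64 rather than Commons Codec). The class name and the element names message and payload are hypothetical. Note that SAX may deliver the text of one element in several characters() calls, so the handler accumulates it before decoding.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
class Base64XmlSketch {

    // Sender side: wrap the Base64 text in an element. The Base64 alphabet needs no XML escaping.
    static String toXml(byte[] payload) {
        return "<message><payload>" + Base64.getEncoder().encodeToString(payload) + "</payload></message>";
    }

    // Receiver side: collect the text of <payload> with SAX, then decode it.
    static byte[] fromXml(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inPayload;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                inPayload = "payload".equals(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inPayload = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inPayload) {
                    text.append(ch, start, length); // may be called more than once per element
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse message", e);
        }
        // The MIME decoder ignores line breaks and other whitespace inside the element.
        return Base64.getMimeDecoder().decode(text.toString());
    }
}
```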

XML is so versatile...
<DATA>
<BINARY>
<BIT index="0">0</BIT>
<BIT index="1">0</BIT>
<BIT index="2">1</BIT>
...
<BIT index="n">1</BIT>
</BINARY>
</DATA>
XML is like violence - If it doesn't solve your problem, you're not using enough of it.
EDIT:
BTW: Base64 + CDATA is probably the best solution
(EDIT2:
Whoever upmods me, please also upmod the real answer. We don't want any poor soul to come here and actually implement my method because it was the highest ranked on SO, right?)

Base64 is indeed the right answer, but CDATA is not. CDATA basically says "this could be anything", whereas the content must not be just anything: it has to be Base64-encoded binary data. XML Schema defines base64Binary as a primitive datatype that you can use in your XSD.

I had this problem just last week. I had to serialize a PDF file and send it, inside an XML file, to a server.
If you're using .NET, you can convert a binary file directly to a base64 string and stick it inside an XML element.
string base64 = Convert.ToBase64String(File.ReadAllBytes(fileName));
Or, there is a method built right into the XmlWriter object. In my particular case, I had to include Microsoft's datatype namespace:
StringBuilder sb = new StringBuilder();
System.Xml.XmlWriter xw = XmlWriter.Create(sb);
xw.WriteStartElement("doc");
xw.WriteStartElement("serialized_binary");
xw.WriteAttributeString("types", "dt", "urn:schemas-microsoft-com:datatypes", "bin.base64");
byte[] b = File.ReadAllBytes(fileName);
xw.WriteBase64(b, 0, b.Length);
xw.WriteEndElement();
xw.WriteEndElement();
string abc = sb.ToString();
The string abc looks something like this:
<?xml version="1.0" encoding="utf-16"?>
<doc>
<serialized_binary types:dt="bin.base64" xmlns:types="urn:schemas-microsoft-com:datatypes">
JVBERi0xLjMKJaqrrK0KNCAwIG9iago8PCAvVHlwZSAvSW5mbw...(plus lots more)
</serialized_binary>
</doc>

I usually encode the binary data with MIME Base64 or URL encoding.

Try Base64 encoding/decoding your binary data. Also look into CDATA sections.

Any binary-to-text encoding will do the trick. I use something like this:
<data encoding="yEnc">
<![CDATA[ encoded binary data ]]>
</data>

Maybe encode them into a known set - something like base 64 is a popular choice.

Base64 overhead is 33%.
BaseXML for XML 1.0 has an overhead of only 20%, but it is not a standard and so far has only a C implementation. Check it out if you're concerned about data size. Note, however, that browsers tend to implement compression, so it is less needed.
I developed it after the discussion in this thread: Encoding binary data within XML : alternatives to base64.
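The 33% figure for Base64 follows from the encoding itself: every 3 input bytes become 4 output characters. A small sketch of that arithmetic, with hypothetical names and ignoring any line breaks a MIME-style encoder would add:

```java
// Hypothetical helper for illustration only.
class Base64OverheadSketch {

    // Length of padded, unwrapped Base64 output: each group of up to 3 bytes becomes 4 characters.
    static int encodedLength(int inputBytes) {
        return 4 * ((inputBytes + 2) / 3);
    }

    // Size increase relative to the input, in percent.
    static double overheadPercent(int inputBytes) {
        return 100.0 * (encodedLength(inputBytes) - inputBytes) / inputBytes;
    }
}
```

For very small inputs the padding makes the relative overhead larger; the 33% figure is the limit for longer data.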

While the other answers are mostly fine, you could try another, more space-efficient encoding method such as yEnc (see the Wikipedia link below). With yEnc you also get checksum capability right "out of the box". Of course, because XML does not have a native yEnc type, your XML schema should be updated to properly describe the encoded node.
Why: Because of their encoding strategies, Base64, uuencode, and similar encodings increase the amount of data (overhead) you need to store and transfer by roughly 33-40% (vs. yEnc's 1-2%). Depending on what you're encoding, that overhead could become an issue.
yEnc - Wikipedia abstract:
https://en.wikipedia.org/wiki/YEnc
yEnc is a binary-to-text encoding scheme for transferring binary files in messages on Usenet or via e-mail. ... An additional advantage of yEnc over previous encoding methods, such as uuencode and Base64, is the inclusion of a CRC checksum to verify that the decoded file has been delivered intact.

You can also uuencode your original binary data. This format is a bit older, but it does the same thing as Base64 encoding.

If you have control over the XML format, you should turn the problem inside out. Rather than embedding the binary data in the XML, think about how to enclose a document that has multiple parts, one of which contains the XML.
The traditional solution to this is an archive (e.g. tar). But if you want to keep your enclosing document in a text-based format, or if you don't have access to a file-archiving library, there is also a standardized scheme that is used heavily in email and HTTP: multipart/* MIME with Content-Transfer-Encoding: binary.
For example, if your servers communicate through HTTP and you want to send a multipart document whose primary part is an XML document that refers to binary data, the HTTP communication might look something like this:
POST / HTTP/1.1
Content-Type: multipart/related; boundary="qd43hdi34udh34id344"
... other headers elided ...
--qd43hdi34udh34id344
Content-Type: application/xml
<myxml>
<data href="cid:data.bin"/>
</myxml>
--qd43hdi34udh34id344
Content-Id: <data.bin>
Content-type: application/octet-stream
Content-Transfer-Encoding: binary
... binary data ...
--qd43hdi34udh34id344--
As in the example above, the XML refers to the binary data in the enclosing multipart by using the cid URI scheme, which identifies a part by its Content-Id header. The overhead of this scheme is just the MIME headers. A similar scheme can also be used for the HTTP response. Of course, with HTTP you also have the option of sending the parts of a multipart document in separate requests/responses.
If you want to avoid wrapping your data in a multipart document, you can use a data URI:
<myxml>
<data href="data:application/something;charset=utf-8;base64,dGVzdGRhdGE="/>
</myxml>
But this has the base64 overhead.
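Building and reading such a data URI in Java could be sketched as follows. DataUriSketch and its methods are hypothetical names, and only the ;base64 form shown above is handled.

```java
import java.util.Base64;

// Hypothetical helper for illustration only.
class DataUriSketch {

    // Builds "data:<mediaType>;base64,<payload>" from raw bytes.
    static String toDataUri(String mediaType, byte[] data) {
        return "data:" + mediaType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    // Extracts and decodes the payload of a base64 data URI.
    static byte[] fromDataUri(String uri) {
        int comma = uri.indexOf(',');
        if (!uri.startsWith("data:") || comma < 0 || !uri.substring(0, comma).endsWith(";base64")) {
            throw new IllegalArgumentException("not a base64 data URI");
        }
        return Base64.getDecoder().decode(uri.substring(comma + 1));
    }
}
```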
