MIME-encoded file creation in Java

Just for learning purposes, I want to create a MIME-type file. Is there any Java API available to create such a file? I want to embed a PKCS#7 response in this file.
Content-Type: multipart/mixed; boundary="=_1_1440399813_23993"
Content-Transfer-Encoding: 8bit
This is a MIME-formatted message. If you see this text it means that your
E-mail software does not support MIME-formatted messages.
--=_1_1440399813_23993
Content-Disposition: attachment; filename=hello.pdf
Content-Type: application/pdf
Content-Transfer-Encoding: base64
MMDAwMCBuIAowMDAwMDA1MzUwIDAwMDAwIG4g
CnRyYWlsZXIKPDwvU2l6ZSAxNC9Sb290IDEyIDAgUgovSW5mbyAxMyAwIFIKL0lEIFsgPDZCQjND
QzhCOTkyNDMxMkIxMEZDOUI1MjM3NEQzODYyPgo8NkJCM0NDOEI5OTI0MzEyQjEwRkM5QjUyMzc0
RDM4NjI+IF0KL0RvY0NoZWNrc3VtIC9FNUJGNDQ1NENFRUY4RDk0RjNGRDM4NUU5NTFEOERBNAo+
PgpzdGFydHhyZWYKNTUyNQolJUVPRgo=
--=_1_1440399813_23993
Content-Disposition: attachment; filename=hello.pdf.p7s
Content-Type: application/pkcs7-mime
Content-Transfer-Encoding: base64
Lxbtj3FFg7Ss7fxsq/pSBHYBTLlDZ9K1
mmMmCygSi6+JMfbqj/vu50L79qtq3TgfMXh8GIOWqpiIh6c4/U29T6yZIBFktqiS1vRWiONM9OCx
WlrP70y9XePc/L+HyBeuUjFNCnSsxVO1atgYgLI4GdXu2jc1wTonuQVmNbsTtQ2r27SiM9eTkJYW
jrURoe5lBMTQ6rxGZWdB8bnQXjvQhmOqoDmXJ161MZ/jxbTEwHclDuKggOG29nEfZe/ybq25Fyu8
k01tvUXJJ1EanVZQwFVaoqOmR6W1/HGSxB3S7A==
--=_1_1440399813_23993--
Is there any API available to create this type of file? Can I give this file any extension? I am very new to this area.

You should be able to use the JavaMail API for this (http://www.oracle.com/technetwork/java/javamail/index-141777.html).
In my experience, though, this API is no fun to use, and as long as you just want to encode several files into one MIME message, it might be easier to do the encoding yourself.
Some frameworks, such as Spring (org.springframework.mail.*), build wrappers around JavaMail that make it easier to use.

To the best of my knowledge there is no ready API for this.
Keep in mind that MIME multipart encoding is just plain text, so you can build it with a regular StringBuilder.
The only somewhat difficult part may be the Base64 encoding, but for that there is a ready-made class: Base64 (for example java.util.Base64, available since Java 8).
You could easily write a function that takes an array of byte arrays and returns the formatted string, or you could write a class on which you keep calling addData and then finally get the result.
It really depends upon your design what you want to do.
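The "keep addData-ing, then get the result" idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name MimeFileBuilder, the boundary string, and the placeholder payload bytes are all made up here, and the boundary must be chosen so that it never appears inside a part.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper (not part of any library): collects attachments and
// renders them as a multipart/mixed document in plain text.
final class MimeFileBuilder {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final StringBuilder parts = new StringBuilder();

    MimeFileBuilder(String boundary) {
        this.boundary = boundary;
    }

    // Appends one attachment part. The MIME encoder wraps Base64 output at 76 characters.
    MimeFileBuilder addData(String fileName, String contentType, byte[] data) {
        parts.append("--").append(boundary).append(CRLF)
             .append("Content-Disposition: attachment; filename=\"").append(fileName).append('"').append(CRLF)
             .append("Content-Type: ").append(contentType).append(CRLF)
             .append("Content-Transfer-Encoding: base64").append(CRLF)
             .append(CRLF) // blank line separates the part headers from the part body
             .append(Base64.getMimeEncoder().encodeToString(data)).append(CRLF);
        return this;
    }

    // Returns the whole document: top-level headers, all parts, closing boundary.
    String build() {
        return "MIME-Version: 1.0" + CRLF
             + "Content-Type: multipart/mixed; boundary=\"" + boundary + "\"" + CRLF
             + CRLF
             + parts
             + "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        // Placeholder payloads; real code would read the PDF and the PKCS#7 bytes from files.
        byte[] pdf = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        byte[] signature = {0x30, (byte) 0x82, 0x01, 0x00};
        String mime = new MimeFileBuilder("=_example_boundary")
                .addData("hello.pdf", "application/pdf", pdf)
                .addData("hello.pdf.p7s", "application/pkcs7-mime", signature)
                .build();
        System.out.print(mime);
    }
}
```

The resulting string can be written to a file with any file-writing API; the sketch does not validate file names or content types.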

Related

"?" gets dropped in Mail Header for attachment Filename (UTF-8) Exchange server

We are using JavaMail to send mail with PDF attachments. When Unicode characters are present in the filename, the attachments end up being named with the raw UTF-encoded name. On further inspection of the mail headers, we found that the ? characters in the MIME-encoded filename are dropped. For example
Expected:
Content-Disposition: attachment;
filename="=?utf8?Q?hinzugef=C3=BCgte.pdf?="
Obtained:
Content-Disposition: attachment;
filename="=utf8Qhinzugef=C3=BCgte.pdf="
Because of this, the filename of the attachment is =utf8Qhinzugef=C3=BCgte.pdf= and we are unable to open it.
If I manually modify the .eml file, add the ? in the right places, and open it in Outlook, the file is displayed in PDF format as expected.
This issue has been reported with Exchange Server, and we are unable to reproduce it with Gmail or Fake SMTP (which I use on my machine to test mail).
Sample code:
MimeBodyPart mbp2 = new MimeBodyPart();
String attFileName = file.getName();
String i18nFileName = new String(attFileName.getBytes(), "UTF-8");
String mimeType = mimeMap.getContentType(attFileName);
attStream = new FileInputStream(att);
ByteArrayDataSource bas = new ByteArrayDataSource(attStream, mimeType);
mbp2.setDataHandler(new DataHandler(bas));
mbp2.setFileName(MimeUtility.encodeText(i18nFileName));
mp.addBodyPart(mbp2);
if (attStream != null) {
attStream.close();
}
Why does this happen? Any leads would be very helpful

This is encoded incorrectly to begin with.
What you implemented is RFC 2047, but that doesn't apply to HTTP at all.
RFC 6266 § 4.3 explains how to deal with the filename= parameter for that HTTP header and then refers to
RFC 5987 (obsoleted by RFC 8187; see § 3.2.3) on how to incorporate non-ASCII characters.
The generic form is filename*=UTF-8''Na%C3%AFve%20file.txt and it differs in several aspects from RFC 2047, which you implemented:
filename*= should be used. Note the trailing asterisk on the parameter name: it signals extended notation. Without it, neither a charset nor percent-encoding is expected.
Enclosing the value in "quotation marks" is neither needed nor allowed when using extended notation.
Likewise, the prefix =?, the suffix ?=, and the ?Q? encoding parameter are never expected. Logically they also make no sense, as only percent-encoding is available and it applies to the entire value, not just to part of it.
The '' part is for the optional language code. It could be 'en' for English, but effectively nobody cares about that.
The rest is trivial: each byte of a UTF-8 character sequence is percent-encoded. A space must be percent-encoded, too (that is, %20).
The correct charset name is UTF-8; utf8 is wrong. Don't rely on that unofficial alias being accepted, although it is tolerated every now and then.
In other words: the client acted correctly. If I use Thunderbird 68 and either hit CTRL+Q to see an e-mail's source, or save an e-mail as an .EML file and then look into that file, I have a multipart where each attachment has the headers
Content-Disposition: inline;
filename*=utf-8''L%20%2D%20qualita%CC%88t.pdf
Content-Type: application/pdf;
x-unix-mode=0644;
name="=?utf-8?Q?L_-_qualita=CC=88t=2Epdf?="
Don't get confused because you now see both variants - they still have different purposes and different contexts. What you want is primarily the filename (although it can't hurt to also provide a name). If you look closely the values also differ (former has spaces, latter uses underscores - but that was the sender's free decision). The UTF-8 character sequence %CC%88 or =CC=88 is the codepoint U+0308 = ̈ COMBINING DIAERESIS (making the a before an ä).
This answer explains how differently HTTP browsers treated RFC 5987 in the year 2011.
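The extended notation described in this answer (charset, two apostrophes for the empty language tag, then percent-encoded UTF-8 bytes) can be sketched in plain Java as follows. The class and method names are made up for illustration; the set of characters left unencoded is the attr-char set from RFC 8187, and this is not the API of JavaMail or any other library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: renders a value in the extended notation
// UTF-8''<percent-encoded bytes>, suitable for a filename*= parameter.
final class ExtendedParameter {
    // Non-alphanumeric characters that RFC 8187 allows unencoded (attr-char).
    private static final String ATTR_CHARS_EXTRA = "!#$&+-.^_`|~";

    static String encodeValue(String value) {
        StringBuilder sb = new StringBuilder("UTF-8''"); // empty language tag between the apostrophes
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || ATTR_CHARS_EXTRA.indexOf(c) >= 0;
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c)); // every other byte becomes %XX
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00EF is the letter i with diaeresis, as in the example above.
        System.out.println("Content-Disposition: attachment; filename*="
                + encodeValue("Na\u00EFve file.txt"));
    }
}
```

Note that the value is emitted without surrounding quotation marks, in line with the second point of the list.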

Displaying image from byte array in browser

I have a simple TCP ServerSocket that answers GET requests with a byte array. The requests arrive when a browser opens a page on this server that contains an img tag whose src points to a GIF image. The requests look like this:
GET /myHome.htm HTTP/1.1
GET /house.gif HTTP/1.1
The byte array is read like this:
byte[] fileByte = Files.readAllBytes(filePath);
To send the web page that contains this image I do this:
out.writeBytes(new String(fileByte));
where out is:
DataOutputStream out= new DataOutputStream(socketClient.getOutputStream());
Now, to make the image display, I think I have to use something other than
out.writeBytes()
but I am not sure. Does anybody know how to make the image display? Right now the image just doesn't show at all.

First, make sure your GIF file is not corrupted. (This has happened to me before, too.)
If the file is fine, try this code for sending the GIF file:
byte[] fileByte = Files.readAllBytes(filePath);
writer.writeBytes("HTTP/1.1 200 OK\r\n");
writer.writeBytes("Content-Type: image/gif\r\n");
writer.writeBytes("Content-Length: "+fileByte.length+"\r\n");
writer.writeBytes("\r\n");
writer.write(fileByte, 0, fileByte.length);
And then try to navigate to "house.gif" directly instead of "myHome.htm". Let me know in the comments what this does.
Previous answer attempts:
I think I may have misunderstood your question. Let me try with a different answer:
You are not sure how to figure out on the server when to return the HTML file myHome.htm and when to return house.gif?
I think for this you need to simply parse out the requested URL. Just check whether it contains "house.gif" or not. Then, depending on this, you either return the HTML file as you described above, or you send the .gif file, making sure that you use
writer.write(fileByte, 0, fileByte.length);
to send the binary data and that you set a reply header of
Content-Type: image/gif
In both cases (for the HTML file and the GIF file), though, you should prepend the data you are sending with correct HTTP response headers. Don't take the page-title the wrong way, but this site might help: http://net.tutsplus.com/tutorials/other/http-headers-for-dummies/
And just to make sure: Your server will be receiving TWO independent requests. The first one will ask for the HTML file, the second one will ask for the GIF file. So you send either one or the other. So, there's no "special way" to send the GIF instead of the HTML file. You use the same clientSocket. But it's a different connection.
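Putting the pieces from this answer together (look at the requested path, pick a Content-Type, send headers followed by the raw bytes), a minimal sketch might look like this. The class and method names are hypothetical, only the two file types from the question are handled, and error cases such as a missing file (404) are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a hand-written socket server.
final class TinyHttpResponder {
    // Extracts the path from a request line such as "GET /house.gif HTTP/1.1".
    static String requestPath(String requestLine) {
        String[] tokens = requestLine.trim().split("\\s+");
        return tokens.length >= 2 ? tokens[1] : "/";
    }

    // Only distinguishes the two cases discussed here: GIF images and HTML pages.
    static String contentTypeFor(String path) {
        return path.endsWith(".gif") ? "image/gif" : "text/html";
    }

    // Writes the status line and headers as ASCII text, then the body as raw bytes.
    static void writeResponse(OutputStream out, String contentType, byte[] body) throws IOException {
        String headers = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        out.write(headers.getBytes(StandardCharsets.US_ASCII));
        out.write(body); // never converted to a String, so binary data stays intact
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String path = requestPath("GET /house.gif HTTP/1.1");
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket stream
        writeResponse(sink, contentTypeFor(path), new byte[] {'G', 'I', 'F', '8', '9', 'a'});
        System.out.println(path + " -> " + sink.size() + " bytes written");
    }
}
```

In a real server, the OutputStream would be the one obtained from the client socket, and the body would come from Files.readAllBytes as in the question.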
Previous answer(s):
I think you might be missing the mime-type of your returned data. Try adding the following HTTP header to your reply:
Content-Type: image/gif
Actually... Are you sending a correct HTTP reply at all (including headers, specifically Content-Length)? If not, shoot me a comment and I'll post the code that you need for this.
If, for some reason, you cannot set the content-type header to let the browser know that you are sending it an image, you might be able to load the binary data on the client with an XMLHttpRequest into a JavaScript function rather than specifying it as the source Url of an img tag. Then you can use JavaScript to encode the binary data into a dataURI (http://en.wikipedia.org/wiki/Data_URI_scheme) with the correct mime type and set that as the source of the image.
Actually, I just noticed something in your code:
new String(fileByte)
might interpret the bytes in fileByte as Unicode characters rather than as binary data. Then, when you write this to the writer, the image might get corrupted, because probably not all bytes in the image form valid Unicode text. Try replacing the line with this:
writer.write(fileByte, 0, fileByte.length);
Maybe this is all you need to do to fix it???
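To see why going through a String can damage binary data, the following small check decodes a byte array and encodes it again. It assumes UTF-8 as the charset (the question's code uses the platform default, which may differ), and the class name is made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: does a byte array survive bytes -> String -> bytes?
final class BinaryRoundTrip {
    static boolean survivesStringRoundTrip(byte[] data) {
        // Invalid UTF-8 sequences are replaced with U+FFFD during decoding,
        // so the re-encoded bytes differ from the input.
        String asText = new String(data, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] asciiOnly = {'G', 'I', 'F'};
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x10};
        System.out.println(survivesStringRoundTrip(asciiOnly)); // true
        System.out.println(survivesStringRoundTrip(binary));    // false
    }
}
```

Writing the array directly with write(fileByte, 0, fileByte.length) avoids the conversion entirely.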

How to convert javax.mail.BodyPart content to UTF-8 clean text?

This is what I'm getting with javax.mail.BodyPart.writeTo(..):
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
some text *again*
=97
Bobby
On Wed, Feb 8, 2012 at 11:51 AM, Alex Johnson <alex#example.com> wrot=
e:
> let's try again
> and again
I want to clean this text and convert it to UTF-8, in order to receive exactly this:
some text *again*
--
Bobby
I'm sure I'm not the first who is facing this problem/task. Do you know any Java libraries that can help?

Removing the included message is just a string manipulation problem, I'm sure you can figure that out yourself using regular expressions or whatever.
You can access the content of the body part, without the headers, and with the charset encoding handled for you, simply by calling the getContent method.
Is that what you're looking for?
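For intuition about what getContent takes care of, here is a hand-rolled sketch of the two steps involved: undoing the quoted-printable transfer encoding and then interpreting the bytes in the declared charset. It is an illustration only, with a made-up class name, and is not meant as a replacement for the JavaMail call.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper: decodes a quoted-printable body into a Java String.
final class QuotedPrintable {
    static String decode(String body, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '=') {
                // "=" at the end of a line is a soft line break: drop it and the newline.
                if (body.startsWith("=\r\n", i)) { i += 3; continue; }
                if (body.startsWith("=\n", i)) { i += 2; continue; }
                // "=XX" is one byte written as two hex digits.
                if (i + 2 < body.length()) {
                    int hi = Character.digit(body.charAt(i + 1), 16);
                    int lo = Character.digit(body.charAt(i + 2), 16);
                    if (hi >= 0 && lo >= 0) {
                        bytes.write((hi << 4) | lo);
                        i += 3;
                        continue;
                    }
                }
            }
            bytes.write(c); // everything else is a literal ASCII character
            i++;
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "some text *again*\r\n=97\r\nBobby\r\nAlex Johnson wrot=\r\ne:";
        // In windows-1252, byte 0x97 is the em dash (U+2014).
        System.out.println(decode(raw, Charset.forName("windows-1252")));
    }
}
```

Once the text is a Java String, it can be written out in UTF-8 with getBytes(StandardCharsets.UTF_8), and replacing the dash with "--" becomes an ordinary string replacement.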

How to check encoding in java?

I am facing a problem about encoding.
For example, I have a message in XML, whose format encoding is "UTF-8".
<message>
<product_name>apple</product_name>
<price>1.3</price>
<product_name>orange</product_name>
<price>1.2</price>
.......
</message>
Now, this message supports multiple languages:
Traditional Chinese (big5),
Simplified Chinese (gb),
English (utf-8)
And the encoding changes only in specific fields.
For example (Traditional Chinese),
蘋果
1.3
橙
1.2
.......
Only "蘋果" and "橙" are using big5, "<product_name>" and "</product_name>" are still using utf-8.
<price>1.3</price> and <price>1.2</price> are using utf-8.
How do I know which word is using different encoding?

It looks like whoever is providing the XML is providing incorrect XML. They should be using a consistent encoding.
http://sourceforge.net/projects/jchardet/files/ is a pretty good heuristic charset detector.
It's a port of the one used in Firefox to detect the encoding of pages that are missing a charset in content-type or a BOM.
You could use that to try and figure out the encoding for substrings in a malformed XML file if you can't get the provider to fix their output.
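If no external detector can be used, a much cruder heuristic than jchardet is to try a strict UTF-8 decode of each field's raw bytes and fall back to a legacy charset only when that fails. The sketch below shows that idea with a made-up class name; it is not how jchardet works, and it can misjudge input, because some Big5 byte pairs also happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strict UTF-8 first, legacy charset as a fallback.
final class FallbackDecoder {
    static String decode(byte[] bytes, Charset fallback) {
        try {
            // REPORT makes the decoder throw instead of silently inserting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the field uses the legacy charset.
            return new String(bytes, fallback);
        }
    }
}
```

For the message in the question, a call might look like FallbackDecoder.decode(fieldBytes, Charset.forName("Big5")), where fieldBytes holds the raw bytes of one element's content. The important point is that the decision has to be made on the raw bytes, before anything has been turned into a String.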

You should use only one encoding in one XML file. The Big5 characters all have counterparts in UTF-8.

Because I cannot get the provider to fix the output, I have to handle it myself, and I cannot use external libraries in this project.
The only way I can solve it is like this,
String str = new String(big5String.getBytes("UTF-8"));
before displaying the message.

How do you embed binary data in XML?

I have two applications written in Java that communicate with each other using XML messages over the network. I'm using a SAX parser at the receiving end to get the data back out of the messages. One of the requirements is to embed binary data in an XML message, but SAX doesn't like this. Does anyone know how to do this?
UPDATE: I got this working with the Base64 class from the apache commons codec library, in case anyone else is trying something similar.

You could encode the binary data using base64 and put it into a Base64 element; the below article is a pretty good one on the subject.
Handling Binary Data in XML Documents
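With the JDK's own java.util.Base64 (Java 8 and later) and a SAX handler, the round trip might be sketched as below. The element names message and payload and the class name are invented for the example; the question does not say what its messages look like.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: embeds bytes as Base64 text and reads them back with SAX.
final class Base64InXml {
    // The Base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no XML escaping.
    static String wrap(byte[] data) {
        return "<message><payload>" + Base64.getEncoder().encodeToString(data) + "</payload></message>";
    }

    static byte[] unwrap(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inPayload;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("payload")) inPayload = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("payload")) inPayload = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver the text in several chunks, so accumulate it.
                if (inPayload) text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        // The MIME decoder ignores line breaks and other non-alphabet characters.
        return Base64.getMimeDecoder().decode(text.toString());
    }

    public static void main(String[] args) throws Exception {
        byte[] data = {0x00, 0x01, (byte) 0xFF, '<', '&'};
        String xml = wrap(data);
        System.out.println(xml);
        System.out.println(unwrap(xml).length + " bytes recovered");
    }
}
```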

XML is so versatile...
<DATA>
<BINARY>
<BIT index="0">0</BIT>
<BIT index="1">0</BIT>
<BIT index="2">1</BIT>
...
<BIT index="n">1</BIT>
</BINARY>
</DATA>
XML is like violence - If it doesn't solve your problem, you're not using enough of it.
EDIT:
BTW: Base64 + CDATA is probably the best solution
(EDIT2:
Whoever upmods me, please also upmod the real answer. We don't want any poor soul to come here and actually implement my method because it was the highest ranked on SO, right?)

Base64 is indeed the right answer, but CDATA is not. CDATA basically says "this could be anything", whereas the content must not be just anything: it has to be Base64-encoded binary data. XML Schema defines base64Binary as a primitive datatype which you can use in your XSD.

I had this problem just last week. I had to serialize a PDF file and send it, inside an XML file, to a server.
If you're using .NET, you can convert a binary file directly to a base64 string and stick it inside an XML element.
string base64 = Convert.ToBase64String(File.ReadAllBytes(fileName));
Or, there is a method built right into the XmlWriter object. In my particular case, I had to include Microsoft's datatype namespace:
StringBuilder sb = new StringBuilder();
System.Xml.XmlWriter xw = XmlWriter.Create(sb);
xw.WriteStartElement("doc");
xw.WriteStartElement("serialized_binary");
xw.WriteAttributeString("types", "dt", "urn:schemas-microsoft-com:datatypes", "bin.base64");
byte[] b = File.ReadAllBytes(fileName);
xw.WriteBase64(b, 0, b.Length);
xw.WriteEndElement();
xw.WriteEndElement();
string abc = sb.ToString();
The string abc looks something like this:
<?xml version="1.0" encoding="utf-16"?>
<doc>
<serialized_binary types:dt="bin.base64" xmlns:types="urn:schemas-microsoft-com:datatypes">
JVBERi0xLjMKJaqrrK0KNCAwIG9iago8PCAvVHlwZSAvSW5mbw...(plus lots more)
</serialized_binary>
</doc>

I usually encode the binary data with MIME Base64 or URL encoding.

Try Base64 encoding/decoding your binary data. Also look into CDATA sections

Any binary-to-text encoding will do the trick. I use something like this:
<data encoding="yEnc">
<![CDATA[ encoded binary data ]]>
</data>

Maybe encode them into a known set - something like base 64 is a popular choice.

Base64 overhead is 33%.
BaseXML for XML 1.0 has an overhead of only 20%, but it is not a standard and so far has only a C implementation. Check it out if you're concerned about data size. Note, however, that browsers tend to implement compression, so it is less needed.
I developed it after the discussion in this thread: Encoding binary data within XML : alternatives to base64.
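The 33% figure for Base64 follows from the encoding itself: every group of 3 input bytes becomes 4 output characters, with the last group padded. A small sketch of that arithmetic (class and method names invented here) is:

```java
import java.util.Base64;

// Hypothetical helper: size arithmetic for padded Base64 without line breaks.
final class Base64Overhead {
    // 4 output characters for every started group of 3 input bytes.
    static long encodedLength(long rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    static double overheadPercent(long rawLength) {
        return 100.0 * (encodedLength(rawLength) - rawLength) / rawLength;
    }

    public static void main(String[] args) {
        byte[] sample = new byte[3000];
        int actual = Base64.getEncoder().encodeToString(sample).length();
        System.out.println(actual == encodedLength(sample.length));
        System.out.printf("%.1f%%%n", overheadPercent(sample.length));
    }
}
```

Line breaks, as inserted by MIME-style encoders, add a little on top of this.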

While the other answers are mostly fine, you could try another, more space-efficient encoding method such as yEnc. With yEnc you also get checksum capability right "out of the box". See the links below. Of course, because XML does not have a native yEnc type, your XML schema should be updated to properly describe the encoded node.
Why: Due to their encoding strategies, Base64, uuencode, and similar encodings increase the amount of data (overhead) you need to store and transfer by roughly 33 to 40% (vs. yEnc's 1-2%). Depending on what you're encoding, that much overhead could be or become an issue.
yEnc - Wikipedia abstract:
https://en.wikipedia.org/wiki/YEnc
yEnc is a binary-to-text encoding scheme for transferring binary files in messages on Usenet or via e-mail. ... An additional advantage of yEnc over previous encoding methods, such as uuencode and Base64, is the inclusion of a CRC checksum to verify that the decoded file has been delivered intact.

You can also uuencode your original binary data. This format is a bit older, but it does the same thing as Base64 encoding.

If you have control over the XML format, you should turn the problem inside out. Rather than attaching the binary data to the XML, you should think about how to enclose a document that has multiple parts, one of which contains XML.
The traditional solution to this is an archive (e.g. tar). But if you want to keep your enclosing document in a text-based format, or if you don't have access to a file archiving library, there is also a standardized scheme that is used heavily in email and HTTP: multipart/* MIME with Content-Transfer-Encoding: binary.
For example, if your servers communicate through HTTP and you want to send a multipart document whose primary part is an XML document that refers to binary data, the HTTP communication might look something like this:
POST / HTTP/1.1
Content-Type: multipart/related; boundary="qd43hdi34udh34id344"
... other headers elided ...
--qd43hdi34udh34id344
Content-Type: application/xml
<myxml>
<data href="cid:data.bin"/>
</myxml>
--qd43hdi34udh34id344
Content-Id: <data.bin>
Content-type: application/octet-stream
Content-Transfer-Encoding: binary
... binary data ...
--qd43hdi34udh34id344--
As in the example above, the XML refers to the binary data in the enclosing multipart by using the cid URI scheme, which identifies a part by its Content-Id header. The overhead of this scheme would be just the MIME headers. A similar scheme can also be used for the HTTP response. Of course, with HTTP you also have the option of splitting a multipart document into separate requests/responses.
If you want to avoid wrapping your data in a multipart, another option is to use a data URI:
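Assembling such a body in Java could look like the following sketch. The class name and parameters are made up for illustration, the result is only the request body (the multipart/related Content-Type header with the boundary still has to be set on the HTTP request), and the caller must pick a boundary that does not occur in the binary data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a two-part multipart/related body as raw bytes.
final class MultipartRelated {
    private static void text(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    static byte[] build(String boundary, String xml, String contentId, byte[] binary) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // First part: the XML document that refers to the binary part via cid:<contentId>.
        text(out, "--" + boundary + "\r\n");
        text(out, "Content-Type: application/xml\r\n\r\n");
        text(out, xml + "\r\n");
        // Second part: the binary data, written as-is without any text encoding.
        text(out, "--" + boundary + "\r\n");
        text(out, "Content-Id: <" + contentId + ">\r\n");
        text(out, "Content-Type: application/octet-stream\r\n");
        text(out, "Content-Transfer-Encoding: binary\r\n\r\n");
        out.write(binary, 0, binary.length);
        // Closing delimiter.
        text(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = build("qd43hdi34udh34id344",
                "<myxml><data href=\"cid:data.bin\"/></myxml>",
                "data.bin",
                new byte[] {0x00, (byte) 0xFF, 0x10});
        System.out.println(body.length + " bytes");
    }
}
```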
<myxml>
<data href="data:application/something;charset=utf-8;base64,dGVzdGRhdGE="/>
</myxml>
But this has the base64 overhead.
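Building such a data URI in Java takes a single call to java.util.Base64. The sketch below uses a made-up class name and leaves out optional media type parameters such as charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: renders bytes as a Base64 data URI.
final class DataUri {
    static String of(String mediaType, byte[] data) {
        return "data:" + mediaType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = "testdata".getBytes(StandardCharsets.US_ASCII);
        // prints data:application/octet-stream;base64,dGVzdGRhdGE=
        System.out.println(of("application/octet-stream", data));
    }
}
```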
