I use this code to parse a certified email that I have as a String in S/MIME format. Here is a snippet:
InputStream inputStreamObj = new ByteArrayInputStream(message.getBytes());
MimeMessage mimeMessageObj = new MimeMessage(session, inputStreamObj);
Object content = mimeMessageObj.getContent();
if (content instanceof Multipart) {
Multipart multiPart = (Multipart)content;
for (int i = 0; i < multiPart.getCount(); i++) {
BodyPart part = (MimeBodyPart) multiPart.getBodyPart(i);
if (part.getFileName() != null) {
System.out.println("Filename:"+part.getFileName());
} else if (part.getContent() instanceof Multipart) {
System.out.println("Multipart");
//here there is a recursive call to this method
} else if (part.getContent() instanceof String) {
System.out.println("Message text: "+part.getContent());
} else {
System.out.println("NOT RECOGNIZED TYPE");
}
}
}
In this manner I see:
Message text: <message in html form>
Message text: <message in txt form>
File: daticert.xml
File: postacert.eml
But the "smime.p7s" file is missing from this output.
How can I find it? I can see it in the raw message string (message):
Content-Type: application/x-pkcs7-signature; name="smime.p7s"
Content-Disposition: attachment; filename="smime.p7s"
Where is the file?
Maybe I cannot use MimeMessage and must use javax.mail.Message instead? If so, how can I convert the text into a Message?
Solved!
The message text I received originally contained everything (headers + body parts), but during processing it lost the header part. After adding the headers back at the start of the message text, I now see all the attachments, including the p7s file.
In fact, this file is tied to the main email through a marker in the text (you can see it if you print the text), but that link depends on the missing headers. Without the headers, nothing can locate the p7s file.
The solution is: add the headers in the form "name: value\n" at the beginning of the message text.
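A minimal sketch of that fix, using only the JDK. The class name HeaderPrepender and the header values are hypothetical, and it assumes the stored text starts directly at the body, so a blank line is needed between the headers and the body. It uses "\r\n" line endings, the line terminator defined for internet messages; the "\n" form described above worked for the author.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderPrepender {

    /** Builds "Name: value" lines, then an empty line, then the original body text. */
    static String prependHeaders(Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            sb.append(header.getKey()).append(": ").append(header.getValue()).append("\r\n");
        }
        // An empty line separates the header block from the body.
        sb.append("\r\n").append(body);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical header values; use the ones from the original message.
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("MIME-Version", "1.0");
        headers.put("Content-Type", "text/plain; charset=UTF-8");
        System.out.println(prependHeaders(headers, "Body text"));
    }
}
```

The resulting string can then be passed to new MimeMessage(session, inputStreamObj) exactly as in the snippet from the question.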
I'm using Tika-server to parse a bunch of eml files. Extracting both the content and the metadata of the emails and their attachments works fine with the /rmeta endpoint.
The problem is getting the proper attachment file name. When the attachment part in the raw eml file has the following structure:
Content-Type: application/pdf; name="filename_a.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="filename_a.pdf"
everything works fine; the extracted filename path in the metadata object (in the API response) is:
"X-TIKA:embedded_resource_path": "/filename_a.pdf"
However, some of my emails have a malformed header structure (the filename is missing from Content-Disposition), e.g.:
Content-Type: application/pdf; name="filename_a.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
Then after parsing the whole eml I obtain:
"X-TIKA:embedded_resource_path": "/embedded-1"
I checked Tika's source code and found that the file name metadata is determined in \org\apache\tika\parser\RecursiveParserWrapper.class, here:
private String getResourceName(Metadata metadata, RecursiveParserWrapper.ParserState state) {
String objectName = "";
if (metadata.get("resourceName") != null) {
objectName = metadata.get("resourceName");
} else if (metadata.get("embeddedRelationshipId") != null) {
objectName = metadata.get("embeddedRelationshipId");
} else {
objectName = "embedded-" + ++state.unknownCount;
}
objectName = FilenameUtils.getName(objectName);
return objectName;
}
I tried to access the name attribute by inspecting the Content-Type key in the metadata object, but it is not there. (I assume Tika determines the Content-Type value by more than just reading the header, which is why the name parameter is absent.)
So my question (since I can't figure it out) is: is there a way to modify the Tika source code to force filename extraction from the Content-Type header when the filename attribute in the Content-Disposition header is missing?
OK, I managed it on my own. The workaround is pretty simple and straightforward.
One has to extend one of the conditions in \org\apache\tika\parser\mail\MailContentHandler.class. At line 129 we have:
if (contentDispositionFileName != null) {
submd.set("resourceName", contentDispositionFileName);
}
By extending it with an additional else block:
if (contentDispositionFileName != null) {
submd.set("resourceName", contentDispositionFileName);
} else {
Map<String, String> contentTypeParameters = ((MaximalBodyDescriptor)body).getContentTypeParameters();
String contentTypeFilename = (String)contentTypeParameters.get("name");
submd.set("resourceName", contentTypeFilename);
}
we force the handler to fall back to the name parameter of the Content-Type header.
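The same fallback rule can be tried outside Tika on raw header values. This is only a library-free sketch: AttachmentNameResolver and its methods are hypothetical names, not part of Tika or mime4j, and the regular expression handles only simple quoted or bare parameter values (no RFC 2231 continuations or encoded words).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttachmentNameResolver {

    /** Returns the value of a header parameter such as name="x" or name=x, or null if absent. */
    private static String param(String headerValue, String paramName) {
        if (headerValue == null) {
            return null;
        }
        // Group 2 captures a quoted value, group 3 an unquoted token.
        Pattern p = Pattern.compile(
                ";\\s*" + Pattern.quote(paramName) + "\\s*=\\s*(\"([^\"]*)\"|([^;\\s]+))",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(headerValue);
        if (!m.find()) {
            return null;
        }
        return m.group(2) != null ? m.group(2) : m.group(3);
    }

    /** Prefers Content-Disposition "filename", then falls back to Content-Type "name". */
    static String resolve(String contentDisposition, String contentType) {
        String name = param(contentDisposition, "filename");
        return name != null ? name : param(contentType, "name");
    }

    public static void main(String[] args) {
        // The malformed case from the question: no filename in Content-Disposition.
        System.out.println(resolve("attachment;", "application/pdf; name=\"filename_a.pdf\""));
    }
}
```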
I'm using javamail-1.4.5 to fetch messages from Gmail (IMAP). If Content-Disposition has unquoted parameters, the getDisposition method fails.
message part:
Content-Transfer-Encoding: 7bit
Content-Type: message/rfc822
Content-Disposition: attachment;
creation-date=Wed, 11 Feb 2015 10:23:48 GMT;
modification-date=Wed, 11 Feb 2015 10:23:48 GMT
exception:
javax.mail.internet.ParseException: Expected ';', got ","
at javax.mail.internet.ParameterList.<init>(ParameterList.java:289)
at javax.mail.internet.ContentDisposition.<init>(ContentDisposition.java:100)
at javax.mail.internet.MimeBodyPart.getDisposition(MimeBodyPart.java:1076)
UPD1: this is part of my code. I get the error in the first line of the handlePart method:
private void handleMessage(Message message) {
Object content = message.getContent();
if(content instanceof Multipart) {
handleMultipart((Multipart) content);
}
else {
handlePart(message);
}
}
private void handleMultipart(Multipart mp) {
for(int i = 0; i < mp.getCount(); i++) {
Part part = mp.getBodyPart(i);
Object content = part.getContent();
if(content instanceof Multipart) {
handleMultipart((Multipart) content);
}
else {
handlePart(part);
}
}
}
private void handlePart(Part part) {
String disposition = part.getDisposition(); //GETTING ERROR
String contentType = part.getContentType();
if(disposition == null) {
if(contentType.toLowerCase().startsWith("text/html")) {
html = (String) part.getContent();
}
else if(contentType.toLowerCase().startsWith("text/plain")) {
text = (String) part.getContent();
}
else {
handleAttachment(part);
}
}
else if(disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
handleAttachment(part);
}
else if(disposition.equalsIgnoreCase(Part.INLINE)) {
handleAttachment(part);
}
}
The message is incorrectly formatted. What program created the message? Please report this bug to the owner of that program.
You can work around this bug by setting the System property "mail.mime.parameters.strict" to "false"; see the javadocs for the javax.mail.internet package and the ParameterList class.
Also, you might want to upgrade to the current 1.5.2 version of JavaMail.
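A minimal sketch of the workaround. LenientMimeParameters is a hypothetical helper name. As far as I know, JavaMail reads this kind of system property once, when the relevant class is first loaded, so the sketch assumes it is called before any JavaMail code runs.

```java
final class LenientMimeParameters {

    /** Relaxes MIME parameter parsing; call this before any JavaMail class is used. */
    static void enable() {
        System.setProperty("mail.mime.parameters.strict", "false");
    }

    public static void main(String[] args) {
        enable();
        // Create the Session and read messages only after this point.
        System.out.println(System.getProperty("mail.mime.parameters.strict"));
    }
}
```

Passing -Dmail.mime.parameters.strict=false on the java command line has the same effect and avoids any ordering concerns.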
It fails because there's a syntax error. The lack of quoting is illegal. There's not much you can do about the exception, short of submitting a patch, and patching around content-disposition and content-type errors is neverending work. In my experience, Content-Disposition gets more than its fair share of errors. I've written at least a dozen workarounds (not for javamail), each with unit tests. That's hard work and may not be worth it.
Since you have to have a decent fallback for unspecified C-D, you can leverage that fallback for errant and nonsensical dispositions too:
String disposition = null;
try {
disposition = part.getDisposition();
} catch(ParseException x) {
// treat Content-Disposition as unspecified if it cannot be parsed
disposition = null;
}
BTW: Send yourself a message with "Content-type: text/plain; utf8", and check that you handle that parse exception too.
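If discarding the header entirely is too coarse, a different technique (not what the answer above does) is to read the raw header yourself, for example via part.getHeader("Content-Disposition"), and keep only the text before the first ';'. A sketch of that idea, with LenientHeader and mainValue as hypothetical names:

```java
import java.util.Locale;

final class LenientHeader {

    /** Returns the text before the first ';', trimmed and lower-cased, or null if nothing is left. */
    static String mainValue(String rawHeaderValue) {
        if (rawHeaderValue == null) {
            return null;
        }
        int semicolon = rawHeaderValue.indexOf(';');
        String head = semicolon >= 0 ? rawHeaderValue.substring(0, semicolon) : rawHeaderValue;
        head = head.trim();
        return head.isEmpty() ? null : head.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The broken parameters after ';' are ignored.
        System.out.println(mainValue("attachment; creation-date=Wed, 11 Feb 2015 10:23:48 GMT"));
        System.out.println(mainValue("text/plain; utf8"));
    }
}
```

This ignores all parameters, so it only answers "attachment or inline?" and "which MIME type?", which is often enough to route the part.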
I want to get the full message body, so I tried:
Message gmailMessage = service.users().messages().get("me", messageId).setFormat("full").execute();
Then, to get the body, I tried:
gmailMessage.getPayload().getBody().getData()
but the result is always null. How can I get the full message body?
To get the data from your gmailMessage, you can use gmailMessage.payload.parts[0].body.data. If you want to decode it into readable text, you can do the following:
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.StringUtils;
System.out.println(StringUtils.newStringUtf8(Base64.decodeBase64(gmailMessage.getPayload().getParts().get(0).getBody().getData())));
I tried it this way, since message.getPayload().getParts() was always null:
import com.google.api.client.repackaged.org.apache.commons.codec.binary.Base64;
import com.google.api.client.repackaged.org.apache.commons.codec.binary.StringUtils;
(...)
Message message = service.users().messages().get(user, m.getId()).execute();
MessagePart part = message.getPayload();
System.out.println(StringUtils.newStringUtf8(Base64.decodeBase64(part.getBody().getData())));
The result is a plain HTML string.
I found a more interesting way to get the full message (and not only the body). Note that getRaw() is only populated when the message is requested with setFormat("raw"):
System.out.println(StringUtils.newStringUtf8( Base64.decodeBase64 (message.getRaw())));
Here is a solution in C# (Gmail API v1) for reading the email body content:
var request = _gmailService.Users.Messages.Get("me", mail.Id);
request.Format = UsersResource.MessagesResource.GetRequest.FormatEnum.Full;
To fix the decoding error, convert the URL-safe base64 data to the standard alphabet before decoding:
var res = message.Payload.Body.Data.Replace("-", "+").Replace("_", "/");
byte[] bodyBytes = Convert.FromBase64String(res);
string val = Encoding.UTF8.GetString(bodyBytes);
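The character replacement is needed because the body data uses the URL-safe base64 alphabet ('-' and '_' instead of '+' and '/'). In Java, the JDK's URL-safe decoder handles that alphabet directly. A small sketch, assuming the text is UTF-8; GmailBodyDecoder is a hypothetical name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class GmailBodyDecoder {

    /** Decodes URL-safe base64 (padding optional) into a UTF-8 string. */
    static String decode(String base64UrlData) {
        byte[] bytes = Base64.getUrlDecoder().decode(base64UrlData);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("SGVsbG8sIHdvcmxkIQ")); // prints "Hello, world!"
    }
}
```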
If you have the message (com.google.api.services.gmail.model.Message) you could use the following methods:
public String getContent(Message message) {
    StringBuilder stringBuilder = new StringBuilder();
    List<MessagePart> parts = message.getPayload().getParts();
    if (parts == null) {
        // Single-part message: there is nothing to traverse, so fall back to the snippet
        return message.getSnippet();
    }
    getPlainTextFromMessageParts(parts, stringBuilder);
    return stringBuilder.toString();
}
private void getPlainTextFromMessageParts(List<MessagePart> messageParts, StringBuilder stringBuilder) {
    for (MessagePart messagePart : messageParts) {
        if ("text/plain".equals(messagePart.getMimeType()) && messagePart.getBody().getData() != null) {
            // Decode each part separately: joining base64 strings before decoding can break on padding
            byte[] bodyBytes = Base64.decodeBase64(messagePart.getBody().getData());
            stringBuilder.append(new String(bodyBytes, StandardCharsets.UTF_8));
        }
        if (messagePart.getParts() != null) {
            getPlainTextFromMessageParts(messagePart.getParts(), stringBuilder);
        }
    }
}
It combines all message parts with the mimeType "text/plain" and returns it as one string.
When we get the full message, the message body is inside the parts.
This example displays the message headers (Date, From, To and Subject) and the message body as plain text. The parts in the payload contain both versions of the message (plain text and formatted text); I was interested in the plain text.
Message msg = service.users().messages().get(user, message.getId()).setFormat("full").execute();
// Displaying Message Header Information
for (MessagePartHeader header : msg.getPayload().getHeaders()) {
if (header.getName().contains("Date") || header.getName().contains("From") || header.getName().contains("To")
|| header.getName().contains("Subject"))
System.out.println(header.getName() + ":" + header.getValue());
}
// Displaying Message Body as a Plain Text
for (MessagePart msgPart : msg.getPayload().getParts()) {
if (msgPart.getMimeType().contains("text/plain"))
System.out.println(new String(Base64.decodeBase64(msgPart.getBody().getData())));
}
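One caveat with the header loop above: contains("To") also matches headers such as Reply-To and Delivered-To. A stricter, case-insensitive filter might look like the sketch below. HeaderFilter is a hypothetical helper; in the loop it would be called with header.getName().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HeaderFilter {

    // Header names to display, stored in lower case for case-insensitive lookup.
    private static final Set<String> WANTED =
            new HashSet<>(Arrays.asList("date", "from", "to", "subject"));

    /** True only for an exact (case-insensitive) match of a wanted header name. */
    static boolean isWanted(String headerName) {
        return headerName != null && WANTED.contains(headerName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isWanted("To"));       // true
        System.out.println(isWanted("Reply-To")); // false
    }
}
```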
Based on @Tholle's comment, I did something like this:
Message message = service.users().messages()
.get(user, messageHolder.getId()).execute();
System.out.println(StringUtils.newStringUtf8(Base64.decodeBase64(
message.getPayload().getParts().get(0).getBody().getData())));
There is a method to decode the body:
final String body = new String(message.getPayload().getParts().get(0).getBody().decodeData());
Message message = service.users().messages().get(user, messageId).execute();
//Print email body
List<MessagePart> parts = message.getPayload().getParts();
String data = parts.get(0).getBody().getData();
String body = new String(BaseEncoding.base64Url().decode(data));
I use javax.mail to download mails from a given mail address in order to get the attachments (I expect images) and save the images on disk automatically (polling the mail address). This works fine except if the mail has been sent from an iPhone. It seems that in these cases the image is embedded in the mail (I can see the image in the web mail window) and cannot be downloaded as an attachment.
How can I extract the image from the mail?
What is the difference between iPhone mails and other mails regarding attachments?
Is the image a special part of the mail content?
In my program log I can see:
- contentType: multipart/mixed; boundary=Apple-Mail-...
- numberOfParts = 2
Java version is 1.7.0_21
javax.mail version is 1.4.7
This is the relevant code (most of it taken from http://www.codejava.net)
if (contentType.contains("multipart")) {
// content may contain attachments
Multipart multiPart = (Multipart) message.getContent();
numberOfParts = multiPart.getCount();
for (int partCount = 0; partCount < numberOfParts; partCount++) {
MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
// this part is the attachment
String fileName = part.getFileName();
attachFiles += fileName + ", ";
if (fileName.endsWith("jpg") || fileName.endsWith("JPG")
|| fileName.endsWith("jpeg") || fileName.endsWith("JPEG")) {
part.saveFile(saveDirectory + File.separator + fileName);
} else {
// attachment is not an image
}
} else {
// this part may be the message content
messageContent = part.getContent().toString();
}
}
if (attachFiles.length() > 1) {
attachFiles = attachFiles.substring(0, attachFiles.length() - 2);
}
} else if (contentType.contains("text/plain") || contentType.contains("text/html")) {
Object content = message.getContent();
if (content != null) {
messageContent = content.toString();
}
}
You can check the code below:
Multipart mp = new MimeMultipart("related");
Use the default constructor, which resolves the issue.
The code you have is full of assumptions about the structure of a message. Most likely, one of those assumptions is wrong. Fire up a debugger, add some print statements, or do whatever is necessary to step through your code and compare what you're actually getting with what you expect to get. You can also dump the raw MIME content of the message using the Message.writeTo method, to see what the MIME structure of the message really is.
Probably the first thing to check is whether the image is marked as an ATTACHMENT. Perhaps it's being sent as INLINE instead?
BTW, you never want to use the filename in the message directly; someone could send you all sorts of malicious junk in there.
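To illustrate that last point, here is one possible way to reduce an untrusted attachment name to something safe before building a path from it. This is a conservative sketch, not a complete defense: SafeFileName, the "attachment" fallback name, and the allowed character set are my own assumptions.

```java
final class SafeFileName {

    /** Keeps only the last path segment and replaces unusual characters with '_'. */
    static String sanitize(String untrusted) {
        if (untrusted == null) {
            return "attachment";
        }
        // Treat both separator styles as path separators and drop any directories.
        String name = untrusted.replace('\\', '/');
        name = name.substring(name.lastIndexOf('/') + 1);
        // Allow letters, digits, dot, underscore and hyphen; replace everything else.
        name = name.replaceAll("[^A-Za-z0-9._-]", "_");
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return "attachment";
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("../../etc/passwd"));      // passwd
        System.out.println(sanitize("C:\\tmp\\photo 1.jpg")); // photo_1.jpg
    }
}
```

In the question's code, the sanitized name would replace fileName in the call to part.saveFile(saveDirectory + File.separator + fileName).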
I wrote an application which gets all emails from an inbox, filters the emails which contain a specific string and then puts those emails in an ArrayList.
After the emails are put in the List, I am doing some stuff with the subject and content of said emails. This all works fine for e-mails without an attachment. But when I started to process e-mails with attachments, it no longer worked as expected.
This is my code:
public void getInhoud(Message msg) throws IOException {
try {
cont = msg.getContent();
} catch (MessagingException ex) {
Logger.getLogger(ReadMailNew.class.getName()).log(Level.SEVERE, null, ex);
}
if (cont instanceof String) {
String body = (String) cont;
} else if (cont instanceof Multipart) {
try {
Multipart mp = (Multipart) msg.getContent();
int mp_count = mp.getCount();
for (int b = 0; b < 1; b++) {
dumpPart(mp.getBodyPart(b));
}
} catch (Exception ex) {
System.out.println("Exception arise at get Content");
ex.printStackTrace();
}
}
}
public void dumpPart(Part p) throws Exception {
email = null;
String contentType = p.getContentType();
System.out.println("dumpPart" + contentType);
InputStream is = p.getInputStream();
if (!(is instanceof BufferedInputStream)) {
is = new BufferedInputStream(is);
}
int c;
final StringWriter sw = new StringWriter();
while ((c = is.read()) != -1) {
sw.write(c);
}
if (!sw.toString().contains("<div>")) {
mpMessage = sw.toString();
getReferentie(mpMessage);
}
}
The content from the e-mail is stored in a String.
This code works fine when I read mails without attachments. But if I use an e-mail with an attachment, the String also contains HTML code and even the encoded attachment. Eventually I want to store both the attachment and the content of an e-mail, but my first priority is to get just the text without any HTML or attachment encoding.
Then I tried a different approach to handle the different parts:
public void getInhoud(Message msg) throws IOException {
try {
Object contt = msg.getContent();
if (contt instanceof Multipart) {
System.out.println("Met attachment");
handleMultipart((Multipart) contt);
} else {
handlePart(msg);
System.out.println("Zonder attachment");
}
} catch (MessagingException ex) {
ex.printStackTrace();
}
}
public static void handleMultipart(Multipart multipart)
throws MessagingException, IOException {
for (int i = 0, n = multipart.getCount(); i < n; i++) {
handlePart(multipart.getBodyPart(i));
System.out.println("Count "+n);
}
}
public static void handlePart(Part part)
throws MessagingException, IOException {
String disposition = part.getDisposition();
String contentType = part.getContentType();
if (disposition == null) { // When just body
System.out.println("Null: " + contentType);
// Check if plain
if ((contentType.length() >= 10)
&& (contentType.toLowerCase().substring(
0, 10).equals("text/plain"))) {
part.writeTo(System.out);
} else if ((contentType.length() >= 9)
&& (contentType.toLowerCase().substring(
0, 9).equals("text/html"))) {
part.writeTo(System.out);
} else if ((contentType.length() >= 9)
&& (contentType.toLowerCase().substring(
0, 9).equals("text/html"))) {
System.out.println("Ook html gevonden");
part.writeTo(System.out);
}else{
System.out.println("Other body: " + contentType);
part.writeTo(System.out);
}
} else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
System.out.println("Attachment: " + part.getFileName()
+ " : " + contentType);
} else if (disposition.equalsIgnoreCase(Part.INLINE)) {
System.out.println("Inline: "
+ part.getFileName()
+ " : " + contentType);
} else {
System.out.println("Other: " + disposition);
}
}
This is the output of the System.out.println calls:
Null: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Other body: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Content-Type: multipart/alternative; boundary="047d7b6220720b499504ce3786d7"
--047d7b6220720b499504ce3786d7
Content-Type: text/plain; charset="ISO-8859-1"
'Text of the message here in normal text'
--047d7b6220720b499504ce3786d7
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
'HTML code of the message'
This approach returns the plain text of the e-mail but also its HTML code. I really don't understand why this happens. I've googled it, but it seems no one else has this problem.
Any help is appreciated,
Thanks!
I found reading e-mail with the JavaMail library much more difficult than expected. I don't blame the JavaMail API, rather I blame my poor understanding of RFC-5322 -- the official definition of Internet e-mail.
As a thought experiment: Consider how complicated an e-mail message can become in the real world. It is possible to "infinitely" embed messages within messages. Each message itself may have multiple attachments (binary or human-readable text). Now imagine how complicated this structure becomes in the JavaMail API after parsing.
A few tips that may help when traversing e-mail with JavaMail:
- Message and BodyPart both implement Part.
- MimeMessage and MimeBodyPart both implement MimePart.
- Where possible, treat everything as a Part or MimePart. This will allow generic traversal methods to be built more easily.
- These Part methods will help to traverse:
  - String getContentType(): Starts with the MIME type. You may be tempted to treat this as a MIME type (with some hacking/cutting/matching), but don't. Better to only use this method inside the debugger for inspection.
    - Oddly, MIME type cannot be extracted directly. Instead use boolean isMimeType(String) to match. Read docs carefully to learn about powerful wildcards, such as "multipart/*".
  - Object getContent(): Might be instanceof:
    - Multipart -- container for more Parts
      - Cast to Multipart, then iterate as zero-based index with int getCount() and BodyPart getBodyPart(int)
      - Note: BodyPart implements Part
      - In my experience, Microsoft Exchange servers regularly provide two copies of the body text: plain text and HTML.
        - To match plain text, try: Part.isMimeType("text/plain")
        - To match HTML, try: Part.isMimeType("text/html")
    - Message (implements Part) -- embedded or attached e-mail
    - String (just the body text -- plain text or HTML)
      - See note above about Microsoft Exchange servers.
    - InputStream (probably a BASE64-encoded attachment)
  - String getDisposition(): Value may be null
    - if Part.ATTACHMENT.equalsIgnoreCase(getDisposition()), then call getInputStream() to get raw bytes of the attachment.
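The traversal described in these tips can be sketched without the library. In the code below, Node is a hypothetical stand-in for javax.mail.Part (it is not a JavaMail class), and its isMimeType only imitates the exact-match and "type/*" wildcard behaviour. With real JavaMail the same shape applies: test part.isMimeType(...), cast part.getContent() to Multipart, and recurse over getBodyPart(i).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class PartWalker {

    /** Hypothetical stand-in for javax.mail.Part: a MIME type plus either text or child parts. */
    static final class Node {
        final String mimeType;
        final String text;         // body text for leaf parts, null otherwise
        final List<Node> children; // child parts for multipart/* containers

        Node(String mimeType, String text, List<Node> children) {
            this.mimeType = mimeType.toLowerCase(Locale.ROOT);
            this.text = text;
            this.children = children;
        }

        static Node leaf(String mimeType, String text) {
            return new Node(mimeType, text, Collections.<Node>emptyList());
        }

        static Node multipart(String subtype, Node... kids) {
            return new Node("multipart/" + subtype, null, Arrays.asList(kids));
        }

        /** Exact match, or prefix match for patterns ending in "/*". */
        boolean isMimeType(String pattern) {
            String p = pattern.toLowerCase(Locale.ROOT);
            if (p.endsWith("/*")) {
                return mimeType.startsWith(p.substring(0, p.length() - 1));
            }
            return mimeType.equals(p);
        }
    }

    /** Depth-first search for the first text/plain body; returns null if there is none. */
    static String findPlainText(Node node) {
        if (node.isMimeType("text/plain")) {
            return node.text;
        }
        if (node.isMimeType("multipart/*")) {
            for (Node child : node.children) {
                String found = findPlainText(child);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }
}
```

A real implementation would probably also check getDisposition(), so that a text/plain attachment is not mistaken for the message body.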
Finally, I found the official Javadocs exclude everything in the com.sun.mail package (and possibly more). If you need these, read the code directly, or generate the unfiltered Javadocs by downloading the source and running mvn javadoc:javadoc in the mail project module of the project.
Did you find these JavaMail FAQ entries?
How do I read a message with an attachment and save the attachment?
How do I tell if a message has attachments?
How do I find the main message body in a message that has attachments?
Following up on Kevin's helpful advice, analyzing your email content Java object types with respect to their canonical names (or simple names) can be helpful too. For example, looking at one inbox I've got right now, of 486 messages 399 are Strings, and 87 are MimeMultipart. This suggests that - for my typical email - a strategy that uses instanceof to first peel off Strings is best.
Of the Strings, 394 are text/plain, and 5 are text/html. This will not be the case for most; it's reflective of my email feeds into this particular inbox.
But wait - there's more! :-) The HTML sneaks in there nevertheless: of the 87 Multiparts, 70 are multipart/alternative. No guarantees, but most (if not all) of these are TEXT + HTML.
Of the other 17 multipart, incidentally, 15 are multipart/mixed, and 2 are multipart/signed.
My use case with this inbox (and one other) is primarily to aggregate and analyze known mailing list content. I can't ignore any of the messages, but an analysis of this sort helps me make my processing more efficient.
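A tally like the one above can be produced with a few lines of code. This is only a sketch: ContentTypeTally is a hypothetical helper, and it assumes you have already collected the result of getContent() for each message into a list (plain objects stand in for that here).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ContentTypeTally {

    /** Counts the objects by the simple name of their runtime class. */
    static Map<String, Integer> tally(List<?> contents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Object content : contents) {
            String key = content == null ? "null" : content.getClass().getSimpleName();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in values; in practice each element would come from message.getContent().
        List<Object> contents = Arrays.<Object>asList("plain body", "another body", 42);
        System.out.println(tally(contents)); // {Integer=1, String=2}
    }
}
```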