I wrote an application that fetches all emails from an inbox, keeps those containing a specific string, and puts them in an ArrayList.
Once the emails are in the list, I process their subject and content. This all works fine for emails without attachments, but once I started testing with emails that have attachments, it no longer worked as expected.
This is my code:
public void getInhoud(Message msg) throws IOException {
    try {
        cont = msg.getContent();
    } catch (MessagingException ex) {
        Logger.getLogger(ReadMailNew.class.getName()).log(Level.SEVERE, null, ex);
    }
    if (cont instanceof String) {
        String body = (String) cont;
    } else if (cont instanceof Multipart) {
        try {
            Multipart mp = (Multipart) msg.getContent();
            int mp_count = mp.getCount();
            for (int b = 0; b < 1; b++) {
                dumpPart(mp.getBodyPart(b));
            }
        } catch (Exception ex) {
            System.out.println("Exception arise at get Content");
            ex.printStackTrace();
        }
    }
}

public void dumpPart(Part p) throws Exception {
    email = null;
    String contentType = p.getContentType();
    System.out.println("dumpPart" + contentType);
    InputStream is = p.getInputStream();
    if (!(is instanceof BufferedInputStream)) {
        is = new BufferedInputStream(is);
    }
    int c;
    final StringWriter sw = new StringWriter();
    while ((c = is.read()) != -1) {
        sw.write(c);
    }
    if (!sw.toString().contains("<div>")) {
        mpMessage = sw.toString();
        getReferentie(mpMessage);
    }
}
The content from the e-mail is stored in a String.
This code works fine for mails without an attachment. With an attachment, however, the String also contains HTML and even the encoded attachment data. Eventually I want to store both the attachment and the content of an e-mail, but my first priority is to get just the text, without any HTML or attachment encoding.
I then tried a different approach to handle the different parts:
public void getInhoud(Message msg) throws IOException {
    try {
        Object contt = msg.getContent();
        if (contt instanceof Multipart) {
            System.out.println("Met attachment");
            handleMultipart((Multipart) contt);
        } else {
            handlePart(msg);
            System.out.println("Zonder attachment");
        }
    } catch (MessagingException ex) {
        ex.printStackTrace();
    }
}

public static void handleMultipart(Multipart multipart)
        throws MessagingException, IOException {
    for (int i = 0, n = multipart.getCount(); i < n; i++) {
        handlePart(multipart.getBodyPart(i));
        System.out.println("Count "+n);
    }
}

public static void handlePart(Part part)
        throws MessagingException, IOException {
    String disposition = part.getDisposition();
    String contentType = part.getContentType();
    if (disposition == null) { // When just body
        System.out.println("Null: " + contentType);
        // Check if plain
        if ((contentType.length() >= 10)
                && (contentType.toLowerCase().substring(
                        0, 10).equals("text/plain"))) {
            part.writeTo(System.out);
        } else if ((contentType.length() >= 9)
                && (contentType.toLowerCase().substring(
                        0, 9).equals("text/html"))) {
            part.writeTo(System.out);
        } else if ((contentType.length() >= 9)
                && (contentType.toLowerCase().substring(
                        0, 9).equals("text/html"))) {
            System.out.println("Ook html gevonden");
            part.writeTo(System.out);
        }else{
            System.out.println("Other body: " + contentType);
            part.writeTo(System.out);
        }
    } else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
        System.out.println("Attachment: " + part.getFileName()
                + " : " + contentType);
    } else if (disposition.equalsIgnoreCase(Part.INLINE)) {
        System.out.println("Inline: "
                + part.getFileName()
                + " : " + contentType);
    } else {
        System.out.println("Other: " + disposition);
    }
}
This is what the System.out.println calls print:
Null: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Other body: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Content-Type: multipart/alternative; boundary="047d7b6220720b499504ce3786d7"
--047d7b6220720b499504ce3786d7
Content-Type: text/plain; charset="ISO-8859-1"
'Text of the message here in normal text'
--047d7b6220720b499504ce3786d7
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
'HTML code of the message'
This approach returns the plain text of the e-mail, but also its HTML. I really don't understand why this happens. I've googled it, but no one else seems to have this problem.
Any help is appreciated,
Thanks!
I found reading e-mail with the JavaMail library much more difficult than expected. I don't blame the JavaMail API; rather, I blame my poor understanding of RFC 5322 -- the official definition of Internet e-mail.
As a thought experiment: Consider how complicated an e-mail message can become in the real world. It is possible to "infinitely" embed messages within messages. Each message itself may have multiple attachments (binary or human-readable text). Now imagine how complicated this structure becomes in the JavaMail API after parsing.
A few tips that may help when traversing e-mail with JavaMail:
- Message and BodyPart both implement Part.
- MimeMessage and MimeBodyPart both implement MimePart.
- Where possible, treat everything as a Part or MimePart. This will allow generic traversal methods to be built more easily.
These Part methods will help to traverse:
- String getContentType(): Starts with the MIME type. You may be tempted to treat this as a MIME type (with some hacking/cutting/matching), but don't. Better to only use this method inside the debugger for inspection.
  - Oddly, MIME type cannot be extracted directly. Instead use boolean isMimeType(String) to match. Read docs carefully to learn about powerful wildcards, such as "multipart/*".
- Object getContent(): Might be instanceof:
  - Multipart -- container for more Parts
    - Cast to Multipart, then iterate as zero-based index with int getCount() and BodyPart getBodyPart(int)
      - Note: BodyPart implements Part
    - In my experience, Microsoft Exchange servers regularly provide two copies of the body text: plain text and HTML.
      - To match plain text, try: Part.isMimeType("text/plain")
      - To match HTML, try: Part.isMimeType("text/html")
  - Message (implements Part) -- embedded or attached e-mail
  - String (just the body text -- plain text or HTML)
    - See note above about Microsoft Exchange servers.
  - InputStream (probably a BASE64-encoded attachment)
- String getDisposition(): Value may be null
  - if Part.ATTACHMENT.equalsIgnoreCase(getDisposition()), then call getInputStream() to get raw bytes of the attachment.
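To see why prefix matching on the raw getContentType() string is fragile, here is a standard-library-only sketch of the kind of comparison isMimeType is documented to make: primary type and subtype only, case-insensitive, parameters ignored, and "*" accepted as a subtype wildcard. MimeTypeMatch, baseType and matches are hypothetical names for illustration; in real code, call Part.isMimeType instead.

```java
import java.util.Locale;

final class MimeTypeMatch {

    // Lower-cased "type/subtype" part of a Content-Type value, parameters dropped.
    static String baseType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True when the primary types agree and the subtype agrees
    // or the pattern's subtype is the wildcard "*".
    static boolean matches(String contentType, String pattern) {
        String[] actual = baseType(contentType).split("/", 2);
        String[] wanted = baseType(pattern).split("/", 2);
        if (actual.length != 2 || wanted.length != 2) {
            return false; // not in type/subtype form
        }
        return actual[0].equals(wanted[0])
                && (wanted[1].equals("*") || actual[1].equals(wanted[1]));
    }

    public static void main(String[] args) {
        System.out.println(matches("TEXT/PLAIN; charset=\"ISO-8859-1\"", "text/plain")); // true
        System.out.println(matches("multipart/alternative; boundary=047d7b62", "multipart/*")); // true
        System.out.println(matches("text/html; charset=\"ISO-8859-1\"", "text/plain")); // false
    }
}
```

The upper-case and wildcard cases are the ones a hand-rolled substring check tends to miss.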
Finally, I found the official Javadocs exclude everything in the com.sun.mail package (and possibly more). If you need these, read the code directly, or generate the unfiltered Javadocs by downloading the source and running mvn javadoc:javadoc in the mail project module of the project.
Did you find these JavaMail FAQ entries?
How do I read a message with an attachment and save the attachment?
How do I tell if a message has attachments?
How do I find the main message body in a message that has attachments?
Following up on Kevin's helpful advice, analyzing your email content Java object types with respect to their canonical names (or simple names) can be helpful too. For example, looking at one inbox I've got right now, of 486 messages 399 are Strings, and 87 are MimeMultipart. This suggests that - for my typical email - a strategy that uses instanceof to first peel off Strings is best.
Of the Strings, 394 are text/plain, and 5 are text/html. This will not be the case for most; it's reflective of my email feeds into this particular inbox.
But wait - there's more!!! :-) The HTML sneaks in there nevertheless: of the 87 Multiparts, 70 are multipart/alternative. No guarantees, but most (if not all) of these are TEXT + HTML.
Of the other 17 multipart, incidentally, 15 are multipart/mixed, and 2 are multipart/signed.
My use case with this inbox (and one other) is primarily to aggregate and analyze known mailing list content. I can't ignore any of the messages, but an analysis of this sort helps me make my processing more efficient.
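A tally like the one described above can be produced with a few lines of plain Java. This is only a sketch: ContentTypeTally and tally are hypothetical names, and the list of objects stands in for the values that Message.getContent() would return for each message in a folder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ContentTypeTally {

    // Counts how many content objects there are of each runtime class (by simple name).
    static Map<String, Integer> tally(List<?> contents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Object content : contents) {
            String name = content == null ? "null" : content.getClass().getSimpleName();
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-ins for the objects Message.getContent() would return.
        List<Object> contents =
                Arrays.asList("plain body", "<p>html body</p>", new StringBuilder("other"));
        System.out.println(tally(contents)); // {String=2, StringBuilder=1}
    }
}
```

Running the same count over getContentType() values (reduced to type/subtype) would give the text/plain versus text/html split mentioned above.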
I use this code to read an e-mail String in S/MIME format from a certified e-mail. This is a snippet:
InputStream inputStreamObj = new ByteArrayInputStream(message.getBytes());
MimeMessage mimeMessageObj = new MimeMessage(session, inputStreamObj);
Object content = mimeMessageObj.getContent();
if (content instanceof Multipart) {
    Multipart multiPart = (Multipart)content;
    for (int i = 0; i < multiPart.getCount(); i++) {
        BodyPart part = (MimeBodyPart) multiPart.getBodyPart(i);
        if (part.getFileName() != null) {
            System.out.println("Filename:"+part.getFileName());
        } else if (part.getContent() instanceof Multipart) {
            System.out.println("Multipart");
            //here there is a recursive call to this method
        } else if (part.getContent() instanceof String) {
            System.out.println("Message text: "+part.getContent());
        } else {
            System.out.println("NOT RECOGNIZED TYPE");
        }
    }
}
In this manner I see:
Message text: <message in html form>
Message text: <message in txt form>
File: daticert.xml
File: postacert.eml
But the "smime.p7s" file is missing here.
How can I find it? I can see it in the raw message String (message):
Content-Type: application/x-pkcs7-signature; name="smime.p7s"
Content-Disposition: attachment; filename="smime.p7s"
Where is the file?
Maybe I cannot use MimeMessage and must use javax.mail.Message instead? If so, how do I convert the text into a Message?
Solved!
The message text I receive contains everything (headers plus body parts), but the header section gets lost when the text is processed. After adding the headers back at the start of the message text, I now see all the attachments, including the p7s file.
The p7s file is in fact tied to the main message through a binding code (you can see it if you print the text), but that link breaks when the headers are missing. Without the headers, nothing can locate the p7s file.
The solution is to add the headers, in the form "name: value\n", at the beginning of the message text.
I'm using javamail-1.4.5 to fetch messages from Gmail over IMAP. If Content-Disposition has unquoted parameters, the getDisposition method fails.
message part:
Content-Transfer-Encoding: 7bit
Content-Type: message/rfc822
Content-Disposition: attachment;
    creation-date=Wed, 11 Feb 2015 10:23:48 GMT;
    modification-date=Wed, 11 Feb 2015 10:23:48 GMT
exception:
javax.mail.internet.ParseException: Expected ';', got ","
    at javax.mail.internet.ParameterList.<init>(ParameterList.java:289)
    at javax.mail.internet.ContentDisposition.<init>(ContentDisposition.java:100)
    at javax.mail.internet.MimeBodyPart.getDisposition(MimeBodyPart.java:1076)
UPD1: this is part of my code. The error occurs on the first line of the handlePart method.
private void handleMessage(Message message) {
    Object content = message.getContent();
    if(content instanceof Multipart) {
        handleMultipart((Multipart) content);
    }
    else {
        handlePart(message);
    }
}

private void handleMultipart(Multipart mp) {
    for(int i = 0; i < mp.getCount(); i++) {
        Part part = mp.getBodyPart(i);
        Object content = part.getContent();
        if(content instanceof Multipart) {
            handleMultipart((Multipart) content);
        }
        else {
            handlePart(part);
        }
    }
}

private void handlePart(Part part) {
    String disposition = part.getDisposition(); //GETTING ERROR
    String contentType = part.getContentType();
    if(disposition == null) {
        if(contentType.toLowerCase().startsWith("text/html")) {
            html = (String) part.getContent();
        }
        else if(contentType.toLowerCase().startsWith("text/plain")) {
            text = (String) part.getContent();
        }
        else {
            handleAttachment(part);
        }
    }
    else if(disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
        handleAttachment(part);
    }
    else if(disposition.equalsIgnoreCase(Part.INLINE)) {
        handleAttachment(part);
    }
}
The message is incorrectly formatted. What program created the message? Please report this bug to the owner of that program.
You can work around this bug by setting the System property "mail.mime.parameters.strict" to "false"; see the javadocs for the javax.mail.internet package and the ParameterList class.
Also, you might want to upgrade to the current 1.5.2 version of JavaMail.
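For reference, the workaround is an ordinary System property, so it can be set with plain Java (or with -Dmail.mime.parameters.strict=false on the command line). A minimal sketch follows; LenientMimeParameters and enable are hypothetical names, and it assumes the property is set before any JavaMail parsing class is first used, since as far as I know it is read from the System properties rather than from the Session.

```java
final class LenientMimeParameters {

    // Relaxes JavaMail's parsing of MIME header parameters.
    // Call this early, before any message is parsed.
    static void enable() {
        System.setProperty("mail.mime.parameters.strict", "false");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(System.getProperty("mail.mime.parameters.strict"));
        // ... create the Session and read messages after this point ...
    }
}
```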
It fails because there's a syntax error. The lack of quoting is illegal. There's not much you can do about the exception, short of submitting a patch, and patching around Content-Disposition and Content-Type errors is never-ending work. In my experience, Content-Disposition gets more than its fair share of errors. I've written at least a dozen workarounds (not for JavaMail), each with unit tests. That's hard work and may not be worth it.
Since you have to have a decent fallback for unspecified C-D, you can leverage that fallback for errant and nonsensical dispositions too:
String disposition = null;
try {
    disposition = part.getDisposition();
} catch(ParseException x) {
    // treat Content-Disposition as unspecified if it cannot be parsed
    disposition = null;
}
BTW: Send yourself a message with "Content-type: text/plain; utf8", and check that you handle that parse exception too.
I use javax.mail to download mails from a given mail address in order to get the attachments (I expect images) and save the images on disk automatically (polling the mail address). This works fine except if the mail has been sent from an iPhone. It seems that in these cases the image is embedded in the mail (I can see the image in the web mail window) and cannot be downloaded as an attachment.
How can I extract the image from the mail?
What is the difference between iPhone mails and other mails regarding attachments?
Is the image a special part of the mail content?
In my program log I can see:
- contentType: multipart/mixed; boundary=Apple-Mail-...
- numberOfParts = 2
Java version is 1.7.0_21
javax.mail version is 1.4.7
This is the relevant code (most of it taken from http://www.codejava.net)
if (contentType.contains("multipart")) {
    // content may contain attachments
    Multipart multiPart = (Multipart) message.getContent();
    numberOfParts = multiPart.getCount();
    for (int partCount = 0; partCount < numberOfParts; partCount++) {
        MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
        if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
            // this part is the attachment
            String fileName = part.getFileName();
            attachFiles += fileName + ", ";
            if (fileName.endsWith("jpg") || fileName.endsWith("JPG")
                    || fileName.endsWith("jpeg") || fileName.endsWith("JPEG")) {
                part.saveFile(saveDirectory + File.separator + fileName);
            } else {
                // attachment is not an image
            }
        } else {
            // this part may be the message content
            messageContent = part.getContent().toString();
        }
    }
    if (attachFiles.length() > 1) {
        attachFiles = attachFiles.substring(0, attachFiles.length() - 2);
    }
} else if (contentType.contains("text/plain") || contentType.contains("text/html")) {
    Object content = message.getContent();
    if (content != null) {
        messageContent = content.toString();
    }
}
The code below is worth checking:
Multipart mp = new MimeMultipart("related");
Use the default constructor, which resolves the issue.
The code you have is full of assumptions about the structure of a message. Most likely, one of those assumptions is wrong. Fire up a debugger, add some print statements, or do whatever is necessary to step through your code and compare what you're actually getting with what you expect to get. You can also dump the raw MIME content of the message using the Message.writeTo method, to see what the MIME structure of the message really is.
Probably the first thing to check is whether the image is marked as an ATTACHMENT. Perhaps it's being sent as INLINE instead?
BTW, you never want to use the filename in the message directly; someone could send you all sorts of malicious junk in there.
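One way to act on that last point is to reduce the filename from the message to a harmless base name before building a path from it. The sketch below uses only the standard library; AttachmentNames, safeFileName and the "attachment" fallback are hypothetical choices, and the whitelist is deliberately strict rather than complete.

```java
final class AttachmentNames {

    // Returns a name that is safe to append to a save directory:
    // no directory components, only letters, digits, '.', '_' and '-'.
    static String safeFileName(String raw) {
        if (raw == null) {
            return "attachment";
        }
        // Keep only the text after the last '/' or '\' so "../../x" cannot escape the directory.
        int cut = Math.max(raw.lastIndexOf('/'), raw.lastIndexOf('\\'));
        String base = raw.substring(cut + 1);
        // Replace every character outside the whitelist with an underscore.
        String cleaned = base.replaceAll("[^A-Za-z0-9._-]", "_");
        // Reject empty names and names made only of dots ("." or "..").
        if (cleaned.isEmpty() || cleaned.matches("\\.+")) {
            return "attachment";
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(safeFileName("../../etc/passwd"));   // passwd
        System.out.println(safeFileName("C:\\temp\\evil.jpg")); // evil.jpg
        System.out.println(safeFileName("my photo.JPG"));       // my_photo.JPG
    }
}
```

The result would then be used in place of the raw name, for example saveDirectory + File.separator + safeFileName(part.getFileName()).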
So I'm making a mail client for a homework assignment, and one of the requirements is to handle incoming attachments. The first thing I want to do is simply show whether an email has an attachment. I have a set of AWT lists side by side for From, Subject, Size, Date, and Attachment.
For testing purposes, if the disposition is null, I just put an x in the attachmentList. If it's inline, I put an i, and for attachments it should show the filename. However, even for emails that do have attachments, where the headers in Gmail's webmail show the Content-Disposition as attachment (all lower case), getDisposition on the email still returns null. I don't understand why it isn't returning ATTACHMENT, attachment, or anything other than null. Here is the relevant code:
for (int i = 0; i < messages.length; i++) {
    Address[] froms = messages[i].getFrom();
    String email = froms == null ? null : ((InternetAddress) froms[0]).getAddress();
    fromList.add(email);
    subjectList.add(messages[i].getSubject());
    sizeList.add("" + messages[i].getSize());
    dateList.add(messages[i].getReceivedDate().toString());
    String disposition = messages[i].getDisposition();
    System.out.println("Disposition is " + disposition + ".");
    if (disposition == null) {
        attachmentList.add("x");
    }
    else if ("INLINE".equalsIgnoreCase(disposition)) {
        attachmentList.add("i");
    }
    else if ("ATTACHMENT".equalsIgnoreCase(disposition)) {
        String fileName = messages[i].getFileName();
        if (fileName != null) {
            attachmentList.add("attachment " + fileName);
        }
    }
}
You'll notice that it prints "Disposition is ...", which is more test code, and it always prints either null or INLINE. The particular email I'm looking at is about 700k and contains 2 attachments.
Look at the raw MIME text of the message and make sure the Content-Disposition header is set as you expect.
Turn on JavaMail session debugging and examine the protocol trace in the debug output.
Are you using IMAP to read the message? If so, the IMAP server parses the message and returns the "disposition" information in the IMAP protocol message. The IMAP server may not be parsing the message correctly or may not be returning the disposition information correctly.
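Session debugging is switched on through the "mail.debug" session property (or by calling setDebug(true) on an existing Session). The sketch below only builds the Properties object, so it compiles without JavaMail; DebugSessionProps and debugProperties are hypothetical names.

```java
import java.util.Properties;

final class DebugSessionProps {

    // Properties intended for Session.getInstance(props);
    // "mail.debug" asks JavaMail to print the protocol trace.
    static Properties debugProperties() {
        Properties props = new Properties();
        props.setProperty("mail.debug", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = debugProperties();
        System.out.println(props.getProperty("mail.debug"));
        // With JavaMail on the classpath: Session session = Session.getInstance(props);
        // Calling session.setDebug(true) on an existing Session has the same effect.
    }
}
```

With IMAP, the trace should show what the server returns for the message structure, which is where a wrong or missing disposition would become visible.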
I'm not writing a mail application, so I don't have access to all the headers and such. All I have is something like the block at the end of this question. I've tried using the JavaMail API to parse this, using something like
Session s = Session.getDefaultInstance(new Properties());
InputStream is = new ByteArrayInputStream(<< String to parse >>);
MimeMessage message = new MimeMessage(s, is);
Multipart multipart = (Multipart) message.getContent();
But it just tells me that message.getContent() is a String, not a Multipart or MimeMultipart. Besides, I don't really need the overhead of the whole JavaMail API; I just need to parse the text into its parts. Here's an example:
This is a multi-part message in MIME format.\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/plain;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\nStuff:\n\n Please read this stuff at the beginning of each week. =\nFeel free to discuss it throughout the week.\n\n\n--=20\n\nMrs. Suzy M. Smith\n555-555-5555\nsuzy#suzy.com\n------=_NextPart_000_005D_01CC73D5.3BA43FB0\nContent-Type: text/html;\n\tcharset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\n\n\n\n\n\n\n\n\nStuff:\n =20\nPlease read this stuff at the beginning of each =\nweek. Feel=20\nfree to discuss it throughout the week.\n-- Mrs. Suzy M. Smith555-555-5555suzy#suzy.com\n\n------=_NextPart_000_005D_01CC73D5.3BA43FB0--\n\n
First I took your example message and replaced all occurrences of \n with newlines and \t with tabs.
Then I downloaded the JARs from the Mime4J project, a subproject of Apache James, and executed the GUI parsing example org.apache.james.mime4j.samples.tree.MessageTree with the transformed message above as input. And apparently Mime4J was able to parse the message and to extract the HTML message part.
There are a few things wrong with the text you posted.
It is not a valid multipart MIME message. Check out the Wikipedia reference which, while non-normative, is still correct.
The MIME boundary is not defined. In the Wikipedia example, Content-Type: multipart/mixed; boundary="frontier" shows that the boundary is "frontier". In your example, "----=_NextPart_000_005D_01CC73D5.3BA43FB0" is the boundary, but that can only be determined by scanning the text (i.e. the MIME is malformed). You need to instruct the goofball that is passing you the MIME content that you also need to know the MIME boundary value, which no header in the text you posted defines. If you get the entire message, headers included, you will have enough, because the headers contain MIME-Version: 1.0 and Content-Type: multipart/mixed; boundary="frontier", where frontier is replaced with the boundary value of the encoded MIME.
If the person who is sending the body is a goofball (changed from monkey because monkey is too judgemental - my bad DwB), and will not (more likely does not know how to) send the full body, you can derive the boundary by scanning the text for a line that starts and ends with "--" (i.e. --boundary--). Note that I mentioned a "line". The terminal boundary is actually "--boundary--\n".
Finally, the stuff you posted has 2 parts. The first part appears to define substitutions to take place in the second part. If this is true, the Content-Type: of the first part should probably be something other than "text/plain". Perhaps "companyname/substitution-definition" or something like that. This will allow for multiple (as in future enhancements) substitution formats.
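The scan for a line that starts and ends with "--" can be sketched in plain Java as follows. BoundaryScanner, findBoundary and withHeaders are hypothetical names. The sketch searches from the end of the text, and it accepts a candidate only if the matching opening delimiter ("--" plus the boundary) also appears earlier, which guards against decorative lines of dashes. The multipart subtype is passed in by the caller because it cannot be recovered from the body alone.

```java
import java.util.Optional;

final class BoundaryScanner {

    private static String stripTrailing(String s) {
        return s.replaceAll("\\s+$", "");
    }

    // Looks for a closing delimiter line ("--boundary--") whose opening
    // delimiter ("--boundary") also occurs on an earlier line.
    static Optional<String> findBoundary(String body) {
        String[] lines = body.split("\\r?\\n");
        for (int i = lines.length - 1; i >= 0; i--) {
            String line = stripTrailing(lines[i]);
            if (line.length() > 4 && line.startsWith("--") && line.endsWith("--")) {
                String boundary = line.substring(2, line.length() - 2);
                String opening = "--" + boundary;
                for (int j = 0; j < i; j++) {
                    if (stripTrailing(lines[j]).equals(opening)) {
                        return Optional.of(boundary);
                    }
                }
            }
        }
        return Optional.empty();
    }

    // Prepends the headers that were stripped, so a MIME parser can see the boundary.
    static String withHeaders(String body, String subtype) {
        String boundary = findBoundary(body)
                .orElseThrow(() -> new IllegalArgumentException("no closing boundary line found"));
        return "MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/" + subtype + "; boundary=\"" + boundary + "\"\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        String body = "preamble\n--frontier\nContent-Type: text/plain\n\nhello\n--frontier--\n";
        System.out.println(findBoundary(body).orElse("(none)")); // frontier
        System.out.println(withHeaders(body, "mixed"));
    }
}
```

The string returned by withHeaders is the kind of input a full MIME parser expects, since it now starts with a header block that names the boundary.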
You can create a MimeMultipart from an HTTP request:
javax.mail.internet.MimeMultipart m = new MimeMultipart(new ServletMultipartDataSource(httpRequest));

public class ServletMultipartDataSource implements DataSource {
    String contentType;
    InputStream inputStream;

    public ServletMultipartDataSource(ServletRequest request) throws IOException {
        inputStream = new SequenceInputStream(new ByteArrayInputStream("\n".getBytes()), request.getInputStream());
        contentType = request.getContentType();
    }

    public InputStream getInputStream() throws IOException {
        return inputStream;
    }

    public OutputStream getOutputStream() throws IOException {
        return null;
    }

    public String getContentType() {
        return contentType;
    }

    public String getName() {
        return "ServletMultipartDataSource";
    }
}
To get a submitted form parameter, parse the BodyPart headers:
public String getStringParameter(String name) throws MessagingException, IOException {
    for (int i = 0; i < m.getCount(); i++) {
        BodyPart bodyPart = m.getBodyPart(i);
        String[] nameHeader = bodyPart.getHeader("Content-Disposition");
        Object content = bodyPart.getContent();
        if (nameHeader != null && content instanceof String) {
            for (String bodyName : nameHeader) {
                // match the form field name inside the Content-Disposition header
                if (bodyName.contains("name=\"" + name + "\"")) return String.valueOf(content);
            }
        }
    }
    return null;
}
If you are using javax.servlet.http.HttpServlet to receive the message, you will have to use HttpServletRequest.getHeaders to obtain the value of the HTTP Content-Type header. You then pass that value to org.apache.james.mime4j.stream.MimeConfig.setHeadlessParsing so that the parser can properly process the MIME message.
It appears that you are using HttpServletRequest.getInputStream to read the contents of the request. The input stream returned only has the content of the message after the HTTP headers (terminated by a blank line). That is why you have to extract content-type from the HTTP headers and feed it to the parser using setHeadlessParsing.
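The same reasoning can be illustrated without Mime4J: because the servlet stream starts after the HTTP headers, the Content-Type line can be put back in front of it before the bytes reach a parser that expects a header block. This is a manual alternative, not what setHeadlessParsing does internally. HeaderRestoringStream, withContentType and readAll are hypothetical names, and a ByteArrayInputStream stands in for request.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class HeaderRestoringStream {

    // Returns a stream that yields "Content-Type: <value>", a blank line, then the original body.
    static InputStream withContentType(String contentType, InputStream body) {
        byte[] header = ("Content-Type: " + contentType + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        return new SequenceInputStream(new ByteArrayInputStream(header), body);
    }

    // Demo helper: reads a stream to the end and decodes it as ISO-8859-1.
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String body = "--frontier\r\nContent-Type: text/plain\r\n\r\nhi\r\n--frontier--\r\n";
        InputStream request = new ByteArrayInputStream(body.getBytes(StandardCharsets.US_ASCII));
        // In a servlet, the first argument would come from request.getContentType().
        InputStream full = withContentType("multipart/mixed; boundary=\"frontier\"", request);
        System.out.print(readAll(full));
    }
}
```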