
outlook-message-parser's Introduction


Outlook Message Parser

Outlook Message Parser is a small open source Java library that parses Outlook .msg files.

<dependency>
  <groupId>org.simplejavamail</groupId>
  <artifactId>outlook-message-parser</artifactId>
  <version>1.14.1</version>
</dependency>

Outlook Message Parser is a continuation (or fork if that project independently continues) of msgparser.

Under the hood, it uses the Apache POI POIFS library to parse the message files, which use the OLE 2 Compound Document format. It is therefore merely a convenience library that hides the details of the .msg format. The implementation is based on the information provided at fileformat.info.

v1.14.0 - v1.14.1

  • 1.14.1 (08-06-2024): #64: [Bug] Parsing lists to HTML has double bullet points
  • 1.14.0 (25-05-2024): #80: RTF converted to HTML doesn't always detect charset properly

v1.13.0 - v1.13.4

  • 1.13.4 (04-May-2024): bumped apache poi to 5.2.5 and managed commons-io to 2.16.1
  • 1.13.3 (04-May-2024): bumped angus-activation from 2.0.2 to 2.0.3
  • 1.13.2 (05-April-2024): #73 B: Don't overwrite existing address, but do retain X500 address if available
  • 1.13.1 (04-April-2024): #73 A: Further improve X500 addresses detection
  • 1.13.0 (18-January-2024): #71: Update to latest Jakarta+Angus dependencies

v1.12.0 (10-December-2023)

  • #70: [Enhancement] ignore recipients with null-address

v1.11.0 - v1.11.1

  • 1.11.1 (08-December-2023): #69: Enhancement: instead of ignoring them completely, only ignore for embedded images
  • 1.11.0 (08-December-2023): #69: Enhancement: ignore attachment with missing content

v1.10.0 - v1.10.2

  • 1.10.2 (03-December-2023): #68 Improved heuristics for X500 Names
  • 1.10.1 (24-October-2023): #67 Fixed "possibility to parse X500 Names"
  • 1.10.0 (24-October-2023): #67 Adding possibility to parse X500 Names (don't use this version)

v1.9.0 - v1.9.6

  • v1.9.6 (18-July-2022): #57 Same, but now with Collection values to support duplicate headers
  • v1.9.5 (18-July-2022): #57 Headers should be more accessible, rather than just a big string of text
  • v1.9.x - a bunch of dependency fixes and tries apparently, my release train was not so smooth here, sorry
  • v1.9.0 (13-May-2021): #55 CVE issue: Update Apache POI and POI Scratchpad

v1.8.0 - v1.8.1

  • v1.8.1 (31-January-2022): #41 OutlookMessage.getPropertyValue() should be public
  • v1.8.0 (31-January-2022): #52 Adjust dependencies and make Java 9+ friendly
  • v1.8.0 (31-January-2022): #45 Bump commons-io from 2.6 to 2.7

v1.7.10 - v1.7.13 (17-November-2021)

  • #49 bugfix solved by improved charset handling
  • #46 bugfix Rare NPE case of producing empty nested outlook attachment when there should be no attachments
  • #43 bugfix getFromEmailFromHeaders cannot handle "quoted-name-with@at-sign"
  • some minor code improvements

v1.7.9 (10-October-2020)

  • #28 / #36 bugfix NumberFormatException on parsing .msg files

v1.7.8 (4-August-2020)

  • #35 Clarify permission to publish project using Apache v2 license

v1.7.0 - v1.7.7 (9-January-2020 - 17-July-2020)

  • v1.7.7 - #34 Wrong encoding for bodyHTML
  • v1.7.5 - #31 Bugfix for attachments with special characters in the name
  • v1.7.4 - #27 Same as 1.7.3, but now also for Chinese senders
  • v1.7.3 - #27 When from name/address are not available (unsent emails), these fields are filled with binary garbage
  • v1.7.2 - #26 To email address is not handled properly when name is omitted
  • v1.7.1 - #25 NPE on ClientSubmitTime when original message has not been sent yet
  • v1.7.1 - #23 Bug: _nameid directory should not be parsed (and causing invalid HTML body)
  • v1.7.0 - #18 Upgrade Apache POI 3.9 -> 4.x

Note: Apache POI requires minimum Java 8

v1.6.0 (8-January-2020)

  • #21 Multiple TO recipients are not handled properly

v1.5.0 (18-December-2019)

  • #20 CC and BCC recipients are not parsed properly
  • #19 Use real Outlook ContentId Attribute to resolve CID Attachments

v1.4.1 (22-October-2019)

  • #17 Fixed encoding error for UTF-8's Windows legacy name (cp)65001

v1.4.0 (13-October-2019)

  • #9 Replaced the RTF to HTML converter with a brand new RFC-compliant converter! (thanks to @fadeyev!)

v1.3.0 (4-October-2019)

  • #14 Dependency problem with Java9+, missing Jakarta Activation Framework
  • #13 HTML start tags with extra space not handled correctly
  • #11 SimpleRTF2HTMLConverter inserts too many <br/> tags
  • #10 Embedded images with DOS-like names are classified as attachments
  • #9 SimpleRTF2HTMLConverter removes some valid tags during conversion

v1.2.1 (12-May-2019)

  • Ignore non S/MIME related content types when extracting S/MIME metadata
  • Added toString and equals methods to the S/MIME data classes

v1.1.21 (4-May-2019)

  • Upgraded mediatype recognition based on file extension for incomplete attachments
  • Added / improved support for public S/MIME meta data

v1.1.20 (14-April-2019)

  • #7 Fix missing S/MIME header details that are needed to determine the type of S/MIME application

v1.1.19 (10-April-2019)

  • Log rtf compression error, but otherwise ignore it and keep going and extract what we can.

v1.1.18 (5-April-2019)

  • #6 Missing mimeTag for attachments should be guessed based on file extension

v1.1.17 (19-August-2018)

  • #3 implemented robust support for character sets / code pages in RTF to HTML conversion (fixes Chinese support #3)
  • fixed bug where too much text was cleaned up as part of superfluous RTF cleanup step when converting to HTML
  • Performance boost in the RTF -> HTML converter

v1.1.16 (~28-February-2017)

v1.16

  • Added support for replyTo name and address
  • cleaned up code (1st wave)

outlook-message-parser's People

Contributors

basinilya, bbottema, dependabot[bot], dwlabcube, kacperfkorban, norrisjeremy, sanastasiadis


outlook-message-parser's Issues

getFromEmailFromHeaders cannot handle "quoted-name-with@at-sign"

Even though the sender name and email address are stored in the fields 0xc1f and 0x42 respectively, the parser extracts this info from the transport headers (0x7d) and overwrites the stored values with it.

If such a header contains a quoted name with an @ sign:

it wrongly calls setFromEmail("\"[email protected]\"", true)

When trying to use this e-mail with javax.mail it fails:

Caused by: javax.mail.internet.AddressException: Missing final '@domain' in string ``"[email protected]"''
	at javax.mail.internet.InternetAddress.checkAddress(InternetAddress.java:1279)

Please consider reusing or copying new javax.mail.internet.InternetHeaders(InputStream) for splitting the headers and new javax.mail.internet.InternetAddress(String) for parsing the address string.
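
To illustrate the kind of parsing the report asks for, here is a minimal standard-library sketch that skips over a quoted display name before looking for the address. The class and method names are hypothetical and not part of outlook-message-parser, the sample header value is made up, and the pattern covers only the simple "name <address>" and bare-address forms, not the full address grammar that InternetAddress handles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not the library's implementation. */
class FromHeaderSketch {
    // Optional display name (quoted string or bare text), then an address in angle brackets.
    private static final Pattern NAME_ADDR = Pattern.compile(
            "^\\s*(?:\"(?:[^\"\\\\]|\\\\.)*\"|[^<\"]*)\\s*<([^<>\\s]+@[^<>\\s]+)>\\s*$");
    // A header value that consists of nothing but an address.
    private static final Pattern BARE_ADDR = Pattern.compile(
            "^\\s*([^<>\"\\s]+@[^<>\"\\s]+)\\s*$");

    /** Returns the address part of a From header value, or null if none is recognised. */
    static String extractAddress(String fromHeaderValue) {
        Matcher m = NAME_ADDR.matcher(fromHeaderValue);
        if (m.matches()) {
            return m.group(1);
        }
        m = BARE_ADDR.matcher(fromHeaderValue);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The quoted display name contains an @ sign but is not mistaken for the address.
        System.out.println(extractAddress("\"Doe, John @ Sales\" <john.doe@example.com>"));
    }
}
```

Because the quoted string is consumed as a unit, an @ sign inside the display name never reaches the address group. A value that contains only a quoted string yields null rather than a malformed address.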

Can the outlook-message-parser library handle the UTF-8 characters

I'm using the org.simplejavamail:outlook-message-parser library:

        OutlookMessageParser outlookMessageParser = new OutlookMessageParser();
        OutlookMessage outlookMessage = outlookMessageParser.parseMsg("myMessage.msg");
        System.out.println(outlookMessage.getBodyText());

and output is:

Testiviestin sis?lt?

The original Outlook message contains this body: "Testiviestin sisältö".

Can the org.simplejavamail:outlook-message-parser library handle the UTF-8 characters?
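
For context, the sketch below shows one way output like this can arise in plain Java: encoding a string to a charset that cannot represent ä and ö replaces them with '?'. It does not establish where the loss happens in this report (it could equally be the console encoding), and the class name is made up for the demonstration.

```java
import java.nio.charset.StandardCharsets;

/** Demonstration only: shows how unmappable characters turn into '?'. */
class LossyEncodingDemo {
    /** Encodes to US-ASCII and decodes again; unmappable characters become '?'. */
    static String roundTripThroughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    /** Encodes to UTF-8 and decodes again; nothing is lost. */
    static String roundTripThroughUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Testiviestin sis\u00e4lt\u00f6"; // "Testiviestin sisältö"
        System.out.println(roundTripThroughAscii(original));
        System.out.println(roundTripThroughUtf8(original));
    }
}
```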

Clarify permission to publish project using Apache v2 license

The project this library originally continued was published under GPL3, which is not forward compatible with Apache v2 unless you have explicit permission to relicense.

This library has since evolved into what is basically a complete rewrite, so it remains to be seen to what extent permission is still required, but this is a bit of a grey area. It is better to be safe and clear about it for 3rd party users.

Error in compiling

Sorry, I am new to Java. How do I compile this? I am getting an error:

package javax.activation does not exist

NumberFormatException on parsing .msg files

I have this problem with every email I parse with the normal Simple Java Mail Outlook module.
I have pasted only a few of the errors, because otherwise this comment would be far too long.


java.lang.NumberFormatException: For input string: "101f-00000004" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]


2020-10-27 15:19:41.362  INFO 17236 --- [    Test worker] o.s.o.OutlookMessageParser               : Could not parse directory entry __substg1.0_8011101F-00000006

java.lang.NumberFormatException: For input string: "101f-00000006" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]


2020-10-27 15:19:41.386  INFO 17236 --- [    Test worker] o.s.o.OutlookMessageParser               : Could not parse directory entry __substg1.0_8011101F-00000007

java.lang.NumberFormatException: For input string: "101f-00000007" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]


2020-10-27 15:19:41.399  INFO 17236 --- [    Test worker] o.s.o.OutlookMessageParser               : Could not parse directory entry __substg1.0_8011101F-00000005

java.lang.NumberFormatException: For input string: "101f-00000005" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]

2020-10-27 15:19:41.432  INFO 17236 --- [    Test worker] o.s.o.OutlookMessageParser               : Could not parse directory entry __substg1.0_8011101F-00000002

java.lang.NumberFormatException: For input string: "101f-00000002" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]


2020-10-27 15:19:41.446  INFO 17236 --- [    Test worker] o.s.o.OutlookMessageParser               : Could not parse directory entry __substg1.0_8011101F-00000003

java.lang.NumberFormatException: For input string: "101f-00000003" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]


2020-10-27 15:19:41.460  INFO 17236 --- [    Test worker] o.s.o.OutlookMessageParser               : Could not parse directory entry __substg1.0_8011101F-00000001

java.lang.NumberFormatException: For input string: "101f-00000001" under radix 16
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:658) ~[na:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:581) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:399) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:262) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:208) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133) ~[outlook-message-parser-1.7.5.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.parseOutlookMsg(OutlookEmailConverter.java:119) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:62) ~[outlook-module-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209) ~[simple-java-mail-6.4.4.jar:na]
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201) ~[simple-java-mail-6.4.4.jar:na]

I attached an example email to the comment.

test.msg

Originally posted by @Jonbeckas in #28 (comment)
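
The failing input strings in the traces ("101f-00000006" and so on) show that the type part of the entry name was passed to Integer.parseInt together with the trailing "-0000000N" suffix. As a rough sketch of how such names could be split before any number parsing, the hypothetical helper below separates the property id, the property type, and the optional index. It is an illustration under the assumption that entry names follow the pattern seen in the log lines above, not the fix that was applied in the library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not the library's implementation. */
class SubstgNameSketch {
    // "__substg1.0_" + 4 hex digits (property id) + 4 hex digits (property type),
    // optionally followed by "-" + 8 hex digits (index of one value of a multi-valued property).
    private static final Pattern NAME = Pattern.compile(
            "__substg1\\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})(?:-([0-9A-Fa-f]{8}))?");

    /**
     * Returns {propertyId, propertyType, index}, where index is -1 when the name
     * has no suffix, or null when the name does not match the expected pattern.
     */
    static long[] parse(String entryName) {
        Matcher m = NAME.matcher(entryName);
        if (!m.matches()) {
            return null;
        }
        long id = Long.parseLong(m.group(1), 16);
        long type = Long.parseLong(m.group(2), 16);
        long index = m.group(3) == null ? -1 : Long.parseLong(m.group(3), 16);
        return new long[] {id, type, index};
    }

    public static void main(String[] args) {
        long[] parts = parse("__substg1.0_8011101F-00000006");
        System.out.printf("id=0x%04x type=0x%04x index=%d%n", parts[0], parts[1], parts[2]);
    }
}
```

Splitting with a pattern first means the hex parser only ever sees digits, so an unexpected suffix results in a null return that the caller can log and skip instead of an exception.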

Encoding issues with bodyHTML

I have an issue that results in a problem similar to #34. When using the following code, the returned string has a messed-up encoding.

try (FileInputStream fileInputStream = new FileInputStream(msgFileName)) {
	OutlookMessageParser outlookMessageParser = new OutlookMessageParser();
	OutlookMessage outlookMessage = outlookMessageParser.parseMsg(msgFileName);
			
	System.out.println(outlookMessage.getBodyHTML());
}

This is an extract of what is returned:

<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">ich habe die AB geändert und Ihnen zugeschickt.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">Im Preis ist die Preiserhöhung ab dem 16.08.2021 enthalten.

From what I've gathered so far, this happens in this part of your code:

case 0x1e:
// we put the complete data into a byte[] object...
final byte[] textBytes1e = getBytesFromDocumentEntry(de);
// ...and create a String object from it
return new String(textBytes1e, "ISO-8859-1");

Modifying the code similar to what was suggested in #34 fixes the problem (at least for us) and from what we've seen with our test mails doesn't break any of them.

case 0x1e:
	// we put the complete data into a byte[] object...
	final byte[] textBytes1e = getBytesFromDocumentEntry(de);
	// ...and create a String object from it
	
	String convertedString = new String(textBytes1e, "ISO-8859-1");
	Pattern pattern = Pattern.compile("charset=(\"|)([\\w\\-]+)\\1", Pattern.CASE_INSENSITIVE);
	Matcher m = pattern.matcher(convertedString);
	if(m.find()) {
		try {
			convertedString = new String(textBytes1e, Charset.forName(m.group(2)));
		} catch (Exception e) {
			//ignore and use default charset
		}
	}
	return convertedString;

I'm currently trying to get example mails, I have one so far but I can not publish it here, so I'd have to send it to you directly and with the condition that it can't be published anywhere, including test cases. If you want I can send you this one.
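
For anyone who wants to try the idea outside the parser, here is a self-contained version of the modification above as a standalone method. The class and method names are made up for this sketch; the regular expression is the one from the snippet, and the fallback to ISO-8859-1 mirrors the original behaviour.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone sketch of the charset-sniffing idea; not the library's implementation. */
class DeclaredCharsetSketch {
    private static final Pattern CHARSET =
            Pattern.compile("charset=(\"|)([\\w\\-]+)\\1", Pattern.CASE_INSENSITIVE);

    /** Decodes raw bytes using the charset declared in the content, else ISO-8859-1. */
    static String decode(byte[] raw) {
        // ISO-8859-1 maps every byte to a character, so the ASCII declaration is always readable.
        String latin1 = new String(raw, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(latin1);
        if (m.find()) {
            try {
                return new String(raw, Charset.forName(m.group(2)));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default below.
            }
        }
        return latin1;
    }

    public static void main(String[] args) {
        byte[] utf8Html = "<meta charset=\"utf-8\"><p>ge\u00e4ndert</p>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8Html));
    }
}
```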

NoClassDefFoundError when trying to parse .msg file with pdf attachment

I tried parsing a .msg file with a pdf attachment, and when I ran the code it threw a ClassNotFoundException.

My build.gradle looks like the following:

plugins {
	id 'java'
	id 'org.springframework.boot' version '2.6.6'
	id 'io.spring.dependency-management' version '1.1.0'
}

group = 'com.example'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = '1.11'

repositories {
	mavenCentral()
}

dependencies {
	implementation 'org.springframework.boot:spring-boot-starter'
	implementation 'org.simplejavamail:outlook-message-parser:1.9.6'

	testImplementation 'org.springframework.boot:spring-boot-starter-test'
}

tasks.named('test') {
	useJUnitPlatform()
}

The code I was using:

InputStream is = getClass().getClassLoader().getResourceAsStream("msg/attachment_pdf.msg");
OutlookMessageParser msgp = new OutlookMessageParser();
OutlookMessage msg = msgp.parseMsg(is);

The error that showed up:

java.lang.NoClassDefFoundError: jakarta/activation/MimetypesFileTypeMap
	at org.simplejavamail.outlookmessageparser.model.MimeType.createMap(MimeType.java:31) ~[outlook-message-parser-1.9.6.jar:na]
	at org.simplejavamail.outlookmessageparser.model.MimeType.<clinit>(MimeType.java:24) ~[outlook-message-parser-1.9.6.jar:na]
	at org.simplejavamail.outlookmessageparser.model.OutlookFileAttachment.checkMimeTag(OutlookFileAttachment.java:100) ~[outlook-message-parser-1.9.6.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseAttachment(OutlookMessageParser.java:676) ~[outlook-message-parser-1.9.6.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:224) ~[outlook-message-parser-1.9.6.jar:na]
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:138) ~[outlook-message-parser-1.9.6.jar:na]
	at com.example.demo.DemoApplication.doSomethingAfterStartup(DemoApplication.java:23) ~[main/:na]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
	at org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:344) ~[spring-context-5.3.18.jar:5.3.18]
	... 17 common frames omitted
Caused by: java.lang.ClassNotFoundException: jakarta.activation.MimetypesFileTypeMap
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) ~[na:na]
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[na:na]
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[na:na]
	... 29 common frames omitted

Attached an email with an attached pdf file.
attachment_pdf.zip
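
A quick way to confirm whether the class named in the trace is visible at runtime is a classpath probe like the one below. This is a diagnostic sketch with a made-up class name; it only reports whether a class can be located and says nothing about which dependency should provide it.

```java
/** Diagnostic sketch: reports whether a class can be located on the classpath. */
class ClasspathCheckSketch {
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "jakarta.activation.MimetypesFileTypeMap";
        System.out.println(name + " present: " + isClassPresent(name));
    }
}
```

Running this inside the same application (for example from the startup hook that triggers the parse) shows whether the runtime classpath differs from what the build file suggests.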

IllegalArgumentException with Quoted-Printable EML File

We have an EML file. When we run the following code:

try (FileInputStream fileInputStream = new FileInputStream(emlFileName)) {	
	Email email = EmailConverter.emlToEmail(fileInputStream);	
	System.out.println(email.getHTMLText());
}

we get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: unknown content transfer encoder: QUOTED-PRINTABLE
	at org.simplejavamail.api.email.ContentTransferEncoding.lambda$byEncoder$1(ContentTransferEncoding.java:52)
	at java.base/java.util.Optional.orElseThrow(Optional.java:408)
	at org.simplejavamail.api.email.ContentTransferEncoding.byEncoder(ContentTransferEncoding.java:52)
	at org.simplejavamail.converter.EmailConverter.buildEmailFromMimeMessage(EmailConverter.java:667)
	at org.simplejavamail.converter.EmailConverter.mimeMessageToEmailBuilder(EmailConverter.java:136)
	at org.simplejavamail.converter.EmailConverter.mimeMessageToEmailBuilder(EmailConverter.java:122)
	at org.simplejavamail.converter.EmailConverter.emlToEmailBuilder(EmailConverter.java:390)
	at org.simplejavamail.converter.EmailConverter.emlToEmailBuilder(EmailConverter.java:369)
	at org.simplejavamail.converter.EmailConverter.emlToEmail(EmailConverter.java:303)
	at org.simplejavamail.converter.EmailConverter.emlToEmail(EmailConverter.java:295)
	at ParseP7s.main(ParseP7s.java:22)

I've tested it with the following versions:

  • 7.1.1 --> Issue is not present, Email is parsed as expected
  • 7.4.0 --> Issue is present
  • 7.5.0 --> Issue is present

I can not publish the example mail we're using, but if you need it I can send it directly to you.
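
The message "unknown content transfer encoder: QUOTED-PRINTABLE" suggests, though the trace alone does not prove it, that the lookup compares the header value case-sensitively. RFC 2045 defines Content-Transfer-Encoding values as case-insensitive, so a tolerant lookup could look like the sketch below. The enum and its constants are hypothetical and do not mirror the actual ContentTransferEncoding class.

```java
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical enum for illustration only; not Simple Java Mail's ContentTransferEncoding. */
enum TransferEncodingSketch {
    SEVEN_BIT("7bit"),
    EIGHT_BIT("8bit"),
    BINARY("binary"),
    BASE_64("base64"),
    QUOTED_PRINTABLE("quoted-printable");

    private final String encoder;

    TransferEncodingSketch(String encoder) {
        this.encoder = encoder;
    }

    /** Looks up an encoding by header value, ignoring case and surrounding whitespace. */
    static Optional<TransferEncodingSketch> byEncoder(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        String trimmed = headerValue.trim();
        return Arrays.stream(values())
                .filter(e -> e.encoder.equalsIgnoreCase(trimmed))
                .findFirst();
    }
}
```

Returning an Optional leaves it to the caller to decide whether an unknown value should fail the whole conversion or fall back to a default.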

Error parsing attachments with regex reserved chars in the filename

return compile("cid:['\"]?" + cidName + "['\"]?").matcher(html).find();

Example filename: [01-06-2020 09.00.14]1 - Patch Compliance.PDF

Error:
java.util.regex.PatternSyntaxException: Illegal character range near index 13
cid:['"]?[01-06-2020 09.00.14]1 - Patch Compliance.PDF['"]?
^] with root cause
java.util.regex.PatternSyntaxException: Illegal character range near index 13
cid:['"]?[01-06-2020 09.00.14]1 - Patch Compliance.PDF['"]?
^
at java.util.regex.Pattern.error(Pattern.java:1957)
at java.util.regex.Pattern.range(Pattern.java:2657)
at java.util.regex.Pattern.clazz(Pattern.java:2564)
at java.util.regex.Pattern.sequence(Pattern.java:2065)
at java.util.regex.Pattern.expr(Pattern.java:1998)
at java.util.regex.Pattern.compile(Pattern.java:1698)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at org.simplejavamail.outlookmessageparser.model.OutlookMessage.htmlContainsCID(OutlookMessage.java:343)
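
One common way to avoid this class of error is to pass the filename through Pattern.quote before concatenating it into the expression, so that characters such as [ and ] are matched literally. The sketch below is a standalone illustration with a made-up class name; it is not necessarily the change that was made in the library.

```java
import java.util.regex.Pattern;

/** Standalone sketch; the method mirrors the expression quoted in the report. */
class CidQuoteSketch {
    /** Returns true if the HTML references the given name as a cid: source. */
    static boolean htmlContainsCid(String html, String cidName) {
        // Pattern.quote wraps the name so that regex metacharacters lose their meaning.
        return Pattern.compile("cid:['\"]?" + Pattern.quote(cidName) + "['\"]?")
                .matcher(html)
                .find();
    }

    public static void main(String[] args) {
        String name = "[01-06-2020 09.00.14]1 - Patch Compliance.PDF";
        String html = "<img src=\"cid:" + name + "\">";
        System.out.println(htmlContainsCid(html, name));
    }
}
```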

Parse error when trying to fetch true attachments

Parsing a .msg file with an attachment named Attachment[1.pdf and then calling fetchTrueAttachments() throws an exception, because cid:['\"]Attachment[1.pdf['\"] is not a valid regex (the square bracket isn't closed).

I created a PR with a fix for it #30.

Re: MimeType NoClassDefFoundError

Hi there,

I am facing an issue with the error below. The email looks normal and has just a pdf attachment.

Is there any advice on how I should provide the error log?

java.lang.NoClassDefFoundError: Could not initialize class org.simplejavamail.outlookmessageparser.model.MimeType
at org.simplejavamail.outlookmessageparser.model.OutlookFileAttachment.checkMimeTag(OutlookFileAttachment.java:100) ~[outlook-message-parser-1.9.6.jar:na]
at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseAttachment(OutlookMessageParser.java:676) ~[outlook-message-parser-1.9.6.jar:na]
at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:224) ~[outlook-message-parser-1.9.6.jar:na]
at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:138) ~[outlook-message-parser-1.9.6.jar:na]

nested .msg properties are lost

When I parse files with nested (embedded) Outlook messages, properties such as Content-ID and Content-Type are discarded.
Please store the current attachment object as a public property of OutlookMsgAttachment when creating it inside OutlookMessageParser.parseAttachment().

Embedded images with DOS-like names are classified as attachments

During parsing of one particular email one of the embedded images was qualified as an attachment, not as an embedded image (though actually it was an embedded image).

I am not sure how this email was created, but debugging showed that the OutlookFileAttachment longFilename property was something like my-embedded-image.png while the short filename was my-emb~1.png, and in the HTML body the image was referenced by its full name: src="cid:my-embedded-image.png". We could probably change the OutlookMessage#htmlContainsCID() method to check both filename and longFilename.

Probably the same issue was reported earlier in #1
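The suggestion above could look roughly like the following sketch. It is a stand-alone illustration with hypothetical names, not the actual htmlContainsCID() implementation, and it uses a plain substring check instead of whatever matching the library performs.

```java
class EmbeddedImageNameSketch {
    // True when the HTML body references the attachment by either of its names.
    static boolean htmlReferencesAttachment(String html, String filename, String longFilename) {
        return containsCid(html, filename) || containsCid(html, longFilename);
    }

    // Null-safe check for a "cid:<name>" reference in the HTML.
    private static boolean containsCid(String html, String name) {
        return html != null && name != null && html.contains("cid:" + name);
    }

    public static void main(String[] args) {
        String html = "<img src=\"cid:my-embedded-image.png\">";
        System.out.println(htmlReferencesAttachment(html, "my-emb~1.png", "my-embedded-image.png"));
    }
}
```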

SimpleRTF2HTMLConverter removes some valid tags during conversion

The following line in Outlook msg RTF file:
{\*\htmltag241 -->}
should turn into --> during conversion to HTML.
However, it is replaced with an empty string by line 139 of SimpleRTF2HTMLConverter:
replacedText = replacedText.replaceAll("\\{\\\\\\*\\\\htmltag\\d+[^\\}<]+\\}", "");
Thus the resulting HTML is broken and has an invalid structure.
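The effect can be reproduced in isolation. The sketch below applies the quoted expression to the example line, and then shows one possible alternative that keeps the text inside the group instead of dropping it. The second expression is an assumption for illustration only; it would also keep content the original line may have removed on purpose.

```java
class HtmlTagGroupSketch {
    // The expression quoted in the report: deletes the whole htmltag group.
    static final String REMOVING_REGEX = "\\{\\\\\\*\\\\htmltag\\d+[^\\}<]+\\}";
    // Hypothetical alternative: captures the text after the control word.
    static final String UNWRAPPING_REGEX = "\\{\\\\\\*\\\\htmltag\\d+ ?([^\\}]*)\\}";

    static String removeGroups(String rtf) {
        return rtf.replaceAll(REMOVING_REGEX, "");
    }

    static String unwrapGroups(String rtf) {
        return rtf.replaceAll(UNWRAPPING_REGEX, "$1");
    }

    public static void main(String[] args) {
        String line = "{\\*\\htmltag241 -->}";
        System.out.println("[" + removeGroups(line) + "]");
        System.out.println("[" + unwrapGroups(line) + "]");
    }
}
```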

Embedded image name not resolved correctly, when it's not equal to filename

I had this issue on Outlook 2013. Steps to reproduce:

  1. Create a new email
  2. Take a screenshot to the clipboard
  3. Paste it in the email
  4. Send the email
  5. Save the msg file and parse it

In the HTML body the image will be referenced as cid:[email protected], while both filename and longFilename of OutlookFileAttachment will be image001.gif. It seems some information is lost here. When trying to send the parsed email again, embedded images are not resolved correctly.

To email address is not handled properly when name is omitted

The use case where the name is actually the email address is currently not handled properly, resulting in a name with the value of the actual email address and a null email address.

The only way to handle this without complete email address validation is by assuming it is both the name and address if it might be an email address.

Originally reported by @Faelean in #25 (comment)
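A minimal sketch of that heuristic, assuming a deliberately loose "might be an email address" test (exactly one @ with text on both sides and no spaces). The class, the method names and the rule itself are hypothetical and not taken from the library.

```java
class RecipientFallbackSketch {
    // Returns {name, address}. When no address is present and the name looks
    // like an email address, the name is used for both fields.
    static String[] resolve(String name, String address) {
        boolean addressMissing = address == null || address.isEmpty();
        if (addressMissing && name != null && looksLikeEmail(name)) {
            return new String[] { name, name };
        }
        return new String[] { name, address };
    }

    // Loose plausibility check, not a full address validation.
    static boolean looksLikeEmail(String value) {
        int at = value.indexOf('@');
        return at > 0
                && at == value.lastIndexOf('@')
                && at < value.length() - 1
                && value.indexOf(' ') < 0;
    }

    public static void main(String[] args) {
        String[] resolved = resolve("jane@example.com", null);
        System.out.println(resolved[0] + " / " + resolved[1]);
    }
}
```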

SimpleRTF2HTMLConverter inserts too many <br/> tags

I am not sure what the purpose of line 118 in SimpleRTF2HTMLConverter#fetchHtmlSection() is:
html = html.replaceAll("[\\n\\r]+", " <br/> ");
However, this results in a whole lot of extra <br/> tags, and an email sent with such HTML looks awful, with lots of extra lines. When I replaced <br/> back with a newline \n and sent the email, it looked just like the original.
I tried this on about 10 different emails of various complexity, and replacing newlines with <br/> broke all of them completely, while removing this line made them look just like the originals.
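A small stand-alone demonstration of why this happens: newlines that only separate tags in the HTML source are turned into visible line breaks. The sample markup and the helper names are made up for illustration.

```java
class NewlineToBreakSketch {
    // Same replacement as the line quoted above.
    static String withBreaks(String html) {
        return html.replaceAll("[\\n\\r]+", " <br/> ");
    }

    // Counts how many <br/> tags a string contains.
    static int countBreaks(String html) {
        int count = 0;
        int index = 0;
        while ((index = html.indexOf("<br/>", index)) >= 0) {
            count++;
            index += "<br/>".length();
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<table>\r\n<tr>\r\n<td>cell</td>\r\n</tr>\r\n</table>";
        System.out.println(countBreaks(withBreaks(html)));
    }
}
```

Every source line break in the sample becomes one <br/>, even though none of them was meant to be rendered.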

Handling signed messages (smime.p7m attachments)

Hi Benny,
I wanted to ask you if this library manages the signed Outlook message files.
Using the msgparser library I see that this message has an IPM.Note.SMIME class and that there is only one attachment called smime.p7m
This message has no body, but only an attachment
Is this library able to interpret the body and the attachments contained in the attachment smime?
Outlook Signed Message.zip
Thanks 4 Y time
Alex

NPE on ClientSubmitTime when original message has not been sent yet

Copied from bbottema/simple-java-mail#243

While investigating bbottema/outlook-message-parser#23 we found an issue with the OutlookMessage.getClientSubmitTime() method.

java.lang.NullPointerException
	at org.simplejavamail.outlookmessageparser.model.OutlookMessage.getClientSubmitTime(OutlookMessage.java:849)
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.buildEmailFromOutlookMessage(OutlookEmailConverter.java:73)
	at org.simplejavamail.internal.outlooksupport.converter.OutlookEmailConverter.outlookMsgToEmailBuilder(OutlookEmailConverter.java:60)
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmailBuilder(EmailConverter.java:226)
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:209)
	at org.simplejavamail.converter.EmailConverter.outlookMsgToEmail(EmailConverter.java:201)

The issue is present with the following test emails from the outlook-message-parser.

  • tst_unicode.msg
  • unsent draft.msg
  • chinese message.msg
  • forward with attachments and embedded images.msg
  • the nested mail from 'nested simple mail.msg'

java.lang.NumberFormatException: For input string: "101f-00000001"

I am having trouble with some stored .msg-files I want to load/parse.

final OutlookMessageParser parser = new OutlookMessageParser();

try{
    parser.parseMsg("somefile.msg");
} catch(Throwable ex){
    // NO-OP please ignore, this is example code, you know
}

When trying to load the file I get this stacktrace:

23:28:23.790 [main] INFO org.simplejavamail.outlookmessageparser.OutlookMessageParser - Could not parse directory entry __substg1.0_8009101F-00000001
java.lang.NumberFormatException: For input string: "101f-00000001"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.analyzeDocumentEntry(OutlookMessageParser.java:577)
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.getMessagePropertyFromDocumentEntry(OutlookMessageParser.java:395)
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryDocumentEntry(OutlookMessageParser.java:258)
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.checkDirectoryEntry(OutlookMessageParser.java:204)
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:129)
	at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:98)

Using some example-files (https://github.com/bbottema/outlook-message-parser/tree/master/src/test/resources/test-messages) provided by this project it works. I will try to get some example file ready, but as this is some kind of paid-work stuff, this will take some time to provide a proper .msg-file for debugging.
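The failing input suggests that the entry name carries a suffix after the eight hex digits of the property tag. Below is a minimal sketch, assuming the -00000001 part is an index of a multi-valued property that can be split off before the hex parsing. The class and method names are hypothetical; this is not the library's analyzeDocumentEntry().

```java
class SubstorageNameSketch {
    static final String PREFIX = "__substg1.0_";

    // Returns {propertyId, typeCode} parsed from an entry name such as
    // "__substg1.0_8009101F-00000001". Anything after '-' is ignored here.
    static int[] parse(String entryName) {
        String rest = entryName.substring(PREFIX.length());
        int dash = rest.indexOf('-');
        String tag = dash >= 0 ? rest.substring(0, dash) : rest;
        int propertyId = Integer.parseInt(tag.substring(0, 4), 16);
        int typeCode = Integer.parseInt(tag.substring(4), 16);
        return new int[] { propertyId, typeCode };
    }

    public static void main(String[] args) {
        int[] parsed = parse("__substg1.0_8009101F-00000001");
        System.out.println(Integer.toHexString(parsed[0]) + " " + Integer.toHexString(parsed[1]));
        try {
            // What the stack trace shows: the suffix is passed to parseInt as well.
            Integer.parseInt("101f-00000001", 16);
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```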

Wrong encoding for bodyHTML

If an email contains a bodyHTML (MAPI 0x1013) that is encoded in, for example, UTF-8, the parser ignores the encoding and uses CP1252, causing characters like ü to be displayed as Ã¼.

private String convertValueToString(final Object value) {
    if (value == null) {
        return null;
    }
    if (value instanceof String) {
        return (String) value;
    } else if (value instanceof byte[]) {
        return new String((byte[]) value, CharsetHelper.WINDOWS_CHARSET);
    } else {
        LOGGER.trace("Unexpected body class: {} (expected String or byte[])", value.getClass().getName());
        return value.toString();
    }
}

The problem is that the correct charset is not known when calling the String constructor. There might be a more efficient way to do this, but this is what we've come up with to replace line 259:

String convertedString = new String((byte[]) value, CharsetHelper.WINDOWS_CHARSET);
Pattern pattern = Pattern.compile("charset=(\"|)([\\w\\-]+)\\1", Pattern.CASE_INSENSITIVE);
Matcher m = pattern.matcher(convertedString);
if(m.find()) {
	try {
		convertedString = new String((byte[]) value, Charset.forName(m.group(2)));
	} catch (Exception e) {
		//ignore and use default charset
	}
}
return convertedString;

First step: convert everything as before.
Second step: check the resulting String for a charset. The regex matches the following two patterns and extracts the charset:

<meta charset="utf-8" /> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If there is a charset in the resulting String, decode the bytes again using that charset; otherwise use the String that was already created. The try/catch block guards the Charset.forName call in case someone messed up the charset in the bodyHTML.
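The two steps can be tried outside the parser with the following self-contained sketch. It reuses the regex from the proposal; the class name is hypothetical, and windows-1252 stands in for CharsetHelper.WINDOWS_CHARSET, which is an assumption about that constant.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlCharsetSketch {
    // Assumed stand-in for the parser's default Windows charset.
    private static final Charset FALLBACK = Charset.forName("windows-1252");
    private static final Pattern CHARSET =
            Pattern.compile("charset=(\"|)([\\w\\-]+)\\1", Pattern.CASE_INSENSITIVE);

    static String decode(byte[] body) {
        // Step 1: decode as before, which is enough to read the ASCII meta tag.
        String decoded = new String(body, FALLBACK);
        // Step 2: if the HTML declares a charset, decode the bytes again with it.
        Matcher m = CHARSET.matcher(decoded);
        if (m.find()) {
            try {
                return new String(body, Charset.forName(m.group(2)));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: keep the fallback result.
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        byte[] body = "<meta charset=\"utf-8\" /><p>\u00fc</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body));
    }
}
```

Catching IllegalArgumentException covers both exceptions that Charset.forName can throw for a bad name, so nothing else is swallowed.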

Encoding - Characters converted to UTF8-hex

Version 1.4.0
Original text in .msg body:

Char-å-Char

Char-Å-Char

Char-ø-Char

Char-Ø-Char

Char-æ-Char

Char-Æ-Char

After calling parseMsg on the file and looking into BodyHTML and ConvertedBodyHTML of OutlookMessage, both values are null. BodyRtf now has these values, but the characters are changed to UTF-8 hex escapes, and the body in the returned string contains the following:

Char-'c3'a5-Char

Char-'c3'85-Char

Char-'c3'b8-Char

Char-'c3'98-Char

Char-'c3'a6-Char

Char-'c3'86-Char

What is not displayed above is that before ' there is also a backslash \
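The escapes shown above are the UTF-8 bytes of the original characters written as RTF \'xx hex escapes. As a rough illustration (not the converter's code, and ignoring every other RTF control word), consecutive escapes can be collected into bytes and decoded together. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RtfHexEscapeSketch {
    // Replaces runs of \'xx escapes with the characters they encode in the
    // given charset. All other text is copied through unchanged.
    static String decode(String rtfText, Charset charset) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < rtfText.length()) {
            boolean isEscape = i + 3 < rtfText.length()
                    && rtfText.charAt(i) == '\\'
                    && rtfText.charAt(i + 1) == '\''
                    && Character.digit(rtfText.charAt(i + 2), 16) >= 0
                    && Character.digit(rtfText.charAt(i + 3), 16) >= 0;
            if (isEscape) {
                pending.write(Integer.parseInt(rtfText.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                flush(pending, out, charset);
                out.append(rtfText.charAt(i));
                i++;
            }
        }
        flush(pending, out, charset);
        return out.toString();
    }

    // Decodes the collected bytes as one unit so multi-byte sequences stay intact.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset charset) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), charset));
            pending.reset();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Char-\\'c3\\'98-Char", StandardCharsets.UTF_8));
    }
}
```

Decoding the bytes as a group matters here: c3 and 98 only form Ø when they are interpreted together as UTF-8.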

If I try to convert this RTF extracted from the .msg using the recently forked library "rtf-to-html" (with RTF2HTMLConverterRFCCompliant or RTF2HTMLConverterClassic), then I get the following exception:

Exception in thread "main" java.nio.charset.UnsupportedCharsetException: 65001
at org.bbottema.rtftohtml.impl.util.CharsetHelper.findCharset(CharsetHelper.java:19)
at org.bbottema.rtftohtml.impl.RTF2HTMLConverterRFCCompliant.rtf2html(RTF2HTMLConverterRFCCompliant.java:112)
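65001 is the Windows code page number for UTF-8, which has no charset of that name in Java. A minimal sketch of a lookup that special-cases it is shown below; the class, the candidate names tried and the windows-1252 fallback are assumptions for illustration, not the behavior of CharsetHelper.findCharset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePageSketch {
    // Maps a Windows code page number to a Java Charset.
    static Charset forCodePage(int codePage) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8; // code page 65001 is UTF-8
        }
        String[] candidates = { "windows-" + codePage, "cp" + codePage };
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        // Assumed default when nothing matches.
        return Charset.forName("windows-1252");
    }

    public static void main(String[] args) {
        System.out.println(forCodePage(65001));
        System.out.println(forCodePage(1252));
    }
}
```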

If I use RTF2HTMLConverterJEditorPane, I am able to convert the rtf to html, but the result contains some encoding issues, so to partially solve them, I first convert the string of "Cp1252" to byte array and then the byte array to "UTF-8" String. After this I get almost all the results I wanted to achieve:

Char-å-Char

Char-Å-Char

Char-ø-Char

Char-#-Char

Char-æ-Char

Char-Æ-Char

As you can see I am able to convert most of the characters to correct encoding except the Ø character.

My current solution is to go back to version 1.1.16 and retrieve the ConvertedBodyHTML, as in this version it is not null, and then convert this HTML string from "Cp1252" to a byte array and the byte array to a "UTF-8" string. This way I don't use the newly forked "rtf-to-html" and am able to get HTML from OutlookMessageParser itself.

Is there some other workaround to make the newest version of Outlook-Message-Parser work?

build error

I am getting this error with mvn compile


Downloaded: https://repo.maven.apache.org/maven2/org/ow2/asm/asm-analysis/6.0/asm-analysis-6.0.jar (31 KB at 891.0 KB/sec)
Downloaded: https://repo.maven.apache.org/maven2/org/ow2/asm/asm-util/6.0/asm-util-6.0.jar (70 KB at 1738.8 KB/sec)
Downloaded: https://repo.maven.apache.org/maven2/org/jacoco/org.jacoco.report/0.8.1/org.jacoco.report-0.8.1.jar (126 KB at 2910.6 KB/sec)
Downloaded: https://repo.maven.apache.org/maven2/xerces/xercesImpl/2.8.1/xercesImpl-2.8.1.jar (1185 KB at 2686.0 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18.615s
[INFO] Finished at: Sun May 03 14:23:31 CEST 2020
[INFO] Final Memory: 16M/38M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.jacoco:jacoco-maven-plugin:0.8.1:prepare-agent (default-prepare-agent) on project outlook-message-parser: Unable to parse configuration of mojo org.jacoco:jacoco-maven-plugin:0.8.1:prepare-agent for parameter excludes: Cannot assign configuration entry 'excludes' with value '*' of type java.lang.String to property of type java.util.List -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginConfigurationException

HTML start tags with extra space not handled correctly

SimpleRTF2HTMLConverter.HTML_START_TAGS are:
private static final String[] HTML_START_TAGS = { "<html ", "<Html ", "<HTML " };
and should be
private static final String[] HTML_START_TAGS = { "<html", "<Html", "<HTML" };

Because of this, the fetchHtmlSection() method adds extra <html> tags for some emails that have a simple opening html tag without any attributes.
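As an alternative to listing case variants, the check could be written case-insensitively and accept either whitespace or > after the tag name. This is only a sketch with hypothetical names, not the converter's code.

```java
import java.util.regex.Pattern;

class HtmlStartTagSketch {
    // Matches "<html" in any letter case when followed by whitespace or '>'.
    private static final Pattern HTML_START =
            Pattern.compile("<html(?=[\\s>])", Pattern.CASE_INSENSITIVE);

    static boolean containsHtmlStartTag(String text) {
        return HTML_START.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsHtmlStartTag("<html><body>hi</body></html>"));
        System.out.println(containsHtmlStartTag("<HTML lang=\"en\">"));
    }
}
```

The lookahead keeps a hypothetical tag such as <htmlfoo> from being treated as an opening html tag, which a plain "<html" prefix check would accept.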

Issues with certain unicode characters

I have been trying to use the outlook-message-parser and have found issues with the following Unicode characters:

U+2010 : HYPHEN
U+2013 : EN DASH
U+2019 : RIGHT SINGLE QUOTATION MARK {single comma quotation mark}
U+2022 : BULLET {black small circle}

They are not displayed properly when using getBodyHTML().

Do you know if a fix is available? I have tried with the version 1.7.13.
Thanks

Add support for parsing RTF email messages

As discussed in #15, there are Outlook msg files that have only an RTF body, which were created from RTF directly, not from HTML (you can create such an email in Outlook by selecting FORMAT TEXT tab -> Format section -> Rich Text when creating a new message). The current parser doesn't parse such emails into anything even close to readable.

To support this we need a generic RTF parser, which can parse a generic RTF file and then convert it to HTML. It should handle all RTF formatting like \pard\plain \f0\b and convert it to HTML tags (like <div>, <span>, etc.) and style attributes (like font-size, font-family, etc.).
Probably we can combine current parser and generic one written by kschroeer/rtf-html-java.

Inline/Embedded images are not being obtained as embedded images

It works for the file provided in the tests, but it does not work for the files I tried to test with:

OutlookMessage outlookMessage = new OutlookMessageParser().parseMsg(new File("example.msg")); 
System.out.println(outlookMessage.fetchCIDMap().keySet().size()); // this is zero
System.out.println(outlookMessage.fetchTrueAttachments().size()); // this is 1

I expect the CIDMap key set size to be 1, as the file has an inline image.

Sample email to reproduce the issue: https://ufile.io/1yztx (link is valid for 30 days only)

Please let me know if you are able to reproduce it or if you need any other information.

Missing attachments & embedded images in signed messages

I don't know if this might be related to #1 (issue doesn't mention signed mails and it works with non signed mails) and #4 (based on the issue and the documentation it should work).

I've sent an S/MIME signed (not encrypted) mail with one attachment and one embedded image to two different mail addresses. One was saved as an msg file from Outlook and one as an eml file from the Gmail web interface. Then I parsed them using this code block (slightly modified from the documentation) and printed the information about attachments and S/MIME details:

public static void main(String[] args) throws IOException {
		String emlFileName = ".\\files\\testSigned2.eml";
		String msgFileName = ".\\files\\testSigned2.msg";
		
		try (FileInputStream fileInputStream = new FileInputStream(emlFileName)) {
			
			Email email = EmailConverter.emlToEmail(fileInputStream);
			
			printResult(email, emlFileName);
			
		}
		
		try (FileInputStream fileInputStream = new FileInputStream(msgFileName)) {
			
			Email email = EmailConverter.outlookMsgToEmail(fileInputStream);
			
			printResult(email, msgFileName);
			
		}
	}
	
	private static void printResult(Email email, String fileName) {
		System.out.println("-----" + fileName + "-----");
		System.out.println("---Attachments---");
		for(AttachmentResource attachmentResource: email.getAttachments()) {
			System.out.println(attachmentResource.getName());
		}
		
		System.out.println("---Decrypted Attachments---");
		for(AttachmentResource attachmentResource: email.getDecryptedAttachments()) {
			System.out.println(attachmentResource.getName());
		}
		
		System.out.println("---Embedded Images---");
		for(AttachmentResource attachmentResource: email.getEmbeddedImages()) {
			System.out.println(attachmentResource.getName());
		}
		
		System.out.println("---S/MIME Details---");
		OriginalSmimeDetails details = email.getOriginalSmimeDetails();
		System.out.println("Mode: " + details.getSmimeMode()); // SIGNED
		System.out.println("Mime: " + details.getSmimeMime()); // application/pkcs7-mime or multipart/signed
		System.out.println("Type: " + details.getSmimeType()); // signed-data, enveloped-data
		System.out.println("Name: " + details.getSmimeName()); // smime.p7m or smime.p7s
		System.out.println("Micalg: " + details.getSmimeMicalg()); // ie. sha-512
		System.out.println("SignedBy: " + details.getSmimeSignedBy()); // email or name used
		
		System.out.println("");
	}
}

The results I get show that the msg file is missing the attached image, the embedded image and basically all S/MIME information:

-----.\files\testSigned2.eml-----
---Attachments---
gettyimages-1254246621-1.jpg
smime.p7s
---Decrypted Attachments---
gettyimages-1254246621-1.jpg
smime.p7s
---Embedded Images---
[email protected]
---S/MIME Details---
Mode: SIGNED
Mime: multipart/signed
Type: null
Name: null
Micalg: SHA1
SignedBy: [email protected]

-----.\files\testSigned2.msg-----
---Attachments---
smime.p7m
---Decrypted Attachments---
smime.p7m
---Embedded Images---
---S/MIME Details---
Mode: PLAIN
Mime: null
Type: null
Name: null
Micalg: null
SignedBy: null

What I've noticed is when I extract the p7m file from the msg both images are Base64 encoded inside the file together with most of the S/MIME Details (couldn't verify SignedBy, but multipart/signed and SHA1 are there). Also looking at the p7m file, the first lines are:

MIME-Version: 1.0
Content-Type: multipart/signed;
	protocol="application/x-pkcs7-signature";
	micalg=SHA1;
	boundary="----=_NextPart_000_003F_01D70933.15C7D9C0"

Seeing that protocol is present I would have expected to not run into this issue (bbottema/simple-java-mail#292) when using 6.4.5 instead of 6.5.0 but the NullPointerException is still there, so there might be some problem reading the information.

I've attached both used mails:
signedMails.zip

getBodyHTML() returns null for a specific msg file

I have a bunch of msg files and for the most part, the OutlookMessageParser is working great! However, there is one file that is not working for some reason. The getBodyHTML() is returning null. I can open the message using Outlook and it looks fine. I am hesitant to attach the msg because it contains proprietary information but I would be more than happy to email it to you or share it with you some other way.

Chinese .msg HTML converted from RTF is garbled. Missing chinese (DBCS) encoding support.

o???á?<span lang=EN-US>   <o:p></o:p></span>  </span>  </span>    </p></td>  <span style='mso-bookmark:_MailAutoSig'>  </span>  <td width=142 nowrap style='width:106.3pt;border-top:none;border-left:none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;padding:0cm 5.4pt 0cm 5.4pt;height:14.25pt'>  <p class=MsoNormal align=left style='text-align:left'>   <span style='mso-bookmark:_MailAutoSig'>  <span style='font-size:11.0pt;line-height:110%;color:black'>  è?ìì<span lang=EN-US>   <o:p></o:p></span>  </span>  </span>    </p></td>  <span style='mso-bookmark:_MailAutoSig'>  </span>  </tr></table>  <p class=MsoNormal style='line-height:normal;mso-pagination:none'>  <span style='mso-bookmark:_MailAutoSig'>  <span lang=EN-US style='mso-bidi-font-size:10.5pt;font-family:"?¢èí??oú",sans-serif;mso-no-proof:yes'>    <o:p>  </o:p></span>  </span>  </p><p class=MsoNormal style='line-height:normal;mso-pagination:none'>  <span style='mso-bookmark:_MailAutoSig'>  <span lang=EN-US style='mso-bidi-font-size:10.5pt;font-family:"?¢èí??oú",sans-serif;mso-no-proof:yes'>    <o:p>  </o:p></span>  </span>  </p><p class=MsoNormal style='line-height:normal;mso-pagination:none'>  <span style='mso-bookmark:_MailAutoSig'>  <span lang=EN-US style='mso-bidi-font-size:10.5pt;font-family:"?¢èí??oú",sans-serif;mso-no-proof:yes'>    Best Regards<o:p></o:p></span>  </span>  </p><p class=MsoNormal style='line-height:normal;mso-pagination:none'>  <span style='mso-bookmark:_MailAutoSig'>  <span lang=EN-US style='mso-bidi-font-size:10.5pt;font-family:"?¢èí??oú",sans-serif;mso-no-proof:yes'>

The Chinese .msg comes out garbled; the parser cannot handle Chinese Outlook messages.

I want to convert msg to html.

Bug: __nameid_ directory should not be parsed (and causing invalid HTML body)

As discussed in PR #22, see also nameid-fix branch:

In a MSG file the _nameid... directory represents named properties, which are as of right now not supported by this library. It tries to parse the entries in this directory using the normal property parser though, which creates conflicts and is wrong. You can also refer to the Microsoft documentation for this: [MS-OXMSG].

I had several emails where an id in this directory collided with the property id for HTML content (10130102), which caused RTF emails to incorrectly report this invalid content as the HTML body, which in turn prevented the RTF conversion from being read.
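In its simplest form, the fix amounts to skipping that directory while walking the top-level entries. The sketch below works on plain entry names so it can run without POI; the class and method names are hypothetical and it does not show the library's actual traversal code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NamedPropertySkipSketch {
    static final String NAMEID_PREFIX = "__nameid_";

    // True for the named-property mapping directory, e.g. "__nameid_version1.0".
    static boolean isNamedPropertyMapping(String directoryName) {
        return directoryName != null && directoryName.startsWith(NAMEID_PREFIX);
    }

    // Keeps only the entries that the normal property parser should look at.
    static List<String> parseableEntries(List<String> entryNames) {
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (!isNamedPropertyMapping(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "__nameid_version1.0",
                "__substg1.0_10130102",
                "__recip_version1.0_#00000000");
        System.out.println(parseableEntries(names));
    }
}
```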


Missing object OutlookMessageAssert

I downloaded the sources from the repository, but it appears that a class is missing:
org.simplejavamail.outlookmessageparser.model.OutlookMessageAssert

So the tests in HighoverEmailsTest cannot be executed.
The same class is also missing in release 1.1.17.
