mvz / email-outlook-message-perl
Email::Outlook::Message Perl module for reading Outlook .msg files
Home Page: http://www.matijs.net/software/msgconv/
NAME

    Email::Outlook::Message

DESCRIPTION

    This module reads e-mail messages stored as .msg files (such as
    generated by Outlook), and converts them to Email::MIME objects. It
    also includes a command-line interface in the form of the msgconvert
    script. You do not need Outlook installed to use this module.

VERSION

    0.921

INSTALLATION

    To install this module type the following:

        perl Build.PL
        ./Build
        ./Build test
        ./Build install

    You may have to become root for that final step.

DEPENDENCIES

    This module requires these other modules:

        Carp
        Encode
        Getopt::Long
        IO::String
        Pod::Usage
        Email::MIME - 1.923 or later
        Email::MIME::ContentType - 1.014 or later
        Email::Sender - 1.3 or later
        Email::Simple - 2.206 or later
        OLE::Storage_Lite - 0.14 or later

    For testing:

        IO::All
        Test::More

COPYRIGHT AND LICENCE

    Copyright 2002--2020 Matijs van Zuijlen.

    This program is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself.
Had this error:
Wide character in print at /usr/bin/msgconvert line 58.
I get this error on Ubuntu when trying to convert some emails:
Wide character at /usr/local/share/perl/5.30.0/Email/Outlook/Message.pm line 397.
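For context, Perl emits the "Wide character" warning when a string containing code points above 255 is written to a filehandle that has no encoding layer. The following is only a minimal sketch of that general Perl behaviour, not the fix used in msgconvert or Message.pm; the sample text and the in-memory filehandle are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumed sample text: one Latin-1 character and one code point above 255.
my $text = "caf\x{E9} \x{2603}";

# Opening the handle with an explicit encoding layer tells Perl how to turn
# code points into bytes, so no "Wide character in print" warning is raised.
# An in-memory handle is used here so the resulting bytes can be inspected.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "Cannot open buffer: $!";
print {$out} $text;
close($out);

printf "wrote %d bytes\n", length($buffer);
```

For an already open handle such as STDOUT, the same layer can be added with `binmode(STDOUT, ':encoding(UTF-8)')`.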
The attached archive contains .msg files with HTML bodies encoded in base64.
The library fails to convert them correctly to .eml messages.
The docs don't appear to say so, but I'm wondering whether it is possible with this tool (or the same libraries) to take an .eml file and convert it to a .msg file. I'm aiming to use this as part of an Apache NiFi workflow on a Linux host.
While working on packaging msgconvert for Guix, I noticed that one of the tests is failing:
# Failed test 'Checking if body structure for t/files/plain_jpeg_attached.msg is the same'
# at t/full_structure.t line 19.
# Structures begin differing at:
# $got->[4][1][0] = 'content-disposition: attachment; filename="test.jpg"'
# $expected->[4][1][0] = 'content-disposition: attachment; filename=test.jpg'
# Looks like you failed 1 test of 12.
t/full_structure.t .......
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/12 subtests
t/gpg_signed.t ........... ok
t/internals.t ............ ok
t/plain_jpeg_attached.t .. ok
t/plain_uc_unsent.t ...... ok
t/plain_uc_wc_unsent.t ... ok
t/plain_unsent.t ......... ok
t/pod_coverage.t ......... skipped: Test::Pod::Coverage required for testing pod coverage
Test Summary Report
-------------------
t/full_structure.t (Wstat: 256 Tests: 12 Failed: 1)
Failed test: 5
Non-zero exit status: 1
Files=10, Tests=118, 1 wallclock secs ( 0.04 usr 0.02 sys + 0.83 cusr 0.51 csys = 1.40 CPU)
Result: FAIL
Failed 1/10 test programs. 1/118 subtests failed.
command "./Build" "test" failed with status 255
Please let me know what additional details I can provide about my environment.
Thanks!
Jack
Hello,
I have received this error using msgconvert:
% msgconvert --mbox spam FW_chyba_VUK_pri_UZ.msg
Unknown encoding 'CP28592' at /usr/share/perl5/Email/Outlook/Message.pm line 399
Adding the following to $MAP_CODEPAGE helped:
28592 => 'ISO-8859-2',
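The idea behind that workaround can be sketched as follows. This is an illustration only: the hash `%codepage_to_encoding` and the helper `encoding_for_codepage` are hypothetical names, not the module's actual internals, and the fallback to a "CP<number>" name is an assumption based on the error message above.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical table mapping Windows code page numbers to names that
# Perl's Encode module knows. Only the 28592 entry comes from the report.
my %codepage_to_encoding = (
    28592 => 'ISO-8859-2',
);

# Hypothetical helper: return the Encode name for a code page, or undef
# when Encode does not know any encoding for it.
sub encoding_for_codepage {
    my ($codepage) = @_;
    my $name = $codepage_to_encoding{$codepage} // "CP$codepage";
    my $encoding = find_encoding($name) or return;
    return $encoding->name;
}

print encoding_for_codepage(28592), "\n";
```

Checking the name with `find_encoding` before decoding makes it possible to report an unsupported code page without dying in the middle of a conversion.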
Conversion to .eml in Windows 10 (v5.24.1, MSWin32-x64-multi-thread, Strawberry Perl inside Cygwin) adds a form of "newline" to each line (not just base64). As a result (?) attachments are not recognised as files by email client, and base64 is simply inlined into the message body.
--mbox generation is unaffected.
Under Linux conversion to .eml of same .msg succeeds as expected.
Example message body snippet where it transitions to the attachment, whitespace retained:
==========[SNIP BEGINS]===================================
<p class=MsoNormal><font size=3 face="Times New Roman"><span lang=EN-US
style='font-size:12.0pt'><o:p> </o:p></span></font></p>
</div>
</body>
</html>
--14901086630.FEb4.8904
Content-Type: application/rtf
Content-Disposition: inline
Content-Transfer-Encoding: base64
e1xydGYxXGFuc2lcYW5zaWNwZzEyNTJcZnJvbWh0bWwxIFxkZWZmMHtcZm9udHRibAoNe1xmMFxm
c3dpc3MgQXJpYWw7fQoNe1xmMVxmbW9kZXJuIENvdXJpZXIgTmV3O30KDXtcZjJcZm5pbFxmY2hh
cnNldDIgU3ltYm9sO30KDXtcZjNcZm1vZGVyblxmY2hhcnNldDAgQ291cmllciBOZXc7fQoNe1xm
NFxmc3dpc3NcZmNoYXJzZXQwIEFyaWFsO30KDXtcZjVcZnN3aXNzXGZjaGFyc2V0MCBUaW1lcyBO
ZXcgUm9tYW47fQoNe1xmNlxmbmlsXGZjaGFyc2V0MiBXaW5nZGluZ3M7fQoNe1xmN1xmbmlsXGZj
==========[SNIP ENDS]===================================
Hello!
Thanks for this nice tool.
We have a CentOS distribution, and installation via cpan -i Email::Outlook::Message works flawlessly.
However, we need to execute msgconvert from a web application which is run as apache (or www-run).
I'm not able to execute msgconvert from there, and it seems quite tricky to get this working.
Any tips or considerations on how to approach this problem?
Do you know what format Microsoft uses to store its mails in OST/PST files? HTML/EML/MESSAGE/etc.
hello,
msgconvert writes files in dos format. It would be great if it could strip CR characters on unix systems.
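Until something like that exists, the output can be post-processed. A minimal sketch, assuming the converted message is already held in a string; `to_unix_newlines` is a hypothetical helper name and not part of msgconvert.

```perl
use strict;
use warnings;

# Hypothetical helper: turn DOS line endings (CR LF) into Unix line endings
# (LF). A lone CR that is not followed by LF is left untouched.
sub to_unix_newlines {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

print to_unix_newlines("Subject: test\r\n\r\nHello\r\n");
```

Note that CR LF is the line ending used for mail on the wire, so stripping it only makes sense for files that stay on the local system.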
The GitHub Action that runs the tests should cache the installation of dependencies.
Forwarding https://bugs.debian.org/801189
Version: 0.918-1
File: /usr/bin/msgconvert
I attempted to convert a mail containing plain text and HTML variants
but msgconvert only kept the plain text variant, discarding the HTML
variant. It would be nice if it could keep both of them.
pabs@chianamo ~ $ msgconvert --verbose path/to/outlook.msg
Skipping DIR entry __nameid_version1 0 (Introductory stuff)
...
Skipping property 001F:8004 (UNKNOWN): multipart/mixed; boundary="_009_3C5F9D52E ...
...
Using property 001F:1000 (BODY_PLAIN): ...
...
macOS Version: 11.0 Beta 20A4299v
perl version: v5.28.2
cpan version: /usr/bin/cpan script version 1.67, CPAN.pm version 2.20
Installed with cpan -i Email::Outlook::Message
Command failure output is as follows.
Loading internal logger. Log::Log4perl recommended for better logging
Reading '/Users/nep/.cpan/Metadata'
Database was generated on Sat, 18 Jul 2020 20:17:03 GMT
Running install for module 'Email::Outlook::Message'
Checksum for /Users/nep/.cpan/sources/authors/id/M/MV/MVZ/Email-Outlook-Message-0.919.tar.gz ok
'YAML' not installed, will not store persistent state
Configuring M/MV/MVZ/Email-Outlook-Message-0.919.tar.gz with Build.PL
Created MYMETA.yml and MYMETA.json
Creating new 'Build' script for 'Email-Outlook-Message' version '0.919'
MVZ/Email-Outlook-Message-0.919.tar.gz
/usr/bin/perl Build.PL -- OK
Running Build for M/MV/MVZ/Email-Outlook-Message-0.919.tar.gz
Building Email-Outlook-Message
MVZ/Email-Outlook-Message-0.919.tar.gz
./Build -- OK
Running Build test
t/basics.t ............... ok
t/gpg_signed.t ........... ok
t/internals.t ............ 1/17
# Failed test at t/internals.t line 66.
# got: 'text/plain; charset=UTF-8'
# expected: 'text/plain; charset="UTF-8"'
# Looks like you failed 1 test of 17.
t/internals.t ............ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/17 subtests
t/plain_jpeg_attached.t .. 1/23
# Failed test 'Testing content disposition'
# at t/plain_jpeg_attached.t line 38.
# got: 'attachment; filename=test.jpg'
# expected: 'attachment; filename="test.jpg"'
# Looks like you failed 1 test of 23.
t/plain_jpeg_attached.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/23 subtests
t/plain_uc_unsent.t ...... ok
t/plain_uc_wc_unsent.t ... ok
t/plain_unsent.t ......... ok
t/pod_coverage.t ......... skipped: Test::Pod::Coverage required for testing pod coverage
Test Summary Report
-------------------
t/internals.t (Wstat: 256 Tests: 17 Failed: 1)
Failed test: 9
Non-zero exit status: 1
t/plain_jpeg_attached.t (Wstat: 256 Tests: 23 Failed: 1)
Failed test: 21
Non-zero exit status: 1
Files=8, Tests=87, 1 wallclock secs ( 0.03 usr 0.02 sys + 0.76 cusr 0.25 csys = 1.06 CPU)
Result: FAIL
Failed 2/8 test programs. 2/87 subtests failed.
MVZ/Email-Outlook-Message-0.919.tar.gz
./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
reports MVZ/Email-Outlook-Message-0.919.tar.gz
I tested msgconvert recently and got a well-formatted .eml file.
But I noticed this multipart header in the resulting .eml:
--16153750240.8B52dA.31571
Content-Type: text/plain; charset="UTF8"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
Another (proprietary) program that displays the mail complains about the charset name UTF8. In the IANA charset list (http://www.iana.org/assignments/character-sets/character-sets.xhtml), I can see "UTF-8" and the alias "csUTF8", but no "UTF8".
Would it be possible to change the generated charset to "UTF-8" instead of "UTF8"?
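One possible way to get the registered name is to ask Perl's Encode module for the MIME name of an encoding. This is a sketch of that idea and not a description of how the module builds its headers; `mime_charset_name` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: turn any name Encode understands into its MIME name,
# falling back to the input when Encode has no encoding or MIME name for it.
sub mime_charset_name {
    my ($name) = @_;
    my $encoding = find_encoding($name) or return $name;
    return $encoding->mime_name // $name;
}

print mime_charset_name('UTF8'), "\n";
```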
The "From " lines used as mbox file separators are generated from the full From: name; they should contain just the address. For example, the header:
From: =?UTF-8?Q?Kamil?= [email protected]
is converted to this separator:
From Kamil [email protected] Wed Jan 2 16:34:35 2019
where it should be just:
From [email protected] Wed Jan 2 16:34:35 2019
I was processing an Office 365 Individual Message export and it created some msg files with leading spaces in the filename. (Since the filenames are based on the message Subject, this is understandable.)
I modified line 114 of my local copy of Message.pm so that it passes a filehandle object, as shown in the second example of the synopsis at http://search.cpan.org/~jmcnamara/OLE-Storage_Lite-0.19/lib/OLE/Storage_Lite.pm
--- Message.pm.orig 2015-07-04 19:20:08.000000000 -0400
+++ Message.pm 2018-01-04 14:57:08.352770214 -0500
@@ -111,7 +111,13 @@
$self->{EMBEDDED} = 0;
- my $msg = OLE::Storage_Lite->new($file);
+ use IO::File;
+ my $oIo = new IO::File;
+ $oIo->open("$file", "r" ,0666);
+ binmode($oIo);
+ my $msg = OLE::Storage_Lite->new($oIo);
+
+ #my $msg = OLE::Storage_Lite->new($file);
my $pps = $msg->getPpsTree(1);
$pps or croak "Parsing $file as OLE file failed";
$self->_set_verbosity($verbose);
and it solved the problem for me (or at least I no longer got the croak error message).
Only the header of the messages gets stored in the mbox, not the body.
Working on #14 brought to light that reading text properties results in a different type of result depending on whether the property type is PT_STRING8 or PT_UNICODE. In the latter case, the result is a Perl string (a sequence of Unicode code points), while in the former case, it is a sequence of bytes.
After reading the property, the knowledge of how it was encoded is discarded, so subsequent code needs to either guess at the data type of the property value, or just ignore it.
It would be better to keep the full knowledge of underlying data and type until the property is used. How it is to be decoded in the case of PT_STRING8 depends both on which property it is, and on the value of the PidTagInternetCodepage and PidTagMessageCodepage properties.
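A minimal sketch of decoding at the point of use, assuming the raw bytes and the property type are kept together. The helper `decode_text_property` is hypothetical, and the encoding name passed for PT_STRING8 stands in for whatever would be derived from the codepage properties; this is not the module's actual code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# MAPI property type codes for the two kinds of text property.
use constant PT_STRING8 => 0x001E;
use constant PT_UNICODE => 0x001F;

# Hypothetical helper: decode the raw bytes of a text property into a Perl
# string. $codepage_encoding is only consulted for PT_STRING8 and is assumed
# to have been worked out from the message's codepage properties.
sub decode_text_property {
    my ($type, $raw, $codepage_encoding) = @_;
    return decode('UTF-16LE', $raw)         if $type == PT_UNICODE;
    return decode($codepage_encoding, $raw) if $type == PT_STRING8;
    die sprintf("Not a text property type: %04X\n", $type);
}

# The same Cyrillic letter stored both ways yields the same Perl string.
my $from_unicode = decode_text_property(PT_UNICODE, "\x23\x04", undef);
my $from_string8 = decode_text_property(PT_STRING8, "\xD3", 'cp1251');
print "same\n" if $from_unicode eq $from_string8;
```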
It is useful to run it in Docker because we can run without considering OS/Middleware/Language.
If you'd like, you can merge the branch.
I'm developing some Python CLI scripts which parse Office 365 emails for deployment on Windows 10 machines using Anaconda/Miniconda.
Your site gives install instructions for Linux, but not Windows.
What am I missing? Can you orient me to how this package might relate to development on Windows?
input: .msg saved by outlook in cp1251
output:
-headers: good
-attachments: good
-message body: encoding is broken
the converted file itself seems to be in UTF-8
have something like:
--16855398770.0877C.31779
Content-Type: text/plain; charset="UTF-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Óâàæàåìûé ...!
Îçíàêîìüòåñü ñ ... ... ...
need to be like (from the same message re-exported in unicode):
--16855410730.aC1d0Ed.5308
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
Уважаемый ...!
Ознакомьтесь с ... ... ...
based on this decoder - https://2cyr.com/decode/?lang=en
source encoding:
WINDOWS-1251
displayed as:
WINDOWS-1252
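The effect can be reproduced in a few lines. This sketch only demonstrates the mismatch described above; the byte string is the cp1251 encoding of the first word of the expected output, and nothing here is taken from the module's code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The first word of the message body as cp1251 bytes.
my $bytes = "\xD3\xE2\xE0\xE6\xE0\xE5\xEC\xFB\xE9";

# Decoding with the wrong code page gives the accented Latin letters seen
# in the broken output; decoding with cp1251 gives the Cyrillic text.
my $wrong = decode('cp1252', $bytes);
my $right = decode('cp1251', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print "as cp1252: $wrong\n";
print "as cp1251: $right\n";
```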
Email::Outlook::Message version: 0.921
Thank you for providing this software.
I have been unable to convert a specific email which I can provide if absolutely necessary (company email).
After some experimenting, I found that it works in an Ubuntu WSL (version 0.919), so I installed the old version (0.919) on Manjaro, which successfully converts the mail. There, the last output is "Wide character in print at /usr/bin/vendor_perl/msgconvert line 58.", but the email is still converted and at first sight includes everything important.
Environments where it doesn't work with 0.920:
Linux 5.9.16-1-MANJARO x86_64 GNU/Linux, AUR package "perl-email-outlook-message"
Alpine based image on same machine, with following packages installed:
apk add --no-cache --virtual .build-deps \
perl-utils \
perl-module-build \
perl-app-cpanminus
apk add perl-email-address-xs perl-doc perl-params-util
cpanm Email::Outlook::Message
In the alpine container, the error message is exactly the same. After downgrading to version 0.919 it also works, but with slightly different "Wide character in print at /usr/bin/msgconvert line 62." as final output.
When I compare the generated .eml file with what Outlook sent to the Internet, I see that the .eml file contains the unmodified application/rtf part from the original .msg file. After importing the message into Thunderbird, only the plain text part is shown.
However, if I import the .msg file to Outlook and forward it to the Internet the received mail contains a text/html part and Thunderbird can show the message in its full glory.
I tried to extract the rtf part and convert it with RTF::HTML::Converter, but all the styles are lost. Do you know a better way?