inflex / ripmime

MIME/email package decoder

License: BSD 3-Clause "New" or "Revised" License

C 98.68% Makefile 0.46% Shell 0.08% Roff 0.78%

ripmime's Introduction

ripMIME - email attachment extractor

Features

Extracts all attachments even from multiple MUA personalities

TODO

Update to modern coding standards (last mostly done 2009)

Installation

Clone the project

$ git clone https://github.com/inflex/ripMIME.git

Build it!

$ make

Usage

Discussion

IRC Freenode: message inflex

ripmime's Issues

Unable to extract attachment

Hello!

I am trying to extract an attachment from an email received from a Toshiba MF copier, and I am getting the header file and a textfile0 with the following content:

Scanned from Toshiba2555CSE
Template test
Test scan
Date:23/03/2020 08:40
Pages:1
Resolution:300x300 DPI

Scanned on TOSHIBA2555CSE
No reply.
M1�é¯|ÓW5á§4ãM�ñ¿vïÍwÓ�|Ó}�
‰íz{SÊ—š¦™bq«b¢�éuð¨ž×§µ:ÚžÇÞ¬IÜ¡Ø§�¶¬{®�¢{^žÐâ²š,ŠØ¨�«miÈfz{_ŠW§jg

The email is displayed correctly in an email client like Outlook and I can see the attachment there, but I cannot extract it with ripmime. I am using the latest version, 1.4.0.10.

If anyone can give me any pointers on how to tackle this, it would be greatly appreciated.

Thanks,
Chris

Would it be possible to provide an option to name attachments by their Content-ID in place of --name-by-type?
We have a process where we use ripmime to take a maildir email and replace the embedded CID images with URLs, so that the email, when loaded from the database, does not carry any of the documents or inline images as part of its data. This has been working quite well, until we ran into email messages that do not name the content, which makes it difficult to link the decoded image/attachment back to the Content-ID in the header.

Right now, with the --name-by-type option, a sample content header block looks like this:

--=_related 00633EA78625828A_=
Content-Type: image/gif
Content-ID: <_1_0C814D9C0C81499C00633EA78625828A>
Content-Transfer-Encoding: base64

The file that is actually extracted is named image-gif3.

If the part has a name, such as:

Content-Type: image/gif; name="image001.gif"
Content-ID: <_1_0C814D9C0C81499C00633EA78625828A>
Content-Transfer-Encoding: base64

It is named image001.gif and we can find it. If an option such as --name-by-contentid existed, it would instead write the file out as _1_0C814D9C0C81499C00633EA78625828A.
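Until such an option exists, one possible workaround outside ripmime is a small script that names parts by Content-ID. The following is only a sketch using Python's standard `email` package, not ripmime code; the function names `content_id_name` and `extract_by_content_id` are made up for this illustration.

```python
import email
import re
from email import policy
from pathlib import Path


def content_id_name(part):
    """Return a filesystem-safe name derived from Content-ID, or None."""
    cid = part.get("Content-ID")
    if cid is None:
        return None
    name = str(cid).strip().strip("<>")
    # Content-IDs are sender-controlled, so replace anything path-like.
    name = re.sub(r"[^A-Za-z0-9._@-]", "_", name).lstrip(".")
    return name or None


def extract_by_content_id(raw_bytes, out_dir):
    """Write every leaf part that carries a Content-ID to out_dir/<content-id>."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        name = content_id_name(part)
        if name is None:
            continue  # parts without a Content-ID are skipped in this sketch
        (out / name).write_bytes(part.get_payload(decode=True) or b"")
        written.append(name)
    return written
```

With the header block quoted above, the GIF would be written as `_1_0C814D9C0C81499C00633EA78625828A`, which can be matched directly against the `cid:` references in the HTML body.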

ripmime and Almalinux 9

Hello

Unfortunately, ripmime does not install on AlmaLinux 9. Is anyone able to fix this?

Thank you
Graziano

Stdout instead of directory operations

Hello, @inflex! Thanks for a wonderful utility!

What about Unix-style output of file content to standard output, instead of file I/O in a directory?

This would also need the already existing functions in other combinations:

option for recursive counting of included file attachments
option for listing of included file attachments with number, so-called transport path, Content-ID, size and filename, for example with a | or \t delimiter
option for filename output by number

My idea as a script, where the initial $ is the command prompt:

$ ripmime -i test.eml --count
4
$ ripmime -i test.eml --list-attachments
1|0||42148|
2|1|499C00633EA78625828B|98246|incl.eml
3|1-0|499C00633EA78625827A|8934|rfc.txt
4|1-1||35491|image.gif
$ ripmime -i test.eml --list-attachment 2
2|1|499C00633EA78625828B|incl.eml
$ ripmime -i test.eml --get-filename 1

$ ripmime -i test.eml --get-filename 2
incl.eml
$ ripmime -i test.eml --get-filename 3
rfc.txt
$ ripmime -i test.eml --get-filename 4
image.gif
$ fn=$(ripmime -i test.eml --get-filename 4);
$ ripmime -i test.eml --get-content 4 > "$fn";

These options would solve the problems described in #5, #8, #11, #10, #18 and partially #14 and #20.

Do you need any help from me with code or testing, @inflex? You can see my C PRs in my profile, though I am not very productive in C.
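To make the proposed listing format concrete, here is a rough sketch of how such a listing could be produced today outside ripmime with Python's standard `email` package. None of the `--count`, `--list-attachments` or `--get-*` options above exist in ripmime; `list_parts` is a hypothetical helper, and unlike the proposal it lists only leaf parts, so an attached message shows up through its children rather than as an entry of its own.

```python
import email
from email import policy


def list_parts(raw_bytes):
    """Return one 'number|path|content-id|size|filename' line per leaf part."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    lines = []

    def visit(part, path):
        if part.is_multipart():
            # Recurse into containers; the path records the child indexes.
            for index, child in enumerate(part.get_payload()):
                visit(child, path + [str(index)])
            return
        payload = part.get_payload(decode=True) or b""
        cid = str(part.get("Content-ID", "")).strip().strip("<>")
        fields = [
            str(len(lines) + 1),         # running number
            "-".join(path),              # position in the MIME tree
            cid,                         # Content-ID, may be empty
            str(len(payload)),           # decoded size in bytes
            part.get_filename() or "",   # filename, may be empty
        ]
        lines.append("|".join(fields))

    visit(msg, [])
    return lines
```

The count is then simply `len(list_parts(raw))`, and a single entry can be selected by its running number.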

Endless loop

I found that an email whose attachments have the following headers sends ripmime into an infinite loop (well, I gave up after one minute).

((TAB) represents a \t)

Content-Type: application/pdf; name="(PDI TX) =?UTF-8?B?4oCTIE1BIC0gRU5MQUNF
(TAB)IFNUSS1QSU0g4oCTIDIwMTcgLSBOT1ZPIOKAkyA=?=
=?UTF-8?B?U0lBRSAtIEFHUzIwIOKAkyBCUi1WSVYwLTE3MDAyNzUg4oCTIFJFViAwMC4=?=
=?UTF-8?B?cGRm?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="(PDI TX) =?UTF-8?B?4oCTIE1BIC0gRU5MQUNFIFNUSS1QSU0g4oCTIDIw?=
=?UTF-8?B?MTcgLSBOT1ZPIOKAkyBTSUFFIC0gQUdTMjAg4oCTIEJSLVZJVjAtMTcwMDI=?=
=?UTF-8?B?NzUg4oCTIFJFViAwMC5wZGY=?=";
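For reference, the filename these headers are meant to carry can be recovered with Python's standard `email.header` module. This is only a sketch showing what a tolerant decoder produces for the Content-Disposition value; it says nothing about where ripmime loops. It assumes the continuation lines began with whitespace in the original message (the leading whitespace is not visible in the report above), and `decode_folded` is a name invented for this example.

```python
import re
from email.header import decode_header, make_header

# The filename value from the Content-Disposition header above.
# Assumption: each continuation line started with a space in the raw mail.
RAW_FILENAME = (
    "(PDI TX) =?UTF-8?B?4oCTIE1BIC0gRU5MQUNFIFNUSS1QSU0g4oCTIDIw?=\r\n"
    " =?UTF-8?B?MTcgLSBOT1ZPIOKAkyBTSUFFIC0gQUdTMjAg4oCTIEJSLVZJVjAtMTcwMDI=?=\r\n"
    " =?UTF-8?B?NzUg4oCTIFJFViAwMC5wZGY=?="
)


def decode_folded(value):
    """Unfold a header value, then decode its RFC 2047 encoded-words."""
    # Unfolding: a line break followed by whitespace becomes a single space.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value)
    # decode_header drops the whitespace between adjacent encoded-words,
    # so the three base64 chunks are joined into one string.
    return str(make_header(decode_header(unfolded)))


filename = decode_folded(RAW_FILENAME)
```

The decoded name starts with `(PDI TX)` and ends in `REV 00.pdf`. Strictly speaking, RFC 2047 does not allow encoded-words inside a quoted string, but mail clients produce them anyway, so a parser has to cope with them.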

--no-nameless deletes valid attachments named textfile*

I was hoping to use ripmime to extract only the attachments of an email message while ignoring the plain-text/HTML body itself, so that I could pass the attachments to mraptor.

At first I considered just deleting all textfile* files after running ripmime, but this fails to handle a special case: a mail with the attachments foo.odt and textfile10 would be extracted like this:

host ~ # ripmime -i 885. -d x -v 
Decoding filename=textfile0
Decoding filename=textfile1
Decoding filename=foo.odt
Decoding filename=textfile10

When I now delete textfile*, I'd delete the valid attachment textfile10 too.

Then I discovered --no-nameless and I had hoped that it would correctly skip textfile0 and textfile1 (plain and HTML) while extracting textfile10, but unfortunately it falls for the same thing:

host ~ # ripmime -i 885. -d x -v --no-nameless
Decoding filename=foo.odt
Decoding filename=textfile10
Removed x/textfile10 [status = 0]
Removed x/textfile1 [status = 0]
Removed x/textfile0 [status = 0]
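As a stopgap for this use case, attachments can be told apart from body parts by their MIME structure rather than by the generated filename. The sketch below uses `iter_attachments()` from Python's standard `email` package; it is a workaround outside ripmime, not a description of how `--no-nameless` works, and `real_attachments` is a hypothetical helper name.

```python
import email
from email import policy


def real_attachments(raw_bytes):
    """Return (filename, content) pairs for attachments, skipping body parts."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    result = []
    # iter_attachments() skips the first text/plain and text/html body
    # candidates but keeps any part marked "Content-Disposition: attachment",
    # even one that happens to be called textfile10.
    for part in msg.iter_attachments():
        result.append((part.get_filename(), part.get_payload(decode=True) or b""))
    return result
```

For a message like the one above, this would yield foo.odt and textfile10, while the plain-text and HTML bodies are left out.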

Package version

Hi,

Would it be possible for you to tag a package version for packaging purposes?

Thanks

Buffer Overflow

Not sure if this is of any value, but we are seeing fairly regular crashes on the CentOS release:

$ ripmime -V
v1.4.0.9 - November 07, 2008 (C) PLDaniels http://www.pldaniels.com/ripmime

*** buffer overflow detected ***: ripmime terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x390df02877]
/lib64/libc.so.6[0x390df00760]
/lib64/libc.so.6[0x390deffe5b]
/lib64/libc.so.6(__snprintf_chk+0x7a)[0x390deffd2a]
ripmime[0x40910d]
ripmime[0x40ab6c]
ripmime[0x40b88f]
ripmime[0x405ddb]
ripmime[0x406485]
ripmime[0x406734]
ripmime[0x406fdc]
ripmime[0x401699]
ripmime[0x401744]
ripmime[0x402383]
/lib64/libc.so.6(__libc_start_main+0x100)[0x390de1ed20]
ripmime[0x401549]
======= Memory map: ========
00400000-00420000 r-xp 00000000 08:03 22155633 /usr/bin/ripmime
0061f000-00621000 rw-p 0001f000 08:03 22155633 /usr/bin/ripmime
00621000-00624000 rw-p 00000000 00:00 0
00820000-00821000 rw-p 00020000 08:03 22155633 /usr/bin/ripmime
021c3000-021e4000 rw-p 00000000 00:00 0 [heap]
390da00000-390da20000 r-xp 00000000 08:03 3801131 /lib64/ld-2.12.so
390dc20000-390dc21000 r--p 00020000 08:03 3801131 /lib64/ld-2.12.so
390dc21000-390dc22000 rw-p 00021000 08:03 3801131 /lib64/ld-2.12.so
390dc22000-390dc23000 rw-p 00000000 00:00 0
390de00000-390df8b000 r-xp 00000000 08:03 3801429 /lib64/libc-2.12.so
390df8b000-390e18a000 ---p 0018b000 08:03 3801429 /lib64/libc-2.12.so
390e18a000-390e18e000 r--p 0018a000 08:03 3801429 /lib64/libc-2.12.so
390e18e000-390e190000 rw-p 0018e000 08:03 3801429 /lib64/libc-2.12.so
390e190000-390e194000 rw-p 00000000 00:00 0
3910200000-3910216000 r-xp 00000000 08:03 3801503 /lib64/libgcc_s-4.4.7-20120601.so.1
3910216000-3910415000 ---p 00016000 08:03 3801503 /lib64/libgcc_s-4.4.7-20120601.so.1
3910415000-3910416000 rw-p 00015000 08:03 3801503 /lib64/libgcc_s-4.4.7-20120601.so.1
7f020a7e1000-7f020a7e4000 rw-p 00000000 00:00 0
7f020a7ed000-7f020a7f0000 rw-p 00000000 00:00 0
7ffe6dfbd000-7ffe6dfd2000 rw-p 00000000 00:00 0 [stack]
7ffe6dfe1000-7ffe6dfe2000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Fails on very long filenames

First of all, many thanks for ripmime!

I sometimes encounter files containing attachments with very long filenames. "Very long" means longer than 255 characters (on my Linux system). This is an error message I recently got:

mime.c:2337:MIME_generate_multiple_hardlink_filenames:WARNING: While trying to create '(...)' link to '(...)' (File name too long)

On non Linux filesystems (HFS, FAT, ...) the maximum filename length can even be less than 255 characters.

Suggestion for handling the problem:

Query the max. file name size with the help of function pathconf (see http://www.gnu.org/software/libc/manual/html_node/Pathconf.html, parameter _PC_NAME_MAX)
If a file is to be created with a name longer than that value: Shorten the filename to the max value minus some spare characters.
Those spare characters are needed if a file with that name already exists. In that case, renaming could be done according to the options --overwrite, --unique-names, --prefix, --postfix and --infix.
Optionally, the spare characters could include some string hinting that the filename was shortened and does not correspond to the original filename. For example: "AttachmentThatReallyHasSomeVeryLongFilename" would become "AttachmentThatReallyHasSomeVe_shortened".
If a filename needs to be shortened, ripmime could report it on stdout (the message would include the original filename).
ripmime could even refuse to work on file systems whose maximum filename length is less than a certain value.
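The first steps of this suggestion can be sketched as follows. This is an illustration in Python rather than a patch for mime.c; `os.pathconf` with `"PC_NAME_MAX"` is the Python counterpart of the C call mentioned above, while the helper names, the `_shortened` marker and the `spare` default are assumptions made for the example. The limit is counted in bytes, so the name is measured in its UTF-8 encoding.

```python
import os


def name_max_for(directory):
    """Ask the filesystem holding `directory` for its maximum filename length."""
    return os.pathconf(directory, "PC_NAME_MAX")


def shorten_filename(name, name_max, marker="_shortened", spare=8):
    """Shorten `name` to at most name_max - spare bytes, ending in `marker`."""
    raw = name.encode("utf-8")
    if len(raw) <= name_max:
        return name  # fits as it is, nothing to do
    room = name_max - spare - len(marker.encode("utf-8"))
    if room < 1:
        raise ValueError("name_max is too small to shorten into")
    # Cut on a byte budget, then drop a trailing partial multi-byte character.
    head = raw[:room].decode("utf-8", errors="ignore")
    return head + marker
```

With `name_max=40` and `spare=1`, the 43-character example name from the list above comes out as `AttachmentThatReallyHasSomeVe_shortened`. The spare bytes are left free for whatever suffix the renaming options add later.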

Thanks again for this great tool and best greetings!

Flag to suppress directory recursing?

When ripmime -i is given a directory name, it'll automatically recurse into that directory:

martin.mein-iserv.de ~ # mkdir foo
martin.mein-iserv.de ~ # touch foo/bar
martin.mein-iserv.de ~ # ripmime -i foo
input file is a directory, recursing
Unpacking mailpack foo/bar

This surprised us badly in a buggy shell script that used ripmime like this:

martin.mein-iserv.de ~ # cat test.sh
#!/bin/sh

for i in "$1"/*
do
  ripmime -i "$i" -d "/tmp/foo"
done

martin.mein-iserv.de ~ # ./test.sh
input file is a directory, recursing
Unpacking mailpack /bin/ab
Unpacking mailpack /bin/pidof
Unpacking mailpack /bin/ipmitool
Unpacking mailpack /bin/python3.9
Unpacking mailpack /bin/streamzip.bundled
Unpacking mailpack /bin/uptftopl
Unpacking mailpack /bin/fincore
Unpacking mailpack /bin/expect_tknewsbiff
[...]

I feel like an option like --no-recurse to suppress this recursing behavior for situations where ripmime is only ever expected to process files might be useful to guard against such situations.
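In the meantime, the calling script can protect itself by handing ripmime regular files only and by refusing an empty directory argument. The sketch below shows that guard in Python; `regular_files` and `unpack_all` are hypothetical helper names, and only the `-i` and `-d` options already used above are passed to ripmime.

```python
import os
import subprocess


def regular_files(directory):
    """Return the regular files directly inside `directory`, sorted by name."""
    if not directory:
        # An empty argument is what turned "$1"/* into /* in the script above.
        raise ValueError("no directory given")
    paths = [os.path.join(directory, entry) for entry in sorted(os.listdir(directory))]
    # isfile() is False for directories, so ripmime never gets one to recurse into.
    return [path for path in paths if os.path.isfile(path)]


def unpack_all(directory, out_dir):
    """Run ripmime once per regular file; subdirectories are skipped."""
    for path in regular_files(directory):
        subprocess.run(["ripmime", "-i", path, "-d", out_dir], check=False)
```

This does not replace a `--no-recurse` option, but it keeps a missing argument from sending ripmime across the whole filesystem.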

Content-Transfer-Encoding: binary is not decoded correctly

The problem is caused by the code in FFGET_getnewblock() that turns NUL bytes into spaces. The binary encoding of a binary file, such as an image, will most likely contain NUL bytes. Turning them into spaces will corrupt the file.

A workaround is to enable the undocumented --formdata option that disables the NUL to space conversion. However, it would be more correct to disable the conversion when processing the binary encoded data and turn it back on afterward.
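A few lines of Python illustrate why this conversion is destructive for binary payloads. The function below only mimics the effect described above and is not the code from FFGET_getnewblock(); the sample bytes are the PNG signature followed by the start of an IHDR chunk, whose 4-byte length field is 00 00 00 0D.

```python
def nul_to_space(block):
    """Mimic the conversion described above: every NUL byte becomes a space."""
    return block.replace(b"\x00", b" ")


# PNG signature, then the IHDR chunk length (00 00 00 0D) and chunk type.
original = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
converted = nul_to_space(original)

# The length field now reads 20 20 20 0D, so the chunk length is wrong.
# The change cannot be undone either: a NUL and a real space both end up
# as a space, so the original bytes are no longer recoverable.
```

Text bodies rarely contain NUL bytes, so the conversion goes unnoticed there, but almost any image or archive sent with binary encoding will be affected.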

output filename list only

Could you please add an option to only list the attachments (so that we could use it with procmail, for example)?
Thanks.

ripMIME for incoming mail

Hello
First of all: great job guys!!! Really great!!!
I have only one small question: is it possible to filter incoming mail as it arrives?
How can this be achieved?

attachments with spaces in filename lose at least their extension

Hi,
ripmime has already saved me hours on end with scans students send me via email. All works brilliantly as long as the attached file has no space in its filename.

Example:
attachment "MyPaper.pdf" will be extracted as "MyPaper.pdf", but
attachment "My Paper.pdf" will be extracted as "My".

I've automated my workflow and deal with the attachments according to their extensions, so whenever somebody sends me a file with a space, I end up having zero output. Would it be possible to adapt ripmime's handling of filenames to include spaces?

Thank you very much :D

Error code with failed command

When ripmime fails writing to an existing directory with insufficient permissions (e.g. missing x-bits) it prints an error message "mime.c:1484:MIME_decode_text:ERROR: cannot open out/textfile0 for writing" but still returns an exit code 0 to the shell which makes it hard to detect failure in scripts etc.
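Until the exit code reflects such failures, a wrapper script can scan ripmime's output for error lines. The sketch below assumes that every failure is reported with the `:ERROR:` marker seen in the message quoted above, which may not hold for all error paths; `looks_failed` and `run_ripmime` are hypothetical helper names.

```python
import subprocess


def looks_failed(returncode, output):
    """Treat a non-zero exit code or any ':ERROR:' line as a failure."""
    if returncode != 0:
        return True
    return any(":ERROR:" in line for line in output.splitlines())


def run_ripmime(args):
    """Run ripmime with `args`; return True only if no failure was detected."""
    proc = subprocess.run(
        ["ripmime", *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # scan both streams, whichever ripmime uses
        text=True,
        errors="replace",          # tolerate non-UTF-8 bytes in the output
    )
    return not looks_failed(proc.returncode, proc.stdout)
```

A script would call, for example, `run_ripmime(["-i", "mail.eml", "-d", "out"])` and stop when it returns False.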

File renaming

Hi,

would it be possible to rename the attachment(s) on the fly?

Best regards, Marc

parameter continuations fail on whitespace

Well, this is a new MIME header encoding format for me...

$ ripmime -v --paranoid --overwrite -i Maildir/cur/1663650431.29200_997.hanzawa:2,S -d Workdir/1663650431.29200_997.hanzawa
Decoding filename=textfile0
Decoding filename=textfile1
Decoding filename=Entainlsx

The cause of this is:

------=_Part_50_1247050508.1663646525365
Content-Type: application/octet-stream;
        name*0="Entain Ladbrokes Coral Yahoo Past 7 days Report
 09-20-2022.x"; name*1=lsx
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename*0="Entain Ladbrokes Coral Yahoo Past 7 days Report
 09-20-2022.x"; filename*1=lsx
...

Looks like the parser does not support whitespace when dealing with parameter continuations.
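For comparison, this is the RFC 2231 parameter continuation mechanism: the sections `filename*0` and `filename*1` are concatenated in order after the folded header has been unfolded. The sketch below feeds the headers from this report to Python's standard `email` package to show the name a continuation-aware parser arrives at. The message is reduced to a single part with a dummy body, the continuation lines are assumed to start with a tab or space, and `attachment_filename` is a name made up for the example.

```python
import email
from email import policy

# Headers as quoted above; the base64 body is a placeholder.
RAW = (
    "MIME-Version: 1.0\r\n"
    "Content-Type: application/octet-stream;\r\n"
    '\tname*0="Entain Ladbrokes Coral Yahoo Past 7 days Report\r\n'
    ' 09-20-2022.x"; name*1=lsx\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "Content-Disposition: attachment;\r\n"
    '\tfilename*0="Entain Ladbrokes Coral Yahoo Past 7 days Report\r\n'
    ' 09-20-2022.x"; filename*1=lsx\r\n'
    "\r\n"
    "AAAA\r\n"
)


def attachment_filename(raw_text):
    """Return the filename after unfolding and joining the continuations."""
    msg = email.message_from_string(raw_text, policy=policy.default)
    # get_filename() reads Content-Disposition and joins filename*0, filename*1.
    return msg.get_filename()


filename = attachment_filename(RAW)
```

The result keeps the spaces and ends in `09-20-2022.xlsx`, whereas ripmime produced `Entainlsx`, which suggests that the first section was cut at its first space before the second section was appended.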