ansemjo / imapfetch

python script to download all emails from a mailserver and store them in a maildir format

License: MIT License

Python 88.19% Shell 3.57% Makefile 6.01% Dockerfile 2.23%
backup mailboxes imap maildir

imapfetch's Introduction

imapfetch

imapfetch.py is a relatively straightforward Python script to download all emails from an IMAP4 mailserver and store them locally in a simple, plaintext maildir format, e.g. for backup purposes.

New Version v1.0.0

The new development branch was finally merged into main. Unfortunately, it contains a few incompatible changes, so I bumped the major version. You cannot use v1 with old archives created with v0: the new version will not know about the previously archived mails.

That being said, I think the changes are well worth the switch:

  • uses IMAPClient library for safer communication to the server
  • SHA224 header digests used in filename, as a sort of content-addressing scheme
  • completely rewritten logging, silent by default
  • switched index to a simple SQLite file

INSTALL

You can quickly install directly from GitHub with pip:

pip install git+https://github.com/ansemjo/imapfetch.git

For other options, see packaging below.

USAGE

Configure your accounts using the provided configuration sample and run:

imapfetch settings.cfg

Use --help to see a list of possible options:

imapfetch [-h] [--full] [--list] [--verbose] [--start-date START_DATE] [--end-date END_DATE] config [section ...]

The configuration file is passed as the first and only required positional argument. Any further positional arguments are section names from the configuration file; if given, only those sections are run, for example if you want to archive only a single account at a time.

  • --list: Only show a list of folders for every account and exit. Useful to get an overview of your accounts before writing exclusion patterns or just checking connectivity.
  • --full: Perform a full backup by starting with UID 1 in every folder; only useful if the server returns inconsistent or not strictly monotonically increasing UIDs. Duplicate mails which are already in the index will not be downloaded either way.
  • --verbose: Show more verbose logging. Can be passed multiple times.
  • --start-date START_DATE: Start date for filtering messages (YYYY-MM-DD)
  • --end-date END_DATE: End date for filtering messages (YYYY-MM-DD)
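The option list above maps fairly directly onto an argparse definition. The following is a minimal sketch that mirrors the usage line; it is not the script's actual parser, and the help texts, the iso_date helper and the example arguments are assumptions made for illustration.

```python
import argparse
from datetime import datetime


def iso_date(value):
    # Parse YYYY-MM-DD; a ValueError here is reported by argparse as a usage error.
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_parser():
    # Hypothetical reconstruction of the documented command line.
    parser = argparse.ArgumentParser(prog="imapfetch")
    parser.add_argument("--full", action="store_true",
                        help="start with UID 1 in every folder")
    parser.add_argument("--list", action="store_true",
                        help="only list folders for every account and exit")
    parser.add_argument("--verbose", "-v", action="count", default=0,
                        help="more verbose logging, can be repeated")
    parser.add_argument("--start-date", type=iso_date,
                        help="start date for filtering messages (YYYY-MM-DD)")
    parser.add_argument("--end-date", type=iso_date,
                        help="end date for filtering messages (YYYY-MM-DD)")
    parser.add_argument("config", help="path to the configuration file")
    parser.add_argument("section", nargs="*",
                        help="only run these sections of the configuration")
    return parser


# Example invocation with made-up arguments.
args = build_parser().parse_args(
    ["--full", "-vv", "--start-date", "2024-01-01", "settings.cfg", "myarchive"]
)
print(args.config, args.section, args.verbose, args.start_date)
```

Using action="count" is one common way to let a flag such as -v be passed multiple times.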

CONFIGURATION

The available configuration options are mostly explained in the provided sample.

  • you add one [section] per account

  • archive points to the directory where you want to store the mails

  • server, username and password are the IMAP4 connection details

  • exclude is a multi-line string of UNIX-style globbing patterns to exclude folders from the backup; one pattern per line

  • quoting enables urlencoding of folder names before writing to disk; some systems will not handle all allowed inbox characters otherwise

Minimal required sample:

[myarchive]
archive     = ~/mailarchive
server      = imap.strato.de
username    = [email protected]
password    = verySecurePassword
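To make the options above more concrete, here is a small sketch of how such a file could be read with the standard library. It assumes that exclusion patterns behave like fnmatch globs and that quoting corresponds to urllib.parse.quote; the script itself may differ in both respects. The host, user name and patterns in the sample are placeholders.

```python
import configparser
import fnmatch
import os
import urllib.parse

# Placeholder configuration; values are illustrative only.
SAMPLE = """
[myarchive]
archive  = ~/mailarchive
server   = imap.example.org
username = user@example.org
password = verySecurePassword
exclude  =
    Spam
    Trash*
quoting  = yes
"""

# interpolation=None keeps a literal "%" in passwords from being interpreted.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE)
account = config["myarchive"]

archive = os.path.expanduser(account["archive"])
# Multi-line value: one pattern per line, blank lines dropped.
patterns = [p.strip() for p in account.get("exclude", "").splitlines() if p.strip()]
quoting = account.getboolean("quoting", fallback=False)


def excluded(folder, patterns):
    # Assumption: UNIX-style globbing as implemented by fnmatch.
    return any(fnmatch.fnmatchcase(folder, p) for p in patterns)


def disk_name(folder, quoting):
    # Assumption: "urlencoding" means percent-encoding every reserved character.
    return urllib.parse.quote(folder, safe="") if quoting else folder


print(patterns, excluded("Trash/old", patterns), disk_name("Sent Mail/2020", quoting))
```

Note that in fnmatch-style globs square brackets denote a character class, so a pattern like [Gmail]* would not match the literal folder name [Gmail] under this assumption.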

RUNNING

During execution, the archive directory is created if it does not exist, along with a simple index.db, which is an SQLite file.

For every backed up folder a subdirectory is created with the same name. Those subdirectories are maildir mailboxes and can be viewed with most email clients; for example mutt.

For every email in that folder, the header is downloaded and hashed. If the resulting digest is not present in the index, the rest of the email is downloaded and stored in the local maildir. This detects duplicates and avoids storing a mail twice if it is moved between folders.
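The header-hash check described above can be sketched roughly as follows. This is an illustration under assumptions, not the script's code: the table layout, the Index class and the fetch_body callback are hypothetical, and only the use of SHA224 and SQLite comes from this README.

```python
import hashlib
import sqlite3


def header_digest(header: bytes) -> str:
    # SHA224 over the raw header bytes, as a hex string.
    return hashlib.sha224(header).hexdigest()


class Index:
    """Hypothetical index: one row per archived header digest."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS mails (digest TEXT PRIMARY KEY, folder TEXT)"
        )

    def __contains__(self, digest):
        row = self.db.execute(
            "SELECT 1 FROM mails WHERE digest = ?", (digest,)
        ).fetchone()
        return row is not None

    def add(self, digest, folder):
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR IGNORE INTO mails VALUES (?, ?)", (digest, folder)
            )


def archive_mail(index, folder, header, fetch_body):
    """Download the body only if the header digest is unknown."""
    digest = header_digest(header)
    if digest in index:
        return False  # duplicate, e.g. the mail was moved between folders
    fetch_body()  # the full message would be stored in the maildir here
    index.add(digest, folder)
    return True


index = Index()
downloads = []
header = b"Subject: hello\r\nMessage-ID: <1@example.org>\r\n\r\n"
first = archive_mail(index, "INBOX", header, lambda: downloads.append("body"))
second = archive_mail(index, "Archive", header, lambda: downloads.append("body"))
print(first, second, len(downloads))
```

Because only headers are hashed, the expensive body download is skipped for any mail whose header digest is already indexed, regardless of the folder it now lives in.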

$ tree archive/ -L 2
archive/
├── INBOX
│   ├── cur
│   ├── new
│   └── tmp
├── index.db
├── muttrc
...

BACKUP

Once imapfetch is done, you have a local copy of all your emails. This is a one-way operation: emails that are deleted online are never deleted from your archive. You can then back up that entire directory as-is with tools like borg or restic. Both do a fantastic job at deduplicating existing data, so you don't waste much space even if you take daily or even hourly snapshots.

If you are sufficiently sure that you will never have an inbox folder called backup on your mailserver, you might do something like this:

borg init --encryption repokey-blake2 ./backup
borg create --stats --progress --compression zstd \
  ./backup::$(date --utc +%F-%H%M%S%Z) INBOX*/ index.db

VIEWING

Generally all applications that handle maildir mailboxes should be able to browse your archive. mutt is a nice terminal application that is able to handle these archives with an absolute minimum configuration. A sample is provided with this project: just copy the muttrc to your archive directory and run:

cd path/to/my/archive
mutt -F ./muttrc

Use c to change directories.

Generally, since maildir is a plaintext format, most commandline tools should work. grep is decently fast at finding specific emails, for example.
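Beyond grep, the archive can also be read from Python with the standard library mailbox module. The sketch below searches one folder by subject; the function name and the throwaway demo maildir are made up for illustration, and folder names on disk may differ if quoting is enabled.

```python
import mailbox
import os
import tempfile


def find_by_subject(archive, folder, needle):
    """Return (key, subject) pairs whose subject contains needle, case-insensitively."""
    box = mailbox.Maildir(os.path.join(archive, folder), create=False)
    hits = []
    for key, message in box.iteritems():
        subject = str(message.get("Subject", ""))
        if needle.lower() in subject.lower():
            hits.append((key, subject))
    box.close()
    return hits


# Demo against a temporary maildir instead of a real archive.
with tempfile.TemporaryDirectory() as archive:
    inbox = mailbox.Maildir(os.path.join(archive, "INBOX"), create=True)
    inbox.add("Subject: Invoice 42\n\nhello\n")
    inbox.add("Subject: Lunch?\n\nnoon\n")
    hits = find_by_subject(archive, "INBOX", "invoice")
    print(hits)
```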

PACKAGING

Make sure you run a decently modern version of Python 3; anything from 3.6 onward should work (the script uses f-strings), but development was done on 3.10.

PIP

Since this project only uses a PEP 517 style pyproject.toml, you might have to update your pip. Install the package directly from GitHub as shown above or use a specific version archive:

pip install [--user] https://github.com/ansemjo/imapfetch/archive/v1.0.0.tar.gz

AUR

On Arch Linux install imapfetch-git with an AUR helper:

paru -S imapfetch-git

RPM, DEB, APK

Other packages can be built with ansemjo/fpm using assets/Makefile:

make -f assets/Makefile packages

These can then be installed locally with yum / dnf / dpkg etc.

CONTAINER

Automatic GitHub workflows regularly build a container image using assets/Dockerfile, which includes crond to run the script on a specific schedule easily. Run the container with docker or podman like this:

docker run -d \
  -v ~/.config/imapfetch.cfg:/imapfetch.cfg \
  -v ~/mailarchive:/archive \
  -e SCHEDULE="*/15 * * * *" \
  ghcr.io/ansemjo/imapfetch

Make sure the archive key in your configuration points to the directory as mounted inside the container.

LICENSE

The script is licensed under the MIT License.

imapfetch's People

Contributors

ansemjo, dependabot[bot], patrykgruszka

imapfetch's Issues

python 3.6 requirement

imapfetch.py requires at least Python 3.6 due to the use of f"{strings}". Should I make that a requirement during packaging and installation with setup.py?

handle movement from new to cur better

Currently there is some logic that stores new messages in the maildir folder new and moves all existing messages from new into cur before each run. While this effect is negligible on later runs, it results in a very long-running second run if you archived many messages at once. To make matters worse, the log message "moving message ... to cur" is classified as verbose and is thus not shown by default.

Storing mails in new is only useful if you want to check newly archived mails with mutt after each run.

  • remove the feature?
  • make it an option?
  • don't save in new on first runs?
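As one possible shape for the "make it an option" idea, the sketch below moves files from new to cur with plain renames and can be switched off entirely. The function name and the enabled flag are hypothetical, and the issue does not establish whether this would be faster than the existing logic.

```python
import os
import tempfile


def move_new_to_cur(maildir, enabled=True):
    """Rename every file in <maildir>/new into <maildir>/cur; return the count."""
    if not enabled:
        return 0
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    moved = 0
    for name in os.listdir(new_dir):
        # Maildir convention: names in cur carry an info suffix starting with ":2,".
        target = name if ":2," in name else name + ":2,"
        os.rename(os.path.join(new_dir, name), os.path.join(cur_dir, target))
        moved += 1
    return moved


# Demo on a throwaway maildir with a single made-up message file.
with tempfile.TemporaryDirectory() as maildir:
    for sub in ("new", "cur", "tmp"):
        os.mkdir(os.path.join(maildir, sub))
    with open(os.path.join(maildir, "new", "1700000000.M1P1.host"), "w") as f:
        f.write("Subject: test\n\nbody\n")
    skipped = move_new_to_cur(maildir, enabled=False)
    moved = move_new_to_cur(maildir)
    in_cur = os.listdir(os.path.join(maildir, "cur"))
    print(skipped, moved, in_cur)
```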

Getting an error selecting a folder when trying to back up a Gmail / Google Suite mailbox

--list works:

<EMAIL ADDRESS>: INBOX
<EMAIL ADDRESS>: [Gmail]
<EMAIL ADDRESS>: [Gmail]/All Mail
<EMAIL ADDRESS>: [Gmail]/Bin
<EMAIL ADDRESS>: [Gmail]/Drafts
<EMAIL ADDRESS>: [Gmail]/Important
<EMAIL ADDRESS>: [Gmail]/Sent Mail
<EMAIL ADDRESS>: [Gmail]/Spam
<EMAIL ADDRESS>: [Gmail]/Starred

However if I try excluding INBOX* it fails with this error:

~$ .local/bin/imapfetch --full -vv settings.cfg
imapfetch: read configuration from settings.cfg
imapfetch: processing section <EMAIL ADDRESS>
<EMAIL ADDRESS>: connecting to imap.gmail.com
<EMAIL ADDRESS>: logging in as <EMAIL ADDRESS>
<EMAIL ADDRESS>: opened archive in /mnt/h/EmailArchive/<EMAIL ADDRESS>
<EMAIL ADDRESS>: excluded folder INBOX due to 'INBOX*'
<EMAIL ADDRESS>: processing folder [Gmail]
<EMAIL ADDRESS>: select failed: [NONEXISTENT] Unknown Mailbox: [Gmail] (Failure)
Traceback (most recent call last):
  File "/home/sape/.local/lib/python3.10/site-packages/imapfetch.py", line 342, in commandline
    mailserver.cd(folder)
  File "/home/sape/.local/lib/python3.10/site-packages/imapfetch.py", line 55, in cd
    return self.client.select_folder(folder, readonly=True)
  File "/home/sape/.local/lib/python3.10/site-packages/imapclient/imapclient.py", line 819, in select_folder
    self._command_and_check("select", self._normalise_folder(folder), readonly)
  File "/home/sape/.local/lib/python3.10/site-packages/imapclient/imapclient.py", line 1753, in _command_and_check
    self._checkok(command, typ, data)
  File "/home/sape/.local/lib/python3.10/site-packages/imapclient/imapclient.py", line 1759, in _checkok
    self._check_resp("OK", command, typ, data)
  File "/home/sape/.local/lib/python3.10/site-packages/imapclient/imapclient.py", line 1636, in _check_resp
    raise exceptions.IMAPClientError(
imaplib.IMAP4.error: select failed: [NONEXISTENT] Unknown Mailbox: [Gmail] (Failure)
imapfetch: encountered errors!
imapfetch: <EMAIL ADDRESS>: error('select failed: [NONEXISTENT] Unknown Mailbox: [Gmail] (Failure)')

Maybe escaping the imap select could help?
https://stackoverflow.com/questions/75395280/i-am-unable-to-retrieve-gmail-emails-from-any-other-labels-other-than-inbox-sent
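One possible cause, not confirmed in this issue, is that [Gmail] is only a container for the folders below it and is flagged \Noselect by the server, so it cannot be selected at all. Under that assumption, such folders could be filtered out before selecting. The sketch below works on (flags, delimiter, name) tuples, which is the shape IMAPClient's list_folders() returns; the sample listing is invented for illustration.

```python
NOSELECT = b"\\noselect"


def selectable_folders(listing):
    """Drop folders that carry the \\Noselect flag.

    listing: iterable of (flags, delimiter, name) tuples.
    """
    names = []
    for flags, _delimiter, name in listing:
        # Flags usually arrive as bytes; compare case-insensitively.
        lowered = {
            f.lower() if isinstance(f, bytes) else str(f).lower().encode()
            for f in flags
        }
        if NOSELECT in lowered:
            continue
        names.append(name)
    return names


# Invented sample, loosely modelled on the --list output above.
sample = [
    ((b"\\HasNoChildren",), b"/", "INBOX"),
    ((b"\\HasChildren", b"\\Noselect"), b"/", "[Gmail]"),
    ((b"\\All", b"\\HasNoChildren"), b"/", "[Gmail]/All Mail"),
]
folders = selectable_folders(sample)
print(folders)
```

A simpler workaround under the same assumption would be to catch the select error for a single folder, log it, and continue with the next one instead of aborting the whole account.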

Cannot install on Debian 9

Hi, when I try to install via the suggested pip method or python setup.py, I get this error message:

~/sources/imapfetch$ python setup.py
  File "setup.py", line 15
    github = f"https://github.com/ansemjo/{name}"
                                                ^
SyntaxError: invalid syntax

My Python version:

python -V
Python 3.5.3

Any suggestions how to fix this on Debian 9?

windows support

Testing the script on Windows currently shows two errors:

  • color escapes with \033[34;1m etc do not work
  • maildir tries to write mails with a colon : in the filename, which is not supported on Windows

Detect Windows with os.name or platform.system(), then turn off or change color support and replace : with !?
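A rough sketch of that proposal follows. The helper names are hypothetical and this is not the fix that was (or was not) adopted in the script.

```python
import os
import platform


def is_windows():
    # Either check is sufficient on its own; both are shown as mentioned above.
    return os.name == "nt" or platform.system() == "Windows"


def color(text, code="34;1", enabled=None):
    """Wrap text in an ANSI escape sequence unless colors are disabled."""
    if enabled is None:
        enabled = not is_windows()
    return f"\033[{code}m{text}\033[0m" if enabled else text


def safe_filename(name, windows=None):
    """Replace the maildir info separator ':' with '!' on Windows."""
    if windows is None:
        windows = is_windows()
    return name.replace(":", "!") if windows else name


print(repr(color("imapfetch", enabled=False)))
print(safe_filename("1700000000.M1P1.host:2,S", windows=True))
```

If the maildir is written with the standard library mailbox module, setting mailbox.Maildir.colon = "!" before creating any mailbox achieves the same filename change without renaming files by hand.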
