sulami / feed2maildir

📬 Read RSS/Atom feeds in your favourite, maildir-compatible email client.

Home Page: https://pypi.python.org/pypi/feed2maildir/

License: ISC License

Feed2Maildir

Read RSS/Atom feeds in your favourite, maildir-compatible email client.

Requirements

Python 3.2+
feedparser
python-dateutil

Usage

Just run feed2maildir, which should be placed in your $PATH by setup.py. You will need a JSON configuration file at $HOME/.f2mrc that looks like this:

{
    "db": "~/.f2mdb",
    "maildir": "~/mail/feeds",
    "feeds": {
        "Coding Horror": "http://feeds.feedburner.com/codinghorror/",
        "Commit Strip": "http://www.commitstrip.com/en/feed/",
        "XKCD": "http://xkcd.com/rss.xml",
        "What If?": "http://what-if.xkcd.com/feed.atom",
        "Dilbert": "http://feed.dilbert.com/dilbert/daily_strip?format=xml",
        "BSDNow": "http://feeds.feedburner.com/BsdNowOgg"
    }
}

Note that the last element of an object must not be followed by a comma: JSON does not allow trailing commas, and Python's json.loads() rejects them.
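A minimal sketch of the trailing-comma behaviour, using only the standard json module. The helper name try_parse is hypothetical and is not part of feed2maildir; it only shows that json.loads() raises on a trailing comma.

```python
import json

valid = '{"feeds": {"XKCD": "http://xkcd.com/rss.xml"}}'
trailing_comma = '{"feeds": {"XKCD": "http://xkcd.com/rss.xml",}}'


def try_parse(text):
    """Return the parsed config, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError, so this also
        # works on older Python 3 releases that lack JSONDecodeError.
        print("invalid config:", err)
        return None


print(try_parse(valid))           # the parsed dict
print(try_parse(trailing_comma))  # None, after printing the parse error
```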

There are a bunch of command-line arguments to override the config file:

optional arguments:
    -h, --help  show this help message and exit
    -c <file>   override the config file location (~/.f2mrc)
    -d <file>   override the database file location (~/.f2mdb)
    -m <dir>    override the maildir location (None)
    -s          strip HTML from the feeds
    -S <prog>   strip HTML from the feeds using an external program
    -l          just write the links without the update

To check for updates regularly, just toss it into cron to run once every hour or so.

Strip HTML

feed2maildir can strip the HTML tags from the feed using a built-in HTML stripper (option -s) or an external program (option -S <prog>).

In the latter case, the program must read the HTML from its standard input and write the stripped text to its standard output.

The <prog> can be the name of a program or a full shell command. If it is a full command, don't forget to quote it.

Here is an example of using pandoc to convert HTML to Markdown:

feed2maildir -S 'pandoc --from html --to markdown_strict'
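The stdin/stdout contract described above can be sketched with the standard subprocess module. This is an illustration only, not feed2maildir's actual implementation; the function name strip_with_program is hypothetical, and subprocess.run requires Python 3.5 or later.

```python
import subprocess


def strip_with_program(html, command):
    """Pipe html through a shell command and return what it prints."""
    result = subprocess.run(
        command,
        shell=True,                # the command is one string, run by a shell
        input=html,                # fed to the program's standard input
        stdout=subprocess.PIPE,    # capture the program's standard output
        universal_newlines=True,   # exchange str rather than bytes
        check=True,                # raise CalledProcessError on non-zero exit
    )
    return result.stdout


# With pandoc installed, the README's example would be called as:
# strip_with_program("<p>Hello</p>", "pandoc --from html --to markdown_strict")
```

Because the command goes through a shell, it should only ever come from a trusted source such as the user's own command line.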

feed2maildir's Issues

External program for html stripping

feed2maildir supports stripping HTML from RSS/Atom feeds, but its abilities are limited. It currently handles only images, tags and lists.

Instead of extending the built-in HTML stripper, feed2maildir could allow the user to run an external program such as pandoc.

The proposed usage would be:

$ feed2maildir -S <program>

Note: I would use -S and leave -s with the current behavior for backward compatibility.

The <program> can be anything that reads HTML from standard input and writes the stripped version to standard output. <program> will be a single argument that is interpreted by a shell.

For pandoc this could be called like this:

$ feed2maildir -S 'pandoc --from html --to markdown_strict'

I would like to know your opinion about this. I can prepare a pull request with the implementation.

NameError: name 'stripper' is not defined

Hi,
the tool runs great without any options, but when I use feed2maildir -s, I get the following error message:

Traceback (most recent call last):
  File "/usr/bin/feed2maildir", line 49, in <module>
    main()
  File "/usr/bin/feed2maildir", line 46, in main
    converter.run()
  File "/usr/lib/python3.7/site-packages/feed2maildir/converter.py", line 99, in run
    self.write(self.compose(newfeed, newpost))
  File "/usr/lib/python3.7/site-packages/feed2maildir/converter.py", line 192, in compose
    desc = stripper.get_data()
NameError: name 'stripper' is not defined

I'm using the following versions:

feed2maildir           0.3.6             
feedparser             5.2.1             
python-dateutil        2.8.0             
Python 3.7.4

Any help would be appreciated.
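The traceback alone does not show why stripper is undefined in compose. One common cause of this kind of NameError is a variable that is created in one code path and used in another. The sketch below is hypothetical (the class TextOnly and the function describe are not taken from feed2maildir) and only shows a stripper that is always created before get_data() is called.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical stripper: keeps text nodes and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def get_data(self):
        return "".join(self.parts)


def describe(html, strip):
    """Return html unchanged, or its text content if strip is true."""
    if strip:
        # The stripper is created in the same branch that uses it, so
        # get_data() can never run against an undefined name.
        stripper = TextOnly()
        stripper.feed(html)
        stripper.close()
        return stripper.get_data()
    return html


print(describe("<p>Hello <b>world</b></p>", strip=True))
```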

Option parsing produces lists not strings

I tried running feed2maildir and hit the following problem:

$ feed2maildir -c "~/DELETEME/f2md/conf"
WARNING: could not open config "['~/DELETEME/f2md/conf']"
Traceback (most recent call last):
  File "/nix/store/c84bicz1xwh8xz1dmz5m0s57a6kfriij-python2.7-feed2maildir/bin/.feed2maildir-wrapped", line 50, in <module>
    main()
  File "/nix/store/c84bicz1xwh8xz1dmz5m0s57a6kfriij-python2.7-feed2maildir/bin/.feed2maildir-wrapped", line 45, in main
    links=args['l'])
  File "/nix/store/c84bicz1xwh8xz1dmz5m0s57a6kfriij-python2.7-feed2maildir/lib/python2.7/site-packages/feed2maildir/converter.py", line 76, in __init__
    self.maildir = os.path.expanduser(maildir)
  File "/nix/store/rnf1s3f60g7513svx51sixmcwplzbbf4-python-2.7.11/lib/python2.7/posixpath.py", line 254, in expanduser
    if not path.startswith('~'):
AttributeError: 'NoneType' object has no attribute 'startswith'

The "-wrapped" part is just an artefact of me using Nix. The actual problem seems to be the argument parsing. Notice that the error message shows the config file as ['~/DELETEME/f2md/conf'], which is a list when it should be a string.

It appears that the problem is in the feed2maildir script, which is passing the arguments through like:

Loader(config=args['c'])

When these arguments are actually single-element lists, rather than strings (due to the use of nargs=1).

I tried patching the script to use the first element of the list, i.e.:

Loader(config=args['c'][0])

This seems to work. I imagine similar issues would affect the other nargs=1 options -m and -d.

A different fix might be to change the way arguments are parsed, so they're strings to begin with, but I've not used argparse before.
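The difference is easy to reproduce with argparse alone. This is a stand-alone sketch, not the project's actual parser setup: with nargs=1 the value is a one-element list, while omitting nargs yields a plain string.

```python
import argparse

parser = argparse.ArgumentParser()
# nargs=1 collects exactly one value, but still wraps it in a list.
parser.add_argument("-c", nargs=1, metavar="<file>")
# Without nargs, a single value is stored as a plain string.
parser.add_argument("-d", metavar="<file>")

args = vars(parser.parse_args(["-c", "~/conf", "-d", "~/db"]))
print(args["c"])  # ['~/conf']
print(args["d"])  # ~/db
```

Dropping nargs=1 from the option definitions would therefore make the values strings to begin with, without any indexing in the calling code.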

Enhancement: fetch and parse the feeds in parallel

feed2maildir fetches and parses each feed in sequence (in reader.py) calling feedparser.parse for each feed.

In my personal setup this takes around 24 secs to fetch and parse 23 feeds.

Using a thread pool of 4 threads to do the fetch and parse in parallel the time was reduced to 8 secs. Users with several feeds will benefit from this enhancement.

Notes:

I couldn't find any warning in feedparser about thread safety, so I'm assuming that it is thread safe.
I left making the number of threads configurable for the future. For now it is hardcoded to 4.
feedparser seems to be I/O bound, so adding more threads could speed things up even more.

I will make a PR for reference.
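A rough sketch of the thread-pool approach, using concurrent.futures from the standard library. The function fetch_all and its parameters are hypothetical and do not come from the pull request; the fetch callable is passed in so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(feeds, fetch, workers=4):
    """Call fetch(url) for every feed concurrently; return {name: result}."""
    names = list(feeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with names.
        results = pool.map(fetch, [feeds[name] for name in names])
        return dict(zip(names, results))


# Stand-in fetcher for demonstration; a real run would pass
# feedparser.parse instead, assuming it is safe to call from threads.
demo = fetch_all({"a": "u1", "b": "u2"}, str.upper)
print(demo)
```

Threads suit this workload if most of the time is spent waiting on the network, which is what the timings in this issue suggest.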

Reading into multiple maildirs

Hi!
I was wondering if it is possible to read sets of feeds into different maildirs? Something like:

{
    "db": "~/.f2mdb",
{
    "maildir": "~/mail/feeds1",
    "feeds": {
        "Commit Strip": "http://www.commitstrip.com/en/feed/",
        "XKCD": "http://xkcd.com/rss.xml",
    }
}
{
    "maildir": "~/mail/feeds2",
    "feeds": {
        "Dilbert": "http://feed.dilbert.com/dilbert/daily_strip?format=xml",
        "BSDNow": "http://feeds.feedburner.com/BsdNowOgg"
    }
}
}

My understanding is that this behavior is not supported? So I was wondering if you think it might be a desirable addition, or whether it would be difficult to add. If you think it is desirable but don't have time to work on it, I could try a PR myself, but I wanted to check in first before spending any time on it.

Thanks!
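The snippet above is not valid JSON as written. One way the idea could be expressed in valid JSON is a list of groups, each with its own maildir and feeds. The "groups" key and the function feeds_by_maildir are purely hypothetical; nothing here is supported by feed2maildir.

```python
import json

# Hypothetical layout: "groups" is an assumed key, not a real option.
config_text = """
{
    "db": "~/.f2mdb",
    "groups": [
        {
            "maildir": "~/mail/feeds1",
            "feeds": {"XKCD": "http://xkcd.com/rss.xml"}
        },
        {
            "maildir": "~/mail/feeds2",
            "feeds": {"BSDNow": "http://feeds.feedburner.com/BsdNowOgg"}
        }
    ]
}
"""


def feeds_by_maildir(text):
    """Map each maildir path to the feeds that should be delivered there."""
    config = json.loads(text)
    return {group["maildir"]: group["feeds"] for group in config["groups"]}


for maildir, feeds in feeds_by_maildir(config_text).items():
    print(maildir, sorted(feeds))
```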

Workaround for feeds with outdated 'updated' times

Some feed sources, such as YouTube, report an out-of-date feed.updated value. It is a valid datetime, but it is old, so feed2maildir concludes that there are no new posts.

As a workaround I'm using the newest post's time as feed time.
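The workaround can be sketched as follows. The function name effective_feed_time and its arguments are hypothetical, and plain datetime objects stand in for whatever the feed parser returns.

```python
from datetime import datetime, timezone


def effective_feed_time(feed_updated, post_times):
    """Prefer the newest post's time; fall back to the feed's own time."""
    newest_post = max(post_times, default=None)
    return newest_post if newest_post is not None else feed_updated


stale = datetime(2015, 1, 1, tzinfo=timezone.utc)
posts = [
    datetime(2019, 5, 1, tzinfo=timezone.utc),
    datetime(2019, 6, 1, tzinfo=timezone.utc),
]
print(effective_feed_time(stale, posts))  # the 2019-06-01 timestamp
print(effective_feed_time(stale, []))     # falls back to the stale feed time
```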

Incorrect date and sender of messages

Hi,

feed2maildir is exactly what I was looking for. It works brilliantly except for two (minor but annoying) problems:

mutt always shows "01 Jan 1970 01:00" as the date for the emails.
The sender of messages is automatically composed into something that is difficult to read. It would be nice if this could instead be defined in the config file, or if the site name were used instead.
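The issue does not state the cause. A date at the Unix epoch may mean the client could not find or parse a Date header, but that is an assumption. The sketch below shows how a message with a standards-compliant Date header and a readable From header could be built with the standard email package. The function compose and the address feed@example.invalid are hypothetical and are not how feed2maildir builds messages.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime, formataddr


def compose(feed_name, title, body, posted):
    """Build a message whose sender is the feed name and whose date is set."""
    msg = EmailMessage()
    # Display name plus a placeholder address, e.g. "XKCD <feed@...>".
    msg["From"] = formataddr((feed_name, "feed@example.invalid"))
    msg["Subject"] = title
    # RFC 5322 date string derived from a timezone-aware datetime.
    msg["Date"] = format_datetime(posted)
    msg.set_content(body)
    return msg


msg = compose("XKCD", "New comic", "See the site.",
              datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc))
print(msg["From"])
print(msg["Date"])
```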