sulami / feed2maildir

📬 Read RSS/Atom feeds in your favourite, maildir-compatible email client.

Home Page: https://pypi.python.org/pypi/feed2maildir/

License: ISC License

Python 100.00%
python feedreader maildir rss atom

feed2maildir's Introduction

Feed2Maildir

Read RSS/Atom feeds in your favourite, maildir-compatible email client.

Requirements

  • Python 3.2+
  • feedparser
  • python-dateutil

Usage

Just run feed2maildir, which should be placed in your $PATH by setup.py. You will need a JSON configuration file at $HOME/.f2mrc that looks like this:

{
    "db": "~/.f2mdb",
    "maildir": "~/mail/feeds",
    "feeds": {
        "Coding Horror": "http://feeds.feedburner.com/codinghorror/",
        "Commit Strip": "http://www.commitstrip.com/en/feed/",
        "XKCD": "http://xkcd.com/rss.xml",
        "What If?": "http://what-if.xkcd.com/feed.atom",
        "Dilbert": "http://feed.dilbert.com/dilbert/daily_strip?format=xml",
        "BSDNow": "http://feeds.feedburner.com/BsdNowOgg"
    }
}

Note that the last element in a dict must not be followed by a comma: JSON does not allow trailing commas, and Python's json.loads() rejects them.
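As a quick illustration of the trailing-comma rule, here is a minimal sketch using only the standard library. The helper name try_load and the two sample strings are made up for this example and are not part of feed2maildir.

```python
import json

# A well-formed fragment and the same fragment with a trailing comma
valid = '{"feeds": {"XKCD": "http://xkcd.com/rss.xml"}}'
trailing = '{"feeds": {"XKCD": "http://xkcd.com/rss.xml",}}'

def try_load(text):
    """Return the parsed config, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError
        return None

config = try_load(valid)     # parsed into nested dicts
broken = try_load(trailing)  # rejected because of the trailing comma
```

Running a check like this against your own ~/.f2mrc is an easy way to catch a stray comma before feed2maildir does.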

There are a bunch of command-line arguments to override the settings from the config file:

optional arguments:
    -h, --help  show this help message and exit
    -c <file>   override the config file location (~/.f2mrc)
    -d <file>   override the database file location (~/.f2mdb)
    -m <dir>    override the maildir location (None)
    -s          strip HTML from the feeds
    -S <prog>   strip HTML from the feeds using an external program
    -l          just write the links without the update

To check for updates regularly, just toss it into cron to run once every hour or so.

Strip HTML

feed2maildir can strip the HTML tags from the feed using a built-in HTML stripper (option -s) or using an external program (option -S <prog>).

In the latter case, the program must read the HTML from its standard input and write the stripped text to its standard output.

The <prog> can be the name of a program or a full shell command. If it is a full command, don't forget to quote it.

Here is an example of using pandoc to convert HTML to Markdown:

feed2maildir -S 'pandoc --from html --to markdown_strict'
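The stdin/stdout contract described above can be sketched in a few lines of Python. This is only an illustration of the idea, not feed2maildir's actual implementation. The function name strip_with_program is hypothetical, a small Python one-liner stands in for pandoc so the example runs without it, and a POSIX shell is assumed.

```python
import shlex
import subprocess
import sys

def strip_with_program(html, command):
    """Pipe html through a shell command and return its standard output."""
    result = subprocess.run(
        command,
        shell=True,               # lets the user pass a full shell command
        input=html,               # the HTML goes to the program's stdin
        stdout=subprocess.PIPE,   # the stripped text comes back on stdout
        universal_newlines=True,  # work with str instead of bytes
        check=True,               # raise if the program exits with an error
    )
    return result.stdout

# Stand-in filter (upper-cases its input) so the example runs without pandoc
upper = "{} -c {}".format(
    shlex.quote(sys.executable),
    shlex.quote("import sys; sys.stdout.write(sys.stdin.read().upper())"),
)
shouted = strip_with_program("<p>hello</p>", upper)
```

Any program that behaves as a filter in this sense should work as <prog>.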

feed2maildir's People

Contributors

eldipa, sulami

feed2maildir's Issues

External program for html stripping

feed2maildir supports stripping HTML from RSS/Atom feeds, but its abilities are limited: it currently handles images, tags and lists.

Instead of extending the built-in HTML stripper, feed2maildir could allow the user to run an external program such as pandoc.

The proposed usage would be:

$ feed2maildir -S <program>

Note: I would use -S and leave -s with the current behavior for backward compatibility.

The <program> can be anything that reads HTML from standard input and writes the stripped version to standard output. <program> will be a single argument that is interpreted by a shell.

For pandoc this could be called like this:

$ feed2maildir -S 'pandoc --from html --to markdown_strict'

I would like to know your opinion about this. I can prepare a pull request with the implementation.

NameError: name 'stripper' is not defined

Hi,
the tool runs great without any options, but when I use feed2maildir -s, I get the following error message:

Traceback (most recent call last):
  File "/usr/bin/feed2maildir", line 49, in <module>
    main()
  File "/usr/bin/feed2maildir", line 46, in main
    converter.run()
  File "/usr/lib/python3.7/site-packages/feed2maildir/converter.py", line 99, in run
    self.write(self.compose(newfeed, newpost))
  File "/usr/lib/python3.7/site-packages/feed2maildir/converter.py", line 192, in compose
    desc = stripper.get_data()
NameError: name 'stripper' is not defined

I'm using the following versions:

feed2maildir           0.3.6             
feedparser             5.2.1             
python-dateutil        2.8.0             
Python 3.7.4

Any help would be appreciated.

Option parsing produces lists not strings

I tried running feed2maildir and hit the following problem:

$ feed2maildir -c "~/DELETEME/f2md/conf"
WARNING: could not open config "['~/DELETEME/f2md/conf']"
Traceback (most recent call last):
  File "/nix/store/c84bicz1xwh8xz1dmz5m0s57a6kfriij-python2.7-feed2maildir/bin/.feed2maildir-wrapped", line 50, in <module>
    main()
  File "/nix/store/c84bicz1xwh8xz1dmz5m0s57a6kfriij-python2.7-feed2maildir/bin/.feed2maildir-wrapped", line 45, in main
    links=args['l'])
  File "/nix/store/c84bicz1xwh8xz1dmz5m0s57a6kfriij-python2.7-feed2maildir/lib/python2.7/site-packages/feed2maildir/converter.py", line 76, in __init__
    self.maildir = os.path.expanduser(maildir)
  File "/nix/store/rnf1s3f60g7513svx51sixmcwplzbbf4-python-2.7.11/lib/python2.7/posixpath.py", line 254, in expanduser
    if not path.startswith('~'):
AttributeError: 'NoneType' object has no attribute 'startswith'

The "-wrapped" part is just an artefact of me using Nix. The actual problem seems to be the argument parsing. Notice that the error message shows the config file as ['~/DELETEME/f2md/conf'], which is a list when it should be a string.

It appears that the problem is in the feed2maildir script, which is passing the arguments through like:

Loader(config=args['c'])

However, these arguments are actually single-element lists rather than strings (due to the use of nargs=1).

I tried patching the script to use the first element of the list, i.e.:

Loader(config=args['c'][0])

This seems to work. I imagine similar issues would affect the other nargs=1 options -m and -d.

A different fix might be to change the way arguments are parsed, so they're strings to begin with, but I've not used argparse before.
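For reference, the difference between the two argparse behaviours can be reproduced in isolation. The option definitions below are written for this illustration and are not copied from the feed2maildir script. They only show that nargs=1 wraps the value in a list, while omitting nargs yields a plain string.

```python
import argparse

parser = argparse.ArgumentParser()
# With nargs=1 the parsed value is a one-element list
parser.add_argument('-c', nargs=1, metavar='<file>')
# Without nargs the parsed value is a plain string
parser.add_argument('-d', metavar='<file>')

args = vars(parser.parse_args(['-c', '~/.f2mrc', '-d', '~/.f2mdb']))
as_list = args['c']    # ['~/.f2mrc']
as_string = args['d']  # '~/.f2mdb'
```

So either indexing with [0] or dropping nargs=1 from the option definitions would give the rest of the code the strings it expects.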

Enhancement: fetch and parse the feeds in parallel

feed2maildir fetches and parses the feeds one after another (in reader.py), calling feedparser.parse for each feed.

In my personal setup this takes around 24 seconds to fetch and parse 23 feeds.

Using a pool of 4 threads to fetch and parse in parallel reduced the time to 8 seconds. Users with several feeds will benefit from this enhancement.

Notes:

  • I couldn't find any warning in feedparser about thread safety. I'm assuming that it is thread safe.
  • I left the possibility of configuring the number of threads for the future. For now it is hardcoded to 4.
  • feedparser seems to be I/O bound, so adding more threads could speed things up even more.

I will make a PR for reference.
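The approach described above can be sketched roughly as follows. This is not the code from the PR. The names fetch_all and fake_parse are hypothetical, and fake_parse stands in for feedparser.parse so the sketch runs offline (in real use one would presumably pass feedparser.parse instead, assuming it is safe to call from several threads).

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(feeds, parse, workers=4):
    """Parse every feed URL concurrently and return {name: parsed result}."""
    names = list(feeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with names
        results = pool.map(parse, [feeds[name] for name in names])
        return dict(zip(names, results))

def fake_parse(url):
    """Stand-in for feedparser.parse that does no network access."""
    return {"url": url, "entries": []}

feeds = {
    "XKCD": "http://xkcd.com/rss.xml",
    "What If?": "http://what-if.xkcd.com/feed.atom",
}
parsed = fetch_all(feeds, fake_parse)
```

Because the work is dominated by waiting on the network, threads help here despite Python's global interpreter lock.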

Reading into multiple maildirs

Hi!
I was wondering if it is possible to read a set of feeds into different maildirs? something like

{
    "db": "~/.f2mdb",
{
    "maildir": "~/mail/feeds1",
    "feeds": {
        "Commit Strip": "http://www.commitstrip.com/en/feed/",
        "XKCD": "http://xkcd.com/rss.xml",
    }
}
{
    "maildir": "~/mail/feeds2",
    "feeds": {
        "Dilbert": "http://feed.dilbert.com/dilbert/daily_strip?format=xml",
        "BSDNow": "http://feeds.feedburner.com/BsdNowOgg"
    }
}
}

My understanding is that this behavior is not supported. Do you think it might be a desirable addition, or would it be difficult to add? If you think it is desirable but don't have time to work on it, I could try a PR myself, but I wanted to check in first before I spend any time on it.

Thanks!

Workaround for feeds with outdated 'updated' times

Some feed sources, such as YouTube, report an out-of-date feed.updated value. It is a valid datetime, but because it is old, feed2maildir concludes that there are no new posts.

As a workaround I'm using the newest post's time as feed time.
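A minimal sketch of that workaround, assuming the timestamps are available as comparable datetime objects (how feed2maildir actually stores and compares them is not shown here, and the function name effective_feed_time is hypothetical):

```python
from datetime import datetime, timezone

def effective_feed_time(feed_time, entry_times):
    """Use the newest post's time; fall back to the feed's own time."""
    dated = [t for t in entry_times if t is not None]
    return max(dated) if dated else feed_time

# A feed whose own 'updated' value is stale but whose posts are recent
stale = datetime(2015, 1, 1, tzinfo=timezone.utc)
posts = [
    datetime(2019, 8, 1, tzinfo=timezone.utc),
    datetime(2019, 9, 15, tzinfo=timezone.utc),
    None,  # a post without a usable timestamp is ignored
]
newest = effective_feed_time(stale, posts)
unchanged = effective_feed_time(stale, [])
```

With this, a stale feed-level timestamp no longer hides posts that are newer than the last run.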

Incorrect date and sender of messages

Hi,

feed2maildir is exactly what I was looking for. It works brilliantly except for two minor but annoying problems:

  1. mutt always shows "01 Jan 1970 01:00" as the date for the emails

  2. the sender of messages is automatically composed into something that is difficult to read. It would be nice if it could instead be defined in the config file, or if the site name were used
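The cause of the 1970 date is not established in this report. For orientation only, here is a sketch of how a message with an explicit Date header and a readable sender can be composed with Python's standard email package. The compose function and the feed@localhost address are placeholders invented for this example, not feed2maildir's behaviour.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime, formataddr

def compose(feed_name, title, body, published):
    """Build a message whose Date and From headers are set explicitly."""
    msg = EmailMessage()
    # Display name is the feed's name; the address is only a placeholder
    msg["From"] = formataddr((feed_name, "feed@localhost"))
    msg["Subject"] = title
    # RFC 5322 date string built from the post's publication time
    msg["Date"] = format_datetime(published)
    msg.set_content(body)
    return msg

msg = compose(
    "XKCD",
    "Example post",
    "Post body",
    datetime(2019, 9, 15, 12, 0, tzinfo=timezone.utc),
)
```

A mail client such as mutt would then show the feed name as the sender and the post's publication time as the date.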
