mathisdt / nextcloud-news-filter

automatically mark some items as read in Nextcloud News

License: GNU General Public License v3.0

nextcloud-news-filter's Introduction

Automatically mark some items as read in Nextcloud News

Some news feeds are very interesting, but sometimes also contain advertisements or simply uninteresting articles. Of course you can ignore them, but with this small script you can also set up rules which mark these items as read, so you don't see them at all.

Get started

You can clone this repository or simply download main.py and config-example.ini. To run the script you'll need Python 3 installed.

Configuration

All configuration is read from the file config.ini which is expected next to the script. You can copy the file named config-example.ini to config.ini to get started.

This makes use of the Nextcloud News API which requires authentication, so you need to supply your username and password - and of course the address of your Nextcloud installation. Put these into the section [login].
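
As a rough illustration of how the [login] values fit together, here is a minimal sketch, not the script's actual code. The key names address, username and password are taken from the config files quoted in the issues further down, and the endpoint path is the one visible in the debug log there; the example values and the assumption that the API accepts HTTP Basic authentication are mine.

```python
import base64
import configparser

# Hypothetical example content; the script itself reads config.ini from disk.
EXAMPLE_CONFIG = """
[login]
address = https://cloud.example.com
username = alice
password = secret

[example filter]
titleRegex = advertisement
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)

login = config["login"]

# Endpoint path as it appears in the debug log of the first issue below.
items_url = login["address"].rstrip("/") + "/index.php/apps/news/api/v1-3/items"

# Assumption: credentials are sent as an HTTP Basic auth header.
credentials = f"{login['username']}:{login['password']}".encode("utf-8")
auth_header = "Basic " + base64.b64encode(credentials).decode("ascii")

# Every section other than [login] is treated as one filter.
filter_names = [name for name in config.sections() if name != "login"]
```

No request is sent here; the sketch only shows which pieces of the configuration end up in the URL and in the credentials.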

Filters

Each filter has a name (or title) you can define as you like. It's enclosed in brackets [...].

You can add one or more of the following criteria. Beware: If you don't add any attribute, then your filter will match all items!

Attention: All regular expressions are interpreted as standard strings, not as raw strings, so you have to apply escaping (see the first paragraphs of the documentation). Example: the regex \bABC\b as the titleRegex needs to be written as titleRegex = \\bABC\\b.
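
The difference between standard and raw strings can be seen in plain Python. This is only a sketch of the string behaviour the note refers to; how the script processes the configured value internally is not shown here, and the sample text is made up.

```python
import re

standard = "\bABC\b"   # in a standard string, \b is a backspace character
escaped = "\\bABC\\b"  # escaped backslashes: the regex sees \bABC\b
raw = r"\bABC\b"       # a raw string yields the same pattern as the escaped form

text = "see ABC here"

# The unescaped pattern looks for literal backspace characters and finds none.
standard_hit = re.search(standard, text, re.IGNORECASE)

# The escaped pattern matches ABC as a whole word.
escaped_hit = re.search(escaped, text, re.IGNORECASE)
```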

feedId: Apply the filter only to one specific feed. You can find the number to enter here by hovering your mouse over the feed name in the sidebar in Nextcloud News. The URL shown at the bottom of your screen ends in the right number, e.g. .../items/feeds/32/ - here, the feed ID is 32.
titleRegex: Check if some part of the item title matches this regular expression (case-insensitively).
bodyRegex: Check if some part of the item body matches this regular expression (case-insensitively).
hoursAge: Match items older than this (pubDate is checked, not updatedDate or lastModified).

The criteria you define in one filter are and-joined: all of them have to match for the checked item to be marked as read. So the fewer criteria you specify per filter, the broader the matched portion of feed items will be.
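
The and-joining can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the helper names compile_filter and matches are hypothetical, and the item fields feedId, title, body and pubDate (a Unix timestamp in seconds) are assumed to follow the Nextcloud News API item format.

```python
import re
import time


def compile_filter(raw):
    """Turn one config section (a dict of strings) into a ready-to-use filter."""
    def regex(key):
        return re.compile(raw[key], re.IGNORECASE) if key in raw else None

    return {
        "feedId": int(raw["feedId"]) if "feedId" in raw else None,
        "titleRegex": regex("titleRegex"),
        "bodyRegex": regex("bodyRegex"),
        "hoursAge": float(raw["hoursAge"]) if "hoursAge" in raw else None,
    }


def matches(flt, item, now=None):
    """Return True only if every criterion defined in the filter matches."""
    now = time.time() if now is None else now
    if flt["feedId"] is not None and item.get("feedId") != flt["feedId"]:
        return False
    # "or ''" keeps the search from failing when a field is missing or null.
    if flt["titleRegex"] and not flt["titleRegex"].search(item.get("title") or ""):
        return False
    if flt["bodyRegex"] and not flt["bodyRegex"].search(item.get("body") or ""):
        return False
    if flt["hoursAge"] is not None:
        pub_date = item.get("pubDate")
        if pub_date is None or pub_date > now - flt["hoursAge"] * 3600:
            return False
    return True


item = {"feedId": 23, "title": "Unwanted special offer", "body": "", "pubDate": 1000}
narrow = compile_filter({"feedId": "23", "titleRegex": "unwanted.*offer"})
empty = compile_filter({})  # no criteria at all
```

With these definitions, matches(narrow, item) is true, the same item in another feed is not matched, and matches(empty, item) is true for any item, which is the "matches all items" case warned about above.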

If you need help with regular expressions, you can e.g. look here.

When and how to run

This script can be run in conjunction with Nextcloud's normal cron hook, e.g. like this:

*/5 * * * * /usr/bin/php -f /path/to/nextcloud/cron.php ; ( /path/to/nextcloud-news-filter/main.py | grep -E '(filter|marking as read)' 2>&1 >>/path/to/nextcloud-news-filter.log )

The part up to the semicolon is the existing Nextcloud cron entry; only the part after it was added when installing this script.

Note: You should make sure that the user executing this (probably www-data or something similar) has the rights (1.) to execute the script and (2.) to write to the indicated log file.

Communication

If you find a bug or want a new feature or just have a question, you are welcome to file an issue or even fix things yourself and create a pull request. You can also write me an email and I'll see what I can do.

License

This work is licensed under the GNU General Public License (GPL), Version 3.

nextcloud-news-filter's Issues

TypeError: expected string or bytes-like object, got 'NoneType'

Summary

I (seemingly) randomly started seeing failures when the script is run if a feedId is not specified.

config.ini setup

(condensed to one filter - this error occurs on all filters, regardless of whether an article is found that matches the filter or not)

[login]
address = https://xxx.xxx.com
username = xxx
password = xxx

# Optional parameters:
#
# feedId: restrict rule to just this feed.
# ex: feedId = 51
#
# titleRegex: apply rule to article title
# ex: titleRegex = \bCOLOR\b (use \b to trap word boundaries)
#
# bodyRegex: apply rule to article body
# ex: bodyRegex = \bCOLOR\b (use \b to trap word boundaries)
#
# hoursAge: apply rule if "pubDate" is older than this
# ex: hoursAge = 72

[ALL-Sonos]
titleRegex = \bsonos\b

Results

2024-02-22 14:04:30,428 - root - DEBUG - starting run
2024-02-22 14:04:30,430 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): xxx.xxx.com:443
2024-02-22 14:04:30,568 - urllib3.connectionpool - DEBUG - xxx.xxx.com:443 "GET /index.php/apps/news/api/v1-3/items HTTP/1.1" 200 77326
Traceback (most recent call last):
  File "/home/xxx/nextcloud-news-filter/main.py", line 72, in <module>
    or one_filter['titleRegex'].search(item['title'])) \
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'
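
The traceback would be consistent with an item whose title is null in the API response. That is an assumption, since the log does not show the offending item. A minimal sketch that reproduces the same exception type and shows one possible guard:

```python
import re

title_regex = re.compile(r"\bsonos\b", re.IGNORECASE)
item = {"title": None}  # hypothetical item without a title

# Searching in None raises the TypeError seen in the traceback.
try:
    title_regex.search(item["title"])
    error = None
except TypeError as exc:
    error = exc

# Falling back to an empty string avoids the exception; the item simply
# does not match a title filter.
safe_hit = title_regex.search(item["title"] or "")
```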

Nothing happens : help in config

Hi,

I followed the instructions to enable the filtering.

I run NextcloudPi in an LXC container on my mini PC running Proxmox.

Using this guide (https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/background_jobs_configuration.html), I added the second line to my cron file:


*/5  *  *  *  * php -f /var/www/nextcloud/cron.php 
*/5  *  *  *  * /usr/bin/php -f /var/www/nextcloud/cron.php ; ( /home/nextcloud-news-filter/main.py | grep -E '(filter|marking as read)' >>/home/nextcloud-news-filter/nextcloud-news-filter.log )

(config.ini and main.py are in my home folder.)

I waited almost 24 hours, but when I checked, nothing had happened.

This is my config.ini

[login]
address = https://cloud.xxxxxx.it

username = xxxx
password = xxxx

# The names of the following sections are not specified, use as description of the filter.
# optional parameter feedId -> apply only in this feed
# optional parameter titleRegex -> apply to item titles
# optional parameter bodyRegex -> apply to item bodies
# optional parameter hoursAge -> apply if item's "pubDate" is older than this

[example 1: filter in feed BLAH]
feedId = 23
titleRegex = unwanted.*juventus

[example 2: filter ads in all feeds]
bodyRegex = (advertisement|paid content)

[example 3: older than 72 hours]
hoursAge = 72

Does it look right?

One important detail: my account is protected by two-factor authentication (so I enter a numeric code after logging in). Could this be the reason why it does not work?

Unfortunately the log file is created but is empty.