vas3k / infomate.club

RSS feed aggregator with collections and NLP article summarization

Home Page: https://infomate.club

License: Apache License 2.0

Python 49.46% CSS 13.23% JavaScript 3.86% HTML 31.50% Dockerfile 0.50% Makefile 1.46%
rss feed python telegram nltk nlp

infomate.club's Introduction

Infomate.club

Infomate is a small web service that shows multiple RSS sources on one page, does some tricky parsing, and summarizes articles using the TextRank algorithm.

It helps to keep track of news from different areas without subscribing to hundreds of media accounts and getting annoying notifications.

Thematic and people-based collections do a really good job of helping you discover new sources of information. Since we are all biased, such compilations can really help us get out of our information bubbles.

Live URL: infomate.club

🐶 This is a pet-project

Which means you really shouldn't expect much from it. I wrote the MVP over a weekend to solve my own pain. No state-of-the-art kubernetes bullshit, no architecture patterns, not even any tests. It's here just to show people what a pet-project might look like.

This code has been written for fun, not for business. There is usually a big difference.

🤔 How it works

It's basically a Django web app with a bunch of scripts for RSS parsing. It stores the parsed data in a PostgreSQL database.

The web app is only used to show the data (with heavy caching). Parsing and feed updates are performed by the three scripts running in cron. Like poor people do.

Feedparser and BeautifulSoup are used to find, download and parse RSS.
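
The "find" step can be sketched roughly as follows. This is only an illustration that uses the standard library's html.parser instead of BeautifulSoup, so it is not the project's actual code; the names FeedLinkFinder and find_feeds are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return absolute feed URLs advertised in an HTML page (hypothetical helper)."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


page = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="alternate" type="application/rss+xml" href="/feed">
</head><body></body></html>
"""
print(find_feeds(page, "https://dev.to"))
```

The discovered URL would then be handed to a feed parser such as Feedparser to download and read the entries.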

Text summarization is done via newspaper3k, with some additional protection against bad types of content such as podcasts and overly large pages, which can eat all your memory. Anything can happen in the RSS world :)
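
A guard of that kind might look like the sketch below. It is an assumption about the general idea, not the project's implementation: the function name should_summarize, the 2 MB limit and the list of accepted content types are all made up for illustration.

```python
MAX_PAGE_BYTES = 2 * 1024 * 1024  # hypothetical limit, not the project's value
TEXT_TYPES = ("text/html", "application/xhtml+xml")


def should_summarize(headers, max_bytes=MAX_PAGE_BYTES):
    """Decide from HTTP response headers whether a page is worth parsing."""
    # Header names are case-insensitive, so normalize them first.
    headers = {key.lower(): value for key, value in headers.items()}

    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in TEXT_TYPES:
        return False  # e.g. audio/mpeg for podcast episodes

    length = headers.get("content-length")
    if length is not None and length.isdigit() and int(length) > max_bytes:
        return False  # too big to load into memory safely

    # Content-Length may be missing; a real guard would also cap bytes read.
    return True


print(should_summarize({"Content-Type": "text/html; charset=utf-8"}))
print(should_summarize({"Content-Type": "audio/mpeg"}))
```

Checking headers first keeps the expensive download and parsing step away from content that could never be summarized as text.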

▶️ Running it locally

The easy way. Install docker on your machine. Then:

git clone git@github.com:vas3k/infomate.club.git
cd infomate.club
docker-compose up --build

On the first run you might need to wait until the "migrate_and_init" container finishes populating your database. After that you can open localhost:8000 in your favorite browser and enjoy.

If something gets stuck or you want to shut it down completely, use this command in another terminal:

docker-compose down --remove-orphans

⚙️ boards.yml format

All collections and feeds are stored in one file — boards.yml. This is your main and only entry point to add new stuff.

boards:
- name: Tech            # board title
  slug: tech            # board url
  is_visible: true      # visibility on the main page
  is_private: false     # private boards require logging in
  curator:              # board author profile
    name: John Wick 
    title: Main news
    avatar: https://i.vas3k.ru/fhr.png 
    bio: Major technology media in English and Russian
    footer: >
      this is a general selection of popular technology media.
      The page is updated once per hour.
  blocks:               # list of logical feed blocks
  - name: English       # block title
    slug: en            # unique block id
    feeds:         
      - name: Hacker News
        url: https://news.ycombinator.com
        rss: https://news.ycombinator.com/rss
      - name: dev.to
        url: https://dev.to
        rss: https://dev.to/feed
      - name: TechCrunch
        rss: http://feeds.feedburner.com/TechCrunch/
        url: https://techcrunch.com
        is_parsable: false  # do not try to parse pages, show RSS content only
        conditions:
          - type: not_in
            field: title
            word: Trump   # exclude articles with a word "Trump" in title
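
To make the conditions entry more concrete, here is a minimal sketch of how a not_in condition could be evaluated against a parsed article. The function passes_conditions and the case-insensitive matching are assumptions for illustration; the project's own logic may differ.

```python
def passes_conditions(article, conditions):
    """Return True if the article satisfies every condition of a feed.

    `article` is a dict of parsed fields, `conditions` mirrors the
    `conditions` list from boards.yml. Only `not_in` is handled here.
    """
    for condition in conditions:
        if condition["type"] != "not_in":
            raise ValueError(f"unsupported condition type: {condition['type']}")
        # Assumption: matching is case-insensitive.
        value = str(article.get(condition["field"], "")).lower()
        if condition["word"].lower() in value:
            return False
    return True


# The same structure as the TechCrunch feed in the example above.
conditions = [{"type": "not_in", "field": "title", "word": "Trump"}]

articles = [
    {"title": "New Python release is out"},
    {"title": "Trump signs a new order"},
]
kept = [a["title"] for a in articles if passes_conditions(a, conditions)]
print(kept)
```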

💎 Running in production

Deployment is done using a simple GitHub Action which builds a Docker container, pushes it to the GitHub Registry, logs into your server via SSH and pulls it. The pipeline is triggered on every push to the master branch. If you want to set up your own fork, please add these constants to your repo SECRETS:

APP_HOST — e.g. "https://your.host.com"
GHCR_TOKEN — your personal GitHub access token with permissions to read/write into the GitHub Registry
SECRET_KEY — random string for django stuff (not really used)
SENTRY_DSN — if you want to use Sentry
PRODUCTION_SSH_HOST — hostname or IP of your server
PRODUCTION_SSH_USERNAME — user which can deploy to your server
PRODUCTION_SSH_KEY — private key for this user

After you add them all and push a commit to master, the action should run and deploy the app to your server on port 8816.

Don't forget to set up nginx as a proxy for the app (add SSL and everything else there). Here's an example config for that: etc/nginx/infomate.club.conf

If something doesn't work, check the action itself: .github/workflows/deploy.yml

🎉 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

You can help us with open issues too. There's always something to work on.

We don't have any strict rules on formatting. Just explain your motivation and the changes you've made in the PR description so that others understand what's going on.

👩‍💼 License

Apache 2.0 © Vasily Zubarev

TL;DR: you can modify, distribute and use it commercially, but you MUST reference the original author or link to the service.

infomate.club's People

Contributors

alvicsam, dependabot[bot], dotterian, eisenest, jtraub, leitsius, maxlipsky, nermolaev, pinchukdiana, pshergie, romutchio, rulikkk, sasha-mikhailov, savinovd, sneksik, tiulpin, vas3k, vitalii-honchar, vovinacci

infomate.club's Issues

"Unused" boards removed

Why did you do such a thing?

Even if those boards were no longer actively maintained, who's to say they couldn't nor wouldn't be useful as a reference list?

Can't scroll

Can't scroll in the right part of the window (to the right of the headings) when it is open in fullscreen.

Need an endpoint for Telegram -> RSS parsing

I used three different services to get RSS feeds from Telegram channels. One of them got me banned, the second one returns an error 500 from time to time, the third was Chinese and it inserts hieroglyphs into the feeds :D

I think it's time to use some GitHub library and write our own parser. It would be nice to have an endpoint that takes the Telegram channel name and returns its RSS. Don't forget about pics and videos.

If anyone wants to help me with that, feel free to comment below and open a PR.

No data loaded, all categories are empty (errors in logs)

No data loaded, all categories are empty

Steps to reproduce (default steps from readme):

  1. docker-compose up --build
  2. It starts loading data and the logs seem to be OK
  3. open https://localhost:8000
  4. See empty categories (like "Latest article is None")
  5. Console reports errors while using interface
  6. Exception happened during processing of request from ('X.X.X.X', XXXXX)

Tested Environment:

  • Debian 11 Bullseye
  • Docker version 20.10.3
  • docker-compose version 1.25.5

Too long title and not centered

I suppose there should be a length limit on titles so that they display correctly. Or maybe the topic title should be "Кибер безопасность".

Theme switcher bug related to mac default color scheme

There is still a visual bug with the theme switcher:

First case:

  • The macOS user has a dark theme enabled
  • It's the first load (the 'theme' param does not exist in localStorage)
    Result:
    The site loads with the correct color theme, but the switcher itself is in the wrong position (screenshot attached)

Screenshot 2020-01-13 at 16 58 09

Second case:

  • The macOS user changes the OS theme while the site is open in the browser
    Result:
    The site's color theme doesn't follow the OS theme change.

PR is coming =)

Page refresh

I suggest adding something like this to main.js:

document.addEventListener("visibilitychange", function() {
  if (document.visibilityState === 'visible') {
    UPDATEBOARD();
  }
});

because you switch back to the tab after a couple of days and it is full of stale stuff, yet the board brazenly lies that the latest post was 5 minutes ago, even though everything there has been gathering moss for 2 days.

Hardcoded ALLOWED_HOSTS

When you run the container on your own server and try to reach it from another PC, it fails with an error saying that the address is not in ALLOWED_HOSTS. For personal use this can be solved by hardcoding your own address, but it would be great to sort this out somehow so that anyone can deploy their own instance without problems.
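
One common way to avoid the hardcoding is to read the host list from an environment variable in the Django settings. The sketch below is only a suggestion: the variable name ALLOWED_HOSTS, the helper allowed_hosts_from_env and the default values are assumptions, not something the project defines.

```python
import os


def allowed_hosts_from_env(environ=os.environ, default=("localhost", "127.0.0.1")):
    """Build Django's ALLOWED_HOSTS from a comma-separated environment variable."""
    # Hypothetical variable name; pick whatever fits your deployment.
    raw = environ.get("ALLOWED_HOSTS", "")
    hosts = [host.strip() for host in raw.split(",") if host.strip()]
    # Fall back to local-only access when nothing is configured.
    return hosts or list(default)


# In settings.py this could be used as:
ALLOWED_HOSTS = allowed_hosts_from_env()

print(allowed_hosts_from_env({"ALLOWED_HOSTS": "infomate.club, 192.168.1.10"}))
```

With this approach the value could be passed through the environment section of docker-compose, so each deployment sets its own hosts without touching the code.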

Add CI support

Hi,

Shall we add an external CI integration so that linters can run against PRs? For example, https://travis-ci.org/ is free for open-source projects.
The goal is to run a linter and mypy to make sure PR quality doesn't drop.
Later this could be extended with tests and other fancy things if necessary.

What do you think? I can create a PR to put this in place.

vc and dtf removed their RSS feeds

https://vc.ru/rss and https://dtf.ru/rss now return 404. I wrote to their support and got the following reply:

Hello!
We currently do not support full-text RSS.

So it may be worth removing them from the collection (or looking for alternative ways to get the feed).

Improve text summarization by loading language-specific stopwords

Thanks for an amazing project!

It seems that newspaper3k does not detect the article's language by default in order to fetch the appropriate stop-words
(see https://github.com/codelucas/newspaper/blob/f622011177f6c2e95e48d6076561e21c016f08c3/newspaper/article.py#L372).
Since the summarization algorithm is extremely sensitive to the extracted keywords, the quality of the summary can be improved by loading the stop-word list manually. This can be done by simply putting

newspaper.nlp.load_stopwords("ru")

in scripts/update.py

For example
Before:

In London, the first known document about Russia's first vaccination campaign will be put up for auction: a letter from Catherine II on the need for smallpox inoculations.
Catherine II was the first person in Russia to be inoculated against smallpox.
This happened in October 1768, at the height of an epidemic of the disease in Russia and Europe.
Here are two charts that prove it In Western Europe (as in Russia), COVID cases are rising.
Here are two charts that prove it In Western Europe (as in Russia), COVID cases are rising.

After:

In London, the first known document about Russia's first vaccination campaign will be put up for auction: a letter from Catherine II on the need for smallpox inoculations.
The letter in question is from the empress to Field Marshal Count Pyotr Rumyantsev and was written during Catherine II's journey to Crimea.
In the text she explains how to organize smallpox vaccination at the state level, since without it the country's inhabitants face "great harm, especially among the common people".
Catherine II's letter will be put up for sale in a single lot with a portrait of the empress by the artist Dmitry Levitsky on December 1.
The last outbreak of smallpox was recorded and eliminated in Somalia in 1977.
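
Why stop-words matter so much can be shown with a deliberately simplified keyword extractor. This is not newspaper3k's algorithm, just a frequency count written for illustration; top_keywords and the sample text are made up.

```python
import re
from collections import Counter


def top_keywords(text, stopwords, limit=3):
    """Return the most frequent words in text, ignoring the given stop-words."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return [word for word, _ in counts.most_common(limit)]


text = "The letter of the empress is about the vaccination and the letter is rare"

# Without a stop-word list, function words dominate the keywords.
without_stopwords = top_keywords(text, stopwords=set())

# With a list for the right language, content words come to the top.
with_stopwords = top_keywords(text, stopwords={"the", "of", "is", "about", "and"})

print(without_stopwords)
print(with_stopwords)
```

If the stop-word list belongs to the wrong language, the extractor behaves like the first call: sentences get scored by function words, which would explain summaries like the "Before" example.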

Error when creating the database

This may be a very simple question, but if you follow the README exactly, running python3 manage.py migrate produces the error below. I have never worked with Postgres or Django, so I don't understand it. Tested on two different Ubuntu installations, 20.04 and 18.04.

`Traceback (most recent call last):
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
self.connection = self.get_new_connection(conn_params)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/tatyana/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not translate host name "postgres" to address: Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "manage.py", line 21, in <module>
main()
File "manage.py", line 17, in main
execute_from_command_line(sys.argv)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/core/management/base.py", line 83, in wrapped
res = handle_func(*args, **kwargs)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 87, in handle
executor = MigrationExecutor(connection, self.migration_progress_callback)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/migrations/executor.py", line 18, in __init__
self.loader = MigrationLoader(self.connection)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/migrations/loader.py", line 49, in __init__
self.build_graph()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/migrations/loader.py", line 212, in build_graph
self.applied_migrations = recorder.applied_migrations()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/migrations/recorder.py", line 73, in applied_migrations
if self.has_table():
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/migrations/recorder.py", line 56, in has_table
return self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor())
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 256, in cursor
return self._cursor()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 233, in _cursor
self.ensure_connection()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
self.connect()
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/base/base.py", line 195, in connect
self.connection = self.get_new_connection(conn_params)
File "/home/tatyana/.local/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
connection = Database.connect(**conn_params)
File "/home/tatyana/.local/lib/python3.6/site-packages/psycopg2/__init__.py", line 126, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not translate host name "postgres" to address: Name or service not known
`

Can't connect to local host?

Hello, great project. I was able to run the containers, but neither Safari nor Chrome can connect to localhost. I'm running a local version, not a production one. Has anyone experienced a similar issue?

Screenshot 2024-03-10 at 4 12 48 PM Screenshot 2024-03-10 at 4 18 24 PM

I've added port 5432, as recommended in the comments on a previous issue.

I'm using Docker and VS Code on a Mac.
