caseycrowe / google-alerts-to-web

This project pulls Google Alerts Atom feeds and publishes them to a web page.

License: MIT License

Python 61.55% PHP 38.45%
python3 php google-alerts feedparser pandas

google-alerts-to-web

What it does: This project pulls Google Alerts Atom feeds into a .CSV file, scrubs them of extended characters, deduplicates the entries, and uploads the file to a web server over FTP. The entries are then displayed with PHP.
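
The scrubbing step is not spelled out here, so the following is only a minimal sketch of one way to strip extended characters in Python. The function name `scrub_extended` is hypothetical, and the choice to transliterate accented letters rather than drop them is an assumption, not necessarily what the scripts in this repo do.

```python
import unicodedata


def scrub_extended(text: str) -> str:
    """Return text reduced to plain ASCII (hypothetical helper)."""
    # NFKD splits an accented letter into a base letter plus a combining
    # mark, so the base letter survives the ASCII encode below.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything that still has no ASCII form (curly quotes, dashes, emoji)
    # is silently dropped by errors="ignore".
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Dropping characters is lossy, so a title that is mostly non-Latin text would come out nearly empty with this approach.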

Why I built it: I wanted to keep abreast of news articles related to a very specific subject, and share them with a few other people who were interested. I began by trying to display the raw Google Alerts XML page on my site, but that posed two problems. First, Google prohibits cross-site loading, so you can view their page directly without issue but can't load it from another site. Second, articles only stay live on the alert for about a day before they are removed. I wanted them to persist, which meant saving them somewhere. A database seemed too heavy-handed, so I settled on .CSV since I'm not changing my query...yet.

How it works: There are a few different steps, and each step is its own file. I should note here that I run all of this from a Raspberry Pi on my home network, and upload feed-articles.csv to the web server with the ftp-articles.py script from a daily cron job. If you have access to Python, the required modules, and cron on your web server/host, you could run everything there and skip the FTP step.

  1. read-feeds.py pulls the feeds down to feed-articles.csv
  2. filter-feeds.py downloads a fresh copy of blacklist.txt from the website, loads feed-articles.csv, and removes any entries containing the blacklisted sites
  3. dedupe-feeds.py loads feed-articles.csv, and removes any articles that appear twice. Since I have numerous alerts that at times overlap, this prevents dupes.
  4. ftp-articles.py connects to an FTP server, deletes the backup copy of the feed-articles.csv (feed-articles.old), renames feed-articles.csv to feed-articles.old, and uploads a fresh copy of feed-articles.csv.
  5. Each of the above files logs the time, date, and some informational data to feedlog.txt for easy status checking.
  6. The index.php file displays the news articles in "cards" using Bootstrap 4. You can change the style to whatever you would like.
  7. The news-admin.php file is new. It strips each Google Alert URL down to the domain and TLD, e.g. somedomain.com, and offers a "Blacklist domain" button for it. Clicking this button sends that domain and TLD to blacklist-domain.php, which appends it to the blacklist.txt file. This allows filter-feeds.py to get a fresh copy of the blacklist each time it runs, removing those domains from the feed-articles.csv file. I strongly suggest you secure this process with a login or other mechanism.
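
Steps 2 through 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the repo: the function names are hypothetical, each row is assumed to be a dict with a `link` column, duplicates are assumed to be detected by link, and the real scripts may use pandas rather than plain lists. The `ftp` argument is expected to be a connected `ftplib.FTP` object.

```python
import ftplib


def filter_blacklisted(rows, blacklist):
    """Step 2: drop rows whose link contains a blacklisted domain."""
    # Ignore blank lines so an empty entry cannot match every link.
    domains = [d.strip() for d in blacklist if d.strip()]
    return [r for r in rows if not any(d in r["link"] for d in domains)]


def dedupe(rows):
    """Step 3: keep only the first row seen for each link (assumed key)."""
    seen = set()
    unique = []
    for row in rows:
        if row["link"] not in seen:
            seen.add(row["link"])
            unique.append(row)
    return unique


def rotate_and_upload(ftp, fileobj,
                      name="feed-articles.csv", backup="feed-articles.old"):
    """Step 4: delete the old backup, rename the live file, upload a new one."""
    try:
        ftp.delete(backup)
    except ftplib.error_perm:
        pass  # no backup on the server yet (e.g. first run)
    try:
        ftp.rename(name, backup)
    except ftplib.error_perm:
        pass  # no live file on the server yet
    # fileobj must be opened in binary mode, e.g. open(path, "rb").
    ftp.storbinary(f"STOR {name}", fileobj)
```

Matching the blacklist by substring is the simplest reading of "entries containing the blacklisted sites"; it would also drop a link that merely mentions a blacklisted domain somewhere in its path.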

The blacklist process is as follows:

news-admin.php -> "Blacklist domain" button -> blacklist-domain.php -> domain added to blacklist.txt -> filter-feeds.py downloads blacklist.txt and filters out the feed-articles.csv file
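
The domain-stripping part of that flow lives in PHP in this project, but the idea can be sketched in Python to match the other scripts. The function name `alert_domain` is hypothetical, and the sketch assumes the alert link wraps the article address in a `url` query parameter. Keeping only the last two host labels is a naive rule that gives the wrong answer for suffixes such as co.uk.

```python
from urllib.parse import parse_qs, urlparse


def alert_domain(alert_url: str) -> str:
    """Reduce an alert link to 'domain.tld' (hypothetical helper)."""
    query = parse_qs(urlparse(alert_url).query)
    # Assumed link shape: .../url?...&url=<article URL>&...
    # Fall back to the link itself when there is no url parameter.
    target = query.get("url", [alert_url])[0]
    host = urlparse(target).hostname or ""
    # Naive: keep the last two labels, e.g. news.somedomain.com -> somedomain.com
    return ".".join(host.split(".")[-2:])
```

The returned string is what would be appended to blacklist.txt, one domain per line, for filter-feeds.py to pick up on its next run.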
