ZOEY

Zoey is designed to identify, track and monitor which issues are appearing in online news articles covering the 2015 Canadian federal election. It does so by scraping news articles from specified sources and analyzing the frequency of words appearing in those sources.
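The word-frequency idea can be sketched in a few lines of Ruby. This is a minimal illustration only, not Zoey's actual analysis code. It assumes the articles have already been reduced to plain-text strings, and both the `word_frequencies` helper and the `STOP_WORDS` list are hypothetical.

```ruby
# Hypothetical stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = %w[the a an and or of to in on for is are was were that this it with as by at].freeze

# Hypothetical helper: counts word occurrences across a collection of
# plain-text articles and returns a Hash of word => count, most frequent
# first (ties broken alphabetically).
def word_frequencies(texts)
  counts = Hash.new(0)
  texts.each do |text|
    # [[:alpha:]] also matches accented letters, which matters for French names and terms.
    text.downcase.scan(/[[:alpha:]]+(?:'[[:alpha:]]+)*/) do |word|
      counts[word] += 1 unless STOP_WORDS.include?(word)
    end
  end
  counts.sort_by { |word, count| [-count, word] }.to_h
end

articles = [
  "The economy and the deficit dominated the debate.",
  "Leaders clashed over the economy, pipelines and the deficit."
]
p word_frequencies(articles).first(2) # => [["deficit", 2], ["economy", 2]]
```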

This project is a work-in-progress and will be continually updated until the federal election on October 19, 2015.

SCRAPERS

Zoey's web scrapers are now built in Ruby. They are far slower than their JavaScript predecessors but much more consistent, and they now include error handling, such as preventing duplicate records from being inserted.
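The following sketch shows one way per-record error handling and duplicate prevention can fit together. It is an in-memory stand-in, not the real scraper, which writes to a database. The `ArticleStore` class, the use of the article URL as the uniqueness key, and the `:inserted`/`:duplicate`/`:error` return values are all assumptions made for illustration.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the articles table.
class ArticleStore
  attr_reader :rows

  def initialize
    @rows = []
    @seen_urls = Set.new
  end

  # Inserts an article unless its URL is already stored.
  # Returns :inserted, :duplicate or :error, so one bad record never aborts the run.
  def insert(article)
    url = article.fetch(:url) # raises KeyError when the URL is missing
    return :duplicate if @seen_urls.include?(url)

    @rows << article
    @seen_urls << url
    :inserted
  rescue KeyError => e
    warn "Skipping malformed article: #{e.message}"
    :error
  end
end

store = ArticleStore.new
results = [
  { url: "https://example.com/a", title: "Debate recap" },
  { url: "https://example.com/a", title: "Debate recap" }, # duplicate
  { title: "No URL" }                                      # malformed
].map { |article| store.insert(article) }
p results # => [:inserted, :duplicate, :error]
```

With a real database, the same effect is usually backed by a unique index on the key column, so duplicates are rejected even if two scrapes run at once.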

They are also designed so that additional Feedly feeds can be added easily, and articles from other sources can be added with only minor modifications.
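One common way to keep new sources cheap to add is to map every source's raw entries into a single record shape. The sketch below is hypothetical and not taken from Zoey's code. The `Article` struct and the `NORMALIZERS` table are illustrative. The Feedly field names (`alternate`, `origin`, and `published` in milliseconds) are assumptions about Feedly's entry JSON. The `:rss` normalizer stands in for any "other source".

```ruby
# Hypothetical common record shape that every source is mapped into.
Article = Struct.new(:url, :title, :source, :published_at, keyword_init: true)

# One normalizer per source type; supporting a new source means adding an entry.
NORMALIZERS = {
  # Assumed Feedly entry fields: "alternate" links, "origin" title, "published" in ms.
  feedly: lambda do |entry|
    Article.new(
      url: entry.dig("alternate", 0, "href"),
      title: entry["title"],
      source: entry.dig("origin", "title"),
      published_at: Time.at(entry["published"] / 1000).utc
    )
  end,
  # Hypothetical non-Feedly source whose items are already parsed into symbol-keyed hashes.
  rss: lambda do |item|
    Article.new(
      url: item[:link],
      title: item[:title],
      source: item[:channel],
      published_at: item[:pub_date]
    )
  end
}.freeze

def normalize(source_type, raw)
  NORMALIZERS.fetch(source_type).call(raw)
end

feedly_entry = {
  "title" => "Leaders debate the economy",
  "alternate" => [{ "href" => "https://example.com/debate" }],
  "origin" => { "title" => "Example News" },
  "published" => 1_439_611_201_000
}
article = normalize(:feedly, feedly_entry)
puts article.url # => https://example.com/debate
```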

SCRAPER FILES

Program logic is contained in scraper.rb.

The file 'scrapefeedly.rb' contains the feeds to be scraped. The items_to_scrape variable sets how many articles to scrape from each feed, starting with the most recent one. Ideally, this will be replaced with a date range in the future. Articles already in the database are ignored.
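As a rough sketch of the selection rule described above, the hypothetical `select_entries` helper below takes the `items_to_scrape` most recent entries from a feed and then drops those already stored. It assumes each entry has an `"id"` and a `"published"` timestamp, and it assumes the limit is applied before existing articles are skipped. The real scraper may order these steps differently.

```ruby
require 'set'

# Hypothetical helper: most recent `items_to_scrape` entries, minus those already stored.
def select_entries(entries, items_to_scrape, existing_ids)
  entries
    .sort_by { |e| -e["published"] }                # newest first
    .first(items_to_scrape)                         # per-feed cap
    .reject { |e| existing_ids.include?(e["id"]) }  # skip articles already in the database
end

entries = [
  { "id" => "a", "published" => 3 },
  { "id" => "b", "published" => 1 },
  { "id" => "c", "published" => 2 }
]
p select_entries(entries, 2, Set["a"]).map { |e| e["id"] } # => ["c"]
```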

SCRAPER USAGE

  1. Run 'bundle install' from the Zoey root directory.

  2. Create a file named 'dbconfig.rb' in the scrapers directory, and add the following line to this file:

    DATABASE = '<your_database_name>'

  3. Run 'knex migrate:rollback' as many times as necessary to reset the database, then run 'knex migrate:latest'. Next, run 'knex seed:run' to populate the charts table. This step is not necessary before subsequent scrapes.

  4. In scrapefeedly.rb, change the query parameters as desired. The parameters are currently set to scrape the most recent 10,000 articles per feed, starting from 00:00:01 EDT on Sunday, August 15, 2015.

  5. Execute the following command:

    ruby scrapefeedly.rb

  6. Sit back and watch those articles roll in.
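The query parameters from step 4 can be illustrated by building a request URL. This sketch assumes Feedly's streams/contents endpoint with its streamId, count, newerThan (milliseconds since the Unix epoch) and continuation parameters. `FEED_ID` and the `stream_contents_url` helper are hypothetical placeholders, not values from scrapefeedly.rb. The code only builds the URL and sends no request.

```ruby
require 'uri'

# Hypothetical placeholder; substitute the feeds configured in scrapefeedly.rb.
FEED_ID = "feed/http://example.com/rss"
ITEMS_TO_SCRAPE = 10_000

# 00:00:01 EDT (UTC-4) on August 15, 2015, in milliseconds since the Unix epoch.
START_TIME_MS = Time.new(2015, 8, 15, 0, 0, 1, "-04:00").to_i * 1000

# Hypothetical helper; assumes the streams/contents endpoint and parameter names.
def stream_contents_url(feed_id, count, newer_than_ms, continuation = nil)
  params = { streamId: feed_id, count: count, newerThan: newer_than_ms }
  params[:continuation] = continuation if continuation # token for the next page
  uri = URI("https://cloud.feedly.com/v3/streams/contents")
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts stream_contents_url(FEED_ID, ITEMS_TO_SCRAPE, START_TIME_MS)
```

Computing the start time from an explicit UTC offset avoids depending on the time zone of the machine running the scraper.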

CONTRIBUTORS

mattfoulger, jzhang729, majorj, bartekus