
jkothari18 / benzinga_movers_scraper


This project forked from swang2016/benzinga_movers_scraper


Web Scraper for Pulling Stock Movement Headlines From Benzinga

Python 1.16% Jupyter Notebook 98.84%

benzinga_movers_scraper's Introduction

Overview

This project scrapes descriptions of stock price movements from Benzinga's various Movers article series (https://www.benzinga.com/movers). The scraper functions pull descriptions from the following time periods and article series:

Beyond the headline scrapers, the project includes functions for obtaining historical price and volume data from the IEX API and for scraping earnings filing dates from MarketWatch.

Lastly, pre-scraped data current as of 05/03/2019 is available in the Data folder.

Analysis

A sample of exploratory analysis done on this data can be found in benzinga_exploratory_analysis.ipynb in the Scripts folder. A Medium post based on this analysis is available at https://medium.com/@steven_wang/exploring-stock-price-movements-after-major-events-8b35c318ba76.

Usage Examples

Examples of how to use the functions mentioned above are also shown in the Jupyter notebook, benzinga_scrape.ipynb, in the Scripts folder.

Scraping Descriptions

Initial data pull (the data only goes back to 10/12/2015; you can specify a more recent start date instead):

df = get_initial_biggest_movers_data('October 12, 2015')
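Since the archive starts at 10/12/2015, an earlier start date gains nothing. The following is a minimal sketch of normalizing the start date before calling get_initial_biggest_movers_data, assuming dates are written in the 'October 12, 2015' style shown above; clamp_movers_start is a hypothetical helper, not part of this project.

```python
from datetime import date, datetime

# Earliest date the Movers archive covers, per this README.
EARLIEST_MOVERS_DATE = date(2015, 10, 12)


def clamp_movers_start(start: str) -> date:
    """Hypothetical helper: parse a date like 'October 12, 2015' and
    move it forward to the earliest archived date if it is older."""
    requested = datetime.strptime(start, "%B %d, %Y").date()
    return max(requested, EARLIEST_MOVERS_DATE)


# Convert back to the string format used in the example call above.
start = clamp_movers_start("January 1, 2014")
print(start.strftime("%B %d, %Y"))  # October 12, 2015
```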

If you already have descriptions saved and only want to update the data with more recent descriptions:

new_dat = get_new_benzinga_data(old_dat)

where old_dat is a DataFrame of descriptions already saved. new_dat will contain only the new descriptions, i.e. those dated after the most recent description in old_dat.
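The incremental update boils down to a date filter against the newest saved row. A minimal sketch of that filtering step, assuming a date column named 'date' (the real column names in old_dat may differ); rows_after_latest is a hypothetical helper, not the project's implementation:

```python
import pandas as pd


def rows_after_latest(old_dat: pd.DataFrame, scraped: pd.DataFrame,
                      date_col: str = "date") -> pd.DataFrame:
    """Hypothetical helper: keep only scraped rows dated strictly after
    the most recent row already in old_dat (column name assumed)."""
    latest = pd.to_datetime(old_dat[date_col]).max()
    is_newer = pd.to_datetime(scraped[date_col]) > latest
    return scraped[is_newer].reset_index(drop=True)
```

Because new_dat holds only the new rows, saving the full history is then a single concatenation, e.g. pd.concat([old_dat, new_dat], ignore_index=True).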

Getting Price and Volume Data

Initial data pull (specify how far back you want to go; the IEX API only goes back 5 years):

prices, volumes = get_initial_price_data(tcks, '2015-10-01')

where tcks is a list of tickers.
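Since the IEX history window is 5 years, a requested start date older than that cannot be honored. A minimal sketch of clamping an ISO start date such as '2015-10-01' to that window, assuming the limit is measured back from the day of the pull; clamp_to_lookback is a hypothetical helper, and in practice today would be date.today():

```python
from datetime import date


def clamp_to_lookback(start: str, today: date, years: int = 5) -> str:
    """Hypothetical helper: move an ISO start date ('YYYY-MM-DD') forward
    so it is no earlier than `years` years before `today`."""
    requested = date.fromisoformat(start)
    try:
        earliest = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year has no Feb 29
        earliest = today.replace(year=today.year - years, day=28)
    return max(requested, earliest).isoformat()
```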

If you already have price and volume data saved and only want to update with more recent data:

new_prices, new_volumes = get_new_price_data(tcks, prices)

where tcks is a list of tickers and prices is a DataFrame of already saved prices. new_prices and new_volumes contain the latest prices and volumes for tickers already saved, plus the full price and volume history (going back 5 years) for tickers not yet saved.
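One way to picture this update is as a union of the saved and freshly fetched tables. A minimal sketch, assuming (hypothetically) that prices are stored wide, with a DatetimeIndex of trading days and one column per ticker; merge_price_update is an illustrative helper, not the project's code:

```python
import pandas as pd


def merge_price_update(saved: pd.DataFrame, fetched: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: combine saved prices with newly fetched ones.

    Both frames are assumed to be wide (DatetimeIndex rows, ticker columns).
    Saved values win where both exist; new dates and new tickers are filled
    in from the fetched frame.
    """
    # combine_first takes the union of rows and columns, filling NaNs in
    # `saved` with values from `fetched`.
    return saved.combine_first(fetched).sort_index()
```

Swapping the operands (fetched.combine_first(saved)) would prefer the freshly fetched values instead, which may matter if historical prices get revised.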

Getting Earnings Reports Dates

Initial data pull (specify how far back you want to go):

earnings = get_earnings_dates(tcks, '2015-10-12')

where tcks is a list of tickers.

If you already have dates saved and only want to update with more recent data (once again, specify how far back you want to go):

new_earnings = get_new_earnings_dates(tcks, earnings, '2015-10-12')

where tcks is a list of tickers and earnings is a DataFrame of earnings dates already saved. new_earnings contains the latest earnings dates for tickers already saved, plus the full earnings date history back to the specified date for tickers not yet saved.
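Because the update re-fetches back to the specified date, saved and fetched earnings dates will overlap. A minimal sketch of de-duplicating them, assuming a long-format table with hypothetical 'ticker' and 'date' columns; merge_earnings_update is illustrative only:

```python
import pandas as pd


def merge_earnings_update(saved: pd.DataFrame, fetched: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: union saved and newly fetched earnings dates.

    Both frames are assumed to have 'ticker' and 'date' columns; a
    (ticker, date) pair present in both appears once in the result.
    """
    combined = pd.concat([saved, fetched], ignore_index=True)
    # Normalize dates first so '2019-01-29' strings and timestamps match.
    combined["date"] = pd.to_datetime(combined["date"])
    return (combined.drop_duplicates(subset=["ticker", "date"])
                    .sort_values(["ticker", "date"])
                    .reset_index(drop=True))
```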

Note: MarketWatch does not have filing dates for some stocks, such as those that are no longer traded or whose companies were acquired.
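Given these gaps, it can help to list which requested tickers came back with no filing dates at all. A minimal sketch, assuming the earnings DataFrame has a hypothetical 'ticker' column; tickers_without_dates is an illustrative helper:

```python
import pandas as pd


def tickers_without_dates(tcks: list, earnings: pd.DataFrame) -> list:
    """Hypothetical helper: tickers requested but absent from the earnings
    frame (assumed 'ticker' column), e.g. delisted or acquired companies."""
    found = set(earnings["ticker"])
    return sorted(t for t in tcks if t not in found)
```

Downstream code can then treat those tickers as "unknown" rather than as "no earnings report near this date".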

Scraped Data

The Scripts folder contains a Jupyter notebook, benzinga_clean.ipynb, that shows how the headline, price, volume, and earnings dates data might be cleaned and combined.
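The notebook defines the project's own cleaning steps. As a rough illustration of the kind of join involved, here is a sketch that attaches each headline's same-day return, assuming (hypothetically) that prices are daily closes stored wide (DatetimeIndex rows, ticker columns) and that headlines have 'ticker' and 'date' columns; attach_daily_returns is not taken from the notebook:

```python
import pandas as pd


def attach_daily_returns(headlines: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add each headline's close-to-close return
    for the headline date (NaN if the date or ticker has no price)."""
    prices = prices.sort_index()
    returns = prices / prices.shift(1) - 1  # close_t / close_{t-1} - 1
    # Reshape wide -> long: one row per (date, ticker).
    long = (returns.rename_axis("date")
                   .reset_index()
                   .melt(id_vars="date", var_name="ticker", value_name="daily_return"))
    out = headlines.copy()
    out["date"] = pd.to_datetime(out["date"])
    return out.merge(long, on=["date", "ticker"], how="left")
```

A left merge keeps every headline even when no matching price exists, so missing data shows up as NaN instead of silently dropping rows.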

In the Data folder you can also find pre-scraped data from 10/12/2015 - 05/03/2019 using the functions mentioned above. The pre-scraped data includes:

  • stock_headlines.csv: CSV of descriptions, associated tickers, and dates
  • stock_prices.csv: CSV of tickers and their prices for the date range of 10/12/2015 - 05/03/2019
  • stock_volumes.csv: CSV of tickers and their volumes for the date range of 10/12/2015 - 05/03/2019
  • filing_dates.csv: CSV of tickers and their earnings filing dates for the date range of 10/12/2015 - 05/03/2019
  • stock_headlines_cleaned.csv: stock_headlines.csv with additional features generated from stock_prices.csv and stock_volumes.csv. Example code showing how this file was made is in benzinga_clean.ipynb in the Scripts folder.
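The exact features in stock_headlines_cleaned.csv are defined in benzinga_clean.ipynb and are not reproduced here. As an illustration of a volume-derived feature of this kind, the sketch below computes relative volume (each day's volume over the mean of the preceding days), assuming volumes are stored wide with a DatetimeIndex; the helper name and the 20-day default window are arbitrary choices:

```python
import pandas as pd


def relative_volume(volumes: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Hypothetical feature: each day's volume divided by the mean volume
    of the preceding `window` trading days (same-day volume excluded)."""
    volumes = volumes.sort_index()
    # shift(1) keeps the current day out of its own baseline.
    trailing_mean = volumes.shift(1).rolling(window, min_periods=window).mean()
    return volumes / trailing_mean
```

Excluding the current day from the baseline keeps a volume spike on the headline date from diluting its own ratio.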
