Historical data of US stocks, updated daily.
The data are collected using pystock-crawler and pystock-github. Every day before the US stock exchanges open at 9:30 EST/EDT, the crawler collects the stock prices and financial reports and pushes the data to this repository. As a result, the most recent data available are from the previous trading day. The data are daily: you won't find hourly or minute-level data here.
pystock-crawler crawls data from the following sources:
- NASDAQ.com for company ticker symbols
- Yahoo Finance for stock prices
- SEC EDGAR for financial reports
All data are stored in CSV and TXT files, archived with gzip. The files are categorized and named by their creation dates. For example, a file collected on 2015-03-23 is named 20150323.tar.gz and placed under the 2015 directory.
The initial data are the first batch of collected data, covering the date range from 2009-01-01 to 2015-03-20. They are split into three files, 0001_initial.tar.gz to 0003_initial.tar.gz, under the 2015 directory.
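The naming scheme above can be sketched in a few lines of Python. The function names `daily_archive_path` and `initial_archive_paths` are hypothetical helpers introduced here for illustration; they are not part of pystock-crawler or pystock-github.

```python
from datetime import date
from typing import List


def daily_archive_path(collected_on: date) -> str:
    """Build the repository-relative path of a daily archive.

    Follows the convention described above: <year>/<YYYYMMDD>.tar.gz
    """
    return f"{collected_on.year}/{collected_on:%Y%m%d}.tar.gz"


def initial_archive_paths() -> List[str]:
    """List the three initial archives, which live under the 2015 directory."""
    return [f"2015/{index:04d}_initial.tar.gz" for index in range(1, 4)]


if __name__ == "__main__":
    print(daily_archive_path(date(2015, 3, 23)))
    print(initial_archive_paths())
```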
Each gzip archive may contain any of the following files:
- symbols.txt
- prices.csv
- reports.csv
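Because any of the three files may be absent, code that reads an archive should not assume they all exist. The following is a minimal sketch using Python's standard `tarfile` module. It assumes the files are UTF-8 text and matches members by base name, since the directory layout inside the archive is not described here. `read_archive` is a hypothetical helper name.

```python
import io
import tarfile
from typing import BinaryIO, Dict

WANTED = ("symbols.txt", "prices.csv", "reports.csv")


def read_archive(fileobj: BinaryIO) -> Dict[str, str]:
    """Return the text of whichever known files are present in the archive.

    Missing files are simply left out of the result.
    """
    found: Dict[str, str] = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Match on the base name; any directory prefix is ignored (assumption).
            name = member.name.rsplit("/", 1)[-1]
            if member.isfile() and name in WANTED:
                extracted = tar.extractfile(member)
                if extracted is not None:
                    # UTF-8 is an assumption about the file encoding.
                    found[name] = extracted.read().decode("utf-8")
    return found
```

A caller would open the archive in binary mode, for example `with open("2015/20150323.tar.gz", "rb") as f: files = read_archive(f)`, and then check `"prices.csv" in files` before using it.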
symbols.txt is a list of companies, one per line. For example:
AAPL Apple Inc.
FB Facebook
GOOGL Google Inc.
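A minimal parsing sketch for this file, assuming each line holds a ticker symbol followed by whitespace (a tab or spaces) and then the company name. The exact separator is not stated here, so the code splits on the first run of whitespace. `parse_symbols` is a hypothetical helper name.

```python
from typing import Dict


def parse_symbols(text: str) -> Dict[str, str]:
    """Map each ticker symbol to its company name."""
    companies: Dict[str, str] = {}
    for line in text.splitlines():
        # Split once on the first run of whitespace (assumed separator).
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        symbol = parts[0]
        name = parts[1] if len(parts) > 1 else ""
        companies[symbol] = name
    return companies
```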
prices.csv contains daily prices in CSV format. Normally, a daily-generated file includes two trading days of prices. This is because, when a company splits its shares, you need to compare the close price (close) and the adjusted close price (adj_close) of the previous trading day to detect the split. See Yahoo's explanation for more details.
Here's a sample of prices.csv:
symbol,date,open,high,low,close,volume,adj_close
AAPL,2015-03-23,127.12,127.85,126.52,127.21,36761000,127.21
AAPL,2015-03-20,128.25,128.40,125.16,125.90,67941100,125.90
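The comparison of close and adj_close described above can be sketched as follows. This is an illustration under stated assumptions, not the crawler's own logic: the function name `find_adjusted_rows` and the `tolerance` threshold are hypothetical. Note that Yahoo's adjusted close also reflects dividends, so a ratio slightly above 1 may indicate a dividend rather than a split.

```python
import csv
import io
from typing import List, Tuple


def find_adjusted_rows(
    prices_csv: str, tolerance: float = 0.005
) -> List[Tuple[str, str, float]]:
    """Return (symbol, date, close / adj_close) for rows where the two differ.

    A ratio near 2.0 would be consistent with a 2-for-1 split, for example.
    The tolerance is an arbitrary cutoff chosen for this sketch.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(prices_csv)):
        close = float(row["close"])
        adj_close = float(row["adj_close"])
        if adj_close == 0:
            continue  # avoid division by zero on malformed rows
        ratio = close / adj_close
        if abs(ratio - 1.0) > tolerance:
            flagged.append((row["symbol"], row["date"], ratio))
    return flagged
```

For the two sample rows above, close and adj_close are equal, so the function returns an empty list.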
reports.csv contains several financial metrics extracted from 10-Q or 10-K reports. A 10-Q is a quarterly report; a 10-K is an annual report. The CSV file looks like:
symbol,end_date,amend,period_focus,fiscal_year,doc_type,revenues,op_income,net_income,eps_basic,eps_diluted,dividend,assets,cur_assets,cur_liab,cash,equity,cash_flow_op,cash_flow_inv,cash_flow_fin
GOOG,2009-06-30,False,Q2,2009,10-Q,5522897000.0,1873894000.0,1484545000.0,4.7,4.66,0.0,35158760000.0,23834853000.0,2000962000.0,11911351000.0,31594856000.0,3858684000.0,-635974000.0,46354000.0
GOOG,2009-09-30,False,Q3,2009,10-Q,5944851000.0,2073718000.0,1638975000.0,5.18,5.13,0.0,37702845000.0,26353544000.0,2321774000.0,12087115000.0,33721753000.0,6584667000.0,-3245963000.0,74851000.0
GOOG,2009-12-31,False,FY,2009,10-K,23650563000.0,8312186000.0,6520448000.0,20.62,20.41,0.0,40496778000.0,29166958000.0,2747467000.0,10197588000.0,36004224000.0,9316198000.0,-8019205000.0,233412000.0
The financial metrics included are:
- end_date: the end date of the period of the filing report
- amend: is the filing an amendment?
- period_focus: Q1, Q2, Q3 for quarterly reports, or FY for annual reports
- fiscal_year: fiscal year of the company
- doc_type: 10-Q or 10-K

Income statement
- revenues: revenues or sales
- op_income: operating income
- net_income: net income or net earnings
- eps_basic: basic earnings per share
- eps_diluted: diluted earnings per share
- dividend

Balance sheet
- assets: total assets
- cur_assets: current assets
- cur_liab: current liabilities
- cash: cash and cash equivalents
- equity: total equity

Cash flow
- cash_flow_op: cash from operating activities
- cash_flow_inv: cash from investing activities
- cash_flow_fin: cash from financing activities
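As an illustration of how these columns might be used, the sketch below loads reports.csv into typed records and derives two common ratios. The helper names (`load_reports`, `safe_ratio`, `summarize`) are hypothetical, and treating an empty field as a missing value is an assumption about the data, not documented behavior.

```python
import csv
import io
from typing import Dict, List, Optional

TEXT_COLUMNS = {"symbol", "end_date", "period_focus", "doc_type"}


def load_reports(reports_csv: str) -> List[Dict[str, object]]:
    """Parse reports.csv text, converting each column to a suitable type."""
    reports = []
    for row in csv.DictReader(io.StringIO(reports_csv)):
        record: Dict[str, object] = {}
        for column, value in row.items():
            if column is None:
                continue  # ignore unexpected extra fields
            if column in TEXT_COLUMNS:
                record[column] = value
            elif column == "amend":
                record[column] = value == "True"
            elif column == "fiscal_year":
                record[column] = int(value) if value else None
            else:
                # Empty fields are treated as missing (assumption).
                record[column] = float(value) if value else None
        reports.append(record)
    return reports


def safe_ratio(numerator, denominator) -> Optional[float]:
    """Divide, returning None when a value is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def summarize(report: Dict[str, object]) -> Dict[str, Optional[float]]:
    """Derive net margin and current ratio from one report record."""
    return {
        "net_margin": safe_ratio(report["net_income"], report["revenues"]),
        "current_ratio": safe_ratio(report["cur_assets"], report["cur_liab"]),
    }
```

Returning None for missing inputs, rather than raising, keeps a batch run going when a filing lacks a metric.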