python3 -m pip install pandas selenium
Download the necessary browser driver (Firefox preferred) from here, then set DRIVER_PATH to the location of the driver on your machine.
DRIVER_PATH = "./geckodriver" # replace with path to driver for your OS
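A wrong or non-executable driver path tends to fail late, once the browser is launched. The sketch below checks the path first and then starts Firefox. It assumes Selenium 4, where the driver location is passed through a Service object. The helpers resolve_driver_path and make_firefox_driver are hypothetical names used for illustration and are not taken from utility-scraper.py.

```python
import os
from pathlib import Path

DRIVER_PATH = "./geckodriver"  # replace with path to driver for your OS


def resolve_driver_path(driver_path: str) -> Path:
    """Return the absolute driver path, or raise if it is missing or not executable."""
    path = Path(driver_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No driver found at {path}")
    if not os.access(path, os.X_OK):
        raise PermissionError(f"Driver at {path} is not executable")
    return path


def make_firefox_driver(driver_path: str = DRIVER_PATH):
    """Start Firefox through geckodriver (hypothetical helper, assumes Selenium 4)."""
    # Imported lazily so the path check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    service = Service(executable_path=str(resolve_driver_path(driver_path)))
    return webdriver.Firefox(service=service)
```

Calling make_firefox_driver() returns a WebDriver instance; call quit() on it when scraping is finished so the browser process does not linger.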
The datasets used to collect the necessary information are included in the repo under the datasets directory. To use a newer version of a specific dataset, add it to the datasets directory and then update the corresponding variable at the top of utility-scraper.py with the appropriate filename.
# Dataset files
DATASETS_DIRECTORY_PATH = "datasets"
US_ENERGY_PRODUCTION_BY_STATE_FILE = "annual_generation_state.xls"
US_POPULATION_BY_CITY_FILE = "sub-est2021_all.csv"
US_WATER_PROVIDERS_BY_STATE_FILE = "Water System Detail.csv"
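After swapping in a newer dataset, it can help to confirm that every configured filename actually exists before starting a long scrape. The following is a minimal sketch using only the standard library. The variable values are copied from the configuration above, while missing_datasets is a hypothetical helper that is not part of utility-scraper.py.

```python
from pathlib import Path

# Dataset files (same names as the variables at the top of utility-scraper.py)
DATASETS_DIRECTORY_PATH = "datasets"
US_ENERGY_PRODUCTION_BY_STATE_FILE = "annual_generation_state.xls"
US_POPULATION_BY_CITY_FILE = "sub-est2021_all.csv"
US_WATER_PROVIDERS_BY_STATE_FILE = "Water System Detail.csv"


def missing_datasets(directory: str = DATASETS_DIRECTORY_PATH) -> list:
    """Return the configured dataset filenames that are not present in `directory`."""
    filenames = (
        US_ENERGY_PRODUCTION_BY_STATE_FILE,
        US_POPULATION_BY_CITY_FILE,
        US_WATER_PROVIDERS_BY_STATE_FILE,
    )
    return [name for name in filenames if not (Path(directory) / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset files:", ", ".join(missing))
    else:
        print("All dataset files found.")
```

An empty list means all three files are in place; any names returned point to a filename variable that no longer matches a file in the datasets directory.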
https://www2.census.gov/programs-surveys/popest/datasets/2020-2021/cities/totals/
https://www.eia.gov/electricity/data/state/
https://ordspub.epa.gov/ords/sfdw/sfdw/r/sdwis_fed_reports_public/1
Under 'Select a Report' in the Report Options section, select 'Water System Detail'. Select the appropriate Submission year and quarter (e.g. 2022 Quarter 2) and then select View Reports.
Site scraped: findenergy.com
This site aggregates information from several government agency reports (mostly EIA), found here.
Data for all 50 states can be collected by running the utility-scraper.py file.
python3 utility-scraper.py