Giter Site home page Giter Site logo

social-data's Introduction

Arup Social Data

Streamlit App

This is a repository for collection and analysis of open social data. This type of data can be useful to planners, NGO's, governments, firms, and anyone trying to address social problems through data.

The bulk of the analysis is related to evictions as a result of COVID-19. This repository gathers data from the Federal Reserve Economic Data portal (FRED). There are a number of potential use cases we're aware of to apply that data to address evictions. This analysis allows us to compare counties side by side by their "Relative Risk" of evictions.

No code, no problem

Not a developer? Just don't want to code today? No problem! You can access the data using our web app at the link below.

https://share.streamlit.io/arup-group/social-data/run.py

Python Usage

We've done our previous analyses in Python, and have built data gathering, cleaning, and analysis functions.

The credentials.py file is configured to allow read-only access to an Arup-maintained database of the most recent relevant FRED data. In addition to using these Python scripts, you can connect to this database and run direct SQL queries to get the data you want.

Install

This project relies on GDAL for geospatial things, which means you'll need to install GDAL on your machine (sorry, Windows people).

Mac users (using homebrew):

brew install gdal

Windows users (based on this):

  1. Get and install appropriate binary for your machine
  2. Add PATH and environment variables like in this article
  3. Install Visual C++ Build Tools if you haven't already
  4. Install Fiona from the appropriate whl file here
  5. In your virtual environment: pip install <path/to/fiona.whl>

In a virtual environment:

pip install -r requirements.txt

Run

We suggest running using Streamlit for most use cases:

streamlit run run.py

To run as a typical Python script, run:

python run.py --mode script

Docker

You can also install and run the application locally using Docker:

docker build . -t streamlit-social-data

docker-compose up -d

You can access the app on http://localhost:8501.

Database Usage

The PostgreSQL database that this repository uses is open for read-only access. The connection details are stored in credentials.py if you're using the Python workflow.

If you'd like to query the database directly using the method of your choice, you can access it using the credentials below:

DB_HOST = ade-eviction.ccgup3bgyakw.us-east-1.rds.amazonaws.com
DB_NAME = eviction_data
DB_PORT = 5432
DB_USER = readuser
DB_PASSWORD = password

Workflows

This repository currently two primary workflows:

  1. Viewing and examining the data in the database and downloading raw data for your own analysis
  2. Gathering data for counties in the US and comparing their Relative Risk of eviction

Currently we support census tract level data for the Data Exploration workflow

These workflows support a number of use cases, including:

  • Providing direct assistance to families in one or more counties
  • Driving decisions around future affordable housing projects
  • Directing policy response on the city or county level

There are three ways to interact with this data.

  • Web interface - You can use the interactive UI either on the web or run locally
  • Python scripts - Include functions to gather, clean, and analyze data to output a relative risk ranking for multiple counties. Can be extended with your own scripts.
  • Custom SQL Queries - SQL that you write to get the most recent data from our database and use however you want.

About the data

We currently have almost 40 tables in the database, representing over 2 million rows of data.

This data is the most recent data we could get, but some datasets are updated more frequently than others. You can see the date that the data was updated in all_tables.xlsx. We refresh the data monthly on the first of the month. Some datasets are at the county level and some are at the census tract level. The following datasets are currently in the database:

Feature Name Resolution Source Updated Notes
ID Index All -- 2021 Arup-developed index of State, County, and Census tract IDs to query between tables
Burdened Households (%) County FRED 1/1/2018 People who pay more than 30 percent of their income towards rent
Home Ownership (%) County FRED 1/1/2018 This field has been superseded by the renter occupied housing units value from the demographic data.
Income Inequality (Ratio) County FRED 1/1/2018
Population Below Poverty Line (%) County FRED 1/1/2018
Single Parent Households (%) County FRED 1/1/2018
SNAP Benefits Recipients (Persons) County FRED 1/1/2017 This field is no longer used in our analysis, but exists in the database.
Unemployment Rate (%) County FRED 6/1/2020
Resident Population (Thousands of Persons) County FRED 1/1/2019 Used to convert percentages to raw population
Resident Population Census Tract ACS 2019
COVID Vulnerability Index County CHMURA 4/15/2020 An index from CHMURA to represent how vulnerable counties across the US are to COVID-related economic effects.
Fair Market Rents County HUD 10/1/2020 Represents the estimated amount (base rent + essential utilities) that a property in a given area typically rents for
Median Rents County HUD 2021 Rent estimates at the 50th percentile (or median) calculated for all Fair Market Rent areas
Housing Stock Distributions County US Census 2017 Distribution of housing units in the US by number of bedrooms. Defaults to the national distribution, but includes data for the top 15 metro areas in the US. Includes percentage and estimated housing units.
County Geometries County US Census 2017 PostGIS compatible geometry data
Socio-Demographics County ArcGIS 2017 A collection of fields including each race & ethnicity break down, the count of males and females, median age, the number of housing units, vacant units, renter occupied units, and more.
Census Tract Geometries Census Tracts -- 2010
English Proficiency Census Tracts ACS 2019
Group Quarters Population Census Tracts ACS 2019
Housing Units in Structure Census Tracts ACS 2019
Occupants per Bedroom Census Tracts ACS 2019
Median Household Income Census Tracts ACS 2019
Employment Status Census Tracts ACS 2019
Per Capita Income Census Tracts ACS 2019
Disability Status Census Tracts ACS 2019
Poverty Status Census Tracts ACS 2019
Family Type Census Tracts ACS 2019
Educational Attainment Census Tracts ACS 2019
Hispanic or Latino Origin by Race Census Tracts ACS 2019
Sex by Age Census Tracts ACS 2019
Sex of Workers by Vehicles Available Census Tracts ACS 2019
Household Job Availability Census Tracts ACS 2013
Household Technology Availability Census Tracts ACS 2013
Household Vehicle Availability Census Tracts ACS 2013
National Walkability Index Census Tracts EPA 2012
Trip Miles Census Tracts BTS 2013
Level of Urbanicity Census Tracts EPA 2013

If you have datasets you'd like to see included, please create an Issue.

What is Relative Risk?

Relative risk is an index used to compare counties against each other in our Python analysis. It is not a measure of absolute or percentage risk and has no meaning outside of this analysis. We use this to compare counties to understand where people are more at risk of eviction.

We calculate this using a custom formula to balance socioeconomic factors with the policy response of a city. There are three parts of this formula: a socioeconomic index value calculated from the combination of normalized FRED data and other data sources in the table above, a policy index value found using the policy Excel sheet in this repository and using methodology adapted from EvictionLab, and the time remaining until protections end. The equation is

relative_risk = socioeconomic_index * (1 - policy_index) / sqrt(time_remaining)

In this equation, we aim to minimize socioeconomic risk on the top of the fraction, maximize the policy response value and take 1 minus the index value calculated (to represent that better policy constitutes less relative risk), and maximize the time remaining on the bottom of the fraction. The square root of the time remaining is used to represent how additional time has diminishing returns. For example, the different between having 2 days and 2 weeks until protections expire is much different than whether policies expire in 3 months or 4 months when we think about how to prioritize one county over another.

What is cost of evictions?

To calculate the cost to avoid evictions the following calcuation was used:

(burdened_household_proportion / 100) * county_rent * housing_stock_distribution percentage * population * (burdened_households_proportion / 100)

Housing stock distributions (US Census) were used to evaluate the types of houses that exist in the United States (e.g. studio, one bedroom, etc.). Burdened household proportions were decided by Arup and were used to determine what percentage of people were most at risk. Proportions were set at 5, 25, 33, 50, and 75 to represent that not all burdened households will face eviction, and we don't know how many actually will. With the chosen proportion values, we are representing a general range, knowing there isn't a case where nobody will be evicted, or a case where everybody will be evicted. These numbers can be adjusted based on the user's knowledge of their own county. County rents for each housing stock were sourced from HUD, and used Fair Market Rates to determine rent prices for each stock. Fair Market Rents (FMRs) represent the estimated amount (base rent + essential utilities) that a property in a given area typically rents for. Burdened households was a percentage calculated by FRED which gives the percentage of the population who pay more than 30% of their income on rent. Population data was pulled from HUD and represents the population by county in the year 2017.

Performing this calculation for each proportion and each housing stock will calculate the amount needed to prevent eviction for each proportion. The total cost of evictions for a particular county for one month was calcuated by summing each housing stock. To get the cost of prevention for more than one month, the sum was multiplied by the number of months of interest.

This script uses Fair Market Rent values. Median rent values are also available in the database. The script can be manually adjusted by the user to reference median rent values rather than fair market rent if they choose.

Upon script completion, an excel file will be created within the output folder displaying all values mentioned above. If you experience problems with the script or have questions about methodologies, please reach out to a member of the development team.

FRED Queries

You can get the most recent Federal Reserve Economic Data (FRED) using the following commands:

python -m queries

Flags

Flag Purpose
--table Specifies a table that you want to query. If none is given, will return all tables.
--ouput Specifies an output type, without a ".". Currently, only pk or xlsx are supported. pk returns a pickled pandas.DataFrame. If none is given, will default to xlsx.

Purpose

This command will query the eviction data database, and return data in an Output folder. If no folder exists, will be created by the queries.py script.

Policy Workbook

Included in this repository is a template Excel file for policy data. This file is referenced in the Python scripts. There are three pages to be aware of.

Policy Timeline

This page is used to keep track of time-dependent policies and generate a "countdown clock" for a county. You can add policies and their expirations to end up with a date where people lose protections. You can also color the cells on this page to show where different policies overlap and how their expirations line up visually.

Policy Ranking

This page is used to collect and represent the specific policy nuanced not captured by the timeline. For each county, enter a 1 in each cell where a policy applies. These values are weighted according the methodology used by EvictionLab (with a couple minor modifications) to get an index score for each county.

Analysis Data

This page collects the results from the previous two pages in a format that can be more easily read by the Python scripts. You may need to copy the countdown and policy index values for each county you're analyzing.

Contributing

We would love for you to use this code, build on it, and share with others. See our contribution guide and code of conduct for more information.

Issues

Please post bugs, errors, and questions to the Issues tab of this repository.

Code

Make a pull request with your feature or bug fixes and we will review it. You're free to merge once it's been approved.

Policy Data

Policy data is some of the most useful, but also the hardest to get. We've shared policies for a number of counties we've analyzed in the Bay Area and Tulsa area. We hope that as you do policy analysis for your area, you'll contribute that back so others can use and verify it.

For now, to submit your data, open an issue in this repository and submit your policy data (countdown and/or policy ranking) for the county or counties you're looking at.

Other Data Sources

If there are data sources you'd like to see included in this, please reach out through the Issues tab.

License

This repository and the underlying data is MIT licensed.

social-data's People

Contributors

jaredstock avatar jstock29 avatar kevinmcgee-arup avatar angelaw1233 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.