ward-wise / data-analysis

Data analysis on Chicago infrastructure and infrastructure spending

License: MIT License

Python 100.00%

data-analysis's Issues

Tool for businesses to track customers' mode of transport

Businesses don't know which modes of transport their customers use, and there are no good tools to track this. This data is important because it drives decisions about how street space is allocated among pedestrians, cyclists, and parked vehicles.
This should be started as a new repo.

Implement usaddress package

https://usaddress.readthedocs.io/en/latest/

This package might do a better job at pattern matching than our current setup.

Convert geocoded CSV data into geoJSON

John Ruf geocoded data from 2022 back to 2005. We'd like to get this into a geoJSON format.
https://github.com/JohnCRuf/alderman_machine/tree/master/tasks/data_geocode_menu/output

He has an ID column on each row. He generates those IDs as part of his scraping, so each project has a unique ID. I believe the geocoded data can have multiple rows with the same ID, because he split location text containing multiple locations into multiple rows.

Write scripts that turn the CSV in John's data_geocode_menu/output folder (link above) into geoJSON
Get his repo running and re-scrape the data. He has a cleaning script (the older data is less consistent in its location text formats). We need to run that too, and I think there are some minor changes we need to make so that we don't drop columns.
Join geoJSON and re-scraped data
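The CSV-to-geoJSON step could look roughly like the sketch below. It assumes the CSV has columns named id, latitude, and longitude (hypothetical names; adjust them to John's actual header) and groups rows that share an ID into a single MultiPoint feature, since one project can be split across several rows. The remaining columns from the first row of each ID are kept as properties, which should make the later join with the re-scraped data a join on id.

```python
import csv
import json
from collections import OrderedDict


def rows_to_geojson(rows, id_col="id", lat_col="latitude", lon_col="longitude"):
    """Group geocoded rows by project ID into a GeoJSON FeatureCollection.

    The column names are assumptions; change the defaults to match the CSV.
    """
    grouped = OrderedDict()
    for row in rows:
        if not row.get(lat_col) or not row.get(lon_col):
            continue  # skip rows that failed to geocode
        entry = grouped.setdefault(row[id_col], {"points": [], "properties": None})
        # GeoJSON positions are [longitude, latitude].
        entry["points"].append([float(row[lon_col]), float(row[lat_col])])
        if entry["properties"] is None:
            # Keep the non-coordinate columns from the first row with this ID.
            entry["properties"] = {
                k: v for k, v in row.items() if k not in (lat_col, lon_col)
            }

    features = []
    for project_id, entry in grouped.items():
        points = entry["points"]
        if len(points) == 1:
            geometry = {"type": "Point", "coordinates": points[0]}
        else:
            geometry = {"type": "MultiPoint", "coordinates": points}
        features.append({"type": "Feature", "id": project_id,
                         "geometry": geometry, "properties": entry["properties"]})
    return {"type": "FeatureCollection", "features": features}


def csv_to_geojson(csv_path, geojson_path, **kwargs):
    """Read a geocoded CSV file and write the grouped features as geoJSON."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        collection = rows_to_geojson(csv.DictReader(f), **kwargs)
    with open(geojson_path, "w", encoding="utf-8") as f:
        json.dump(collection, f)
```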

Create list of data sources

In a Google Doc? There are datasets on the Chicago Open Data Portal to sort through. Some agencies also publish data on their websites that is not in the portal.

Add census geocoding API to our geocoder setup

https://geocoding.geo.census.gov/geocoder/

Should experiment first to see how well it handles Chicago street data.
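A first experiment could wrap the Census Geocoder's one-line-address lookup and compare its matches against our current geocoder on a sample of ward addresses. The sketch uses only the standard library. The endpoint path, the Public_AR_Current benchmark, and the result.addressMatches[].coordinates response fields follow the Census Geocoder's public API documentation as we understand it and should be double-checked there; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CENSUS_URL = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"


def build_census_url(address, benchmark="Public_AR_Current"):
    """Request URL for a single one-line address lookup."""
    query = urllib.parse.urlencode(
        {"address": address, "benchmark": benchmark, "format": "json"})
    return f"{CENSUS_URL}?{query}"


def parse_census_matches(payload):
    """Return (matched_address, lon, lat) tuples from a Census JSON response."""
    matches = payload.get("result", {}).get("addressMatches", [])
    # The API reports coordinates as x (longitude) and y (latitude).
    return [(m["matchedAddress"], m["coordinates"]["x"], m["coordinates"]["y"])
            for m in matches]


def census_geocode(address, timeout=10):
    """Geocode one address over the network; returns a possibly empty list."""
    with urllib.request.urlopen(build_census_url(address), timeout=timeout) as resp:
        return parse_census_matches(json.load(resp))
```

Keeping the URL building and response parsing separate from the network call makes it easy to cache raw responses while we evaluate how well the service handles Chicago's street formats.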

Map Chicago's bike network

Use the current bike map, upcoming bike lanes, and roads <10 mph to show a map of the current bike network.

Geocoder - project address points onto street

Geocoding a street address ("1234 N Name Ave") currently uses the location of the property at that address. These points should instead be projected onto the nearest part of the street centerline.
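Projecting a point onto a centerline means finding the closest point on each segment of the polyline (the perpendicular foot, clamped to the segment's ends) and keeping the nearest one. A minimal sketch, assuming planar coordinates (for example a projected CRS in feet or meters; raw lon/lat is only a rough approximation over short distances) and hypothetical function names:

```python
def project_onto_segment(p, a, b):
    """Closest point to p on the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return (ax, ay)  # degenerate segment: both ends coincide
    # Parameter of the perpendicular foot along a-b, clamped to [0, 1]
    # so the result stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_centerline(p, centerline):
    """Closest point to p on a polyline given as a list of (x, y) vertices."""
    best, best_dist_sq = None, float("inf")
    for a, b in zip(centerline, centerline[1:]):
        q = project_onto_segment(p, a, b)
        d = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if d < best_dist_sq:
            best, best_dist_sq = q, d
    return best
```

To snap to the nearest street rather than a known one, call snap_to_centerline for each candidate centerline and keep the closest result. If shapely is already a dependency, line.interpolate(line.project(point)) computes the same nearest point on a LineString.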

Set up the repo environment

Set up git branch structure
Implement a Python virtual environment
Write docs for installing virtual environment modules
Implement a secrets file
Put folder structure in place for modules
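For the secrets file, one lightweight option is a git-ignored KEY=VALUE file read at startup, with environment variables of the same name taking precedence so CI or a deployment can supply secrets without the file. The sketch below assumes a file named secrets.env (a hypothetical name; whatever name we pick should go in .gitignore) and uses only the standard library.

```python
import os


def load_secrets(path="secrets.env"):
    """Read KEY=VALUE lines from a git-ignored secrets file.

    Blank lines, lines starting with '#', and lines without '=' are skipped.
    An environment variable with the same name as a key overrides the file.
    Returns an empty dict if the file does not exist.
    """
    secrets = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                # Drop surrounding whitespace and optional quotes.
                secrets[key.strip()] = value.strip().strip('"').strip("'")
    for key in secrets:
        if key in os.environ:
            secrets[key] = os.environ[key]
    return secrets
```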

Explore EPA smart location data

https://www.epa.gov/smartgrowth/smart-location-mapping
https://enviroatlas.epa.gov/enviroatlas/DataFactSheets/pdf/Supplemental/Numberofhouseholdswithzerovehicles.pdf

Data source for measuring "location efficiency" (housing density, land use, etc.). Might be usable for something.

Extract data and perform analysis on CIP

The 5-year CIP (Capital Improvement Program) books describe how the city spends its entire budget. The neighborhood and streetscape sections are relevant to infrastructure spending.

https://www.chicago.gov/city/en/depts/obm/provdrs/cap_improve/svcs/cip-archive.html

Geocoder - improve street intersection processing

Most of the ward spending address data is in the format "N Streetname Ave & W Otherstreetname Ave & N Western Ave & W Belmont Ave". We represent these as a polygon by finding the intersections of the street centerlines, but sometimes two north/south streets that are not perfectly parallel intersect far off in the distance, which produces strange results.
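One way to avoid the far-off crossings is to use the directional prefix on Chicago's grid (N/S streets run north-south, E/W streets run east-west) and only intersect N/S streets with E/W streets. The sketch approximates each centerline as an infinite line through two points, computes the pairwise intersections, and orders the corners into a closed ring. The input format and function names are hypothetical, and planar coordinates are assumed.

```python
import math
from itertools import product


def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4, or None if parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)


def corner_points(streets):
    """streets maps a name like 'N Western Ave' to two points on its centerline.

    Only north/south streets are intersected with east/west streets, so two
    nearly parallel N/S streets never produce a crossing far outside the block.
    """
    ns = [s for s in streets if s.split()[0].upper() in ("N", "S")]
    ew = [s for s in streets if s.split()[0].upper() in ("E", "W")]
    corners = []
    for a, b in product(ns, ew):
        pt = line_intersection(*streets[a], *streets[b])
        if pt is not None:
            corners.append(pt)
    return corners


def polygon_ring(points):
    """Order corner points by angle around their centroid and close the ring."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    ring = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return ring + ring[:1]  # GeoJSON rings repeat the first position at the end
```

For two N/S streets that are off by a tiny angle, line_intersection alone can return a point thousands of units away; filtering pairs by orientation keeps those pairs from ever being intersected.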

Get relative pathing working

Scripts and tests reference the installed package, which means we have to reinstall the package every time we make a change. Python supports relative imports (leading periods in the import path, e.g. from . import module), which would let us use the files directly.

Visualize vizient vulnerability index

https://www.vizientinc.com/what-we-do/health-equity/vizient-vulnerability-index-public-access

Scrape data from Chicago Community Hardship Index

https://storymaps.arcgis.com/stories/da5601c3e0924e5ab3ee07ade9954f7a

The geographic data is useful. I would find out where it comes from, then download it by hand and/or record its source location so we can make fetch requests in the future.

Create heatmap of ward spending

Take project spending with multiple locations (multiple points, a street segment, or a polygon) and turn it into points with the project cost distributed evenly among them.
Lines -> line of points
Polygons -> point cloud

Take the generated point data and map it as a heatmap in kepler.gl.
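The two conversions above can be sketched with the standard library: lines become evenly spaced points interpolated along the polyline, polygons become a regular grid of points kept only if they fall inside the ring (ray-casting test), and each point gets an equal share of the project cost. Planar coordinates and a spacing in the same units are assumed; the function names and the grid approach are our own choices, not an existing module.

```python
import math


def interpolate(coords, distance):
    """Point at a given distance along a polyline of (x, y) vertices."""
    for a, b in zip(coords, coords[1:]):
        seg = math.dist(a, b)
        if seg > 0 and distance <= seg:
            t = distance / seg
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        distance -= seg
    return tuple(coords[-1])


def points_along_line(coords, spacing):
    """Evenly spaced points from the start to the end of a polyline."""
    total = sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
    intervals = max(1, math.floor(total / spacing))
    return [interpolate(coords, total * i / intervals) for i in range(intervals + 1)]


def point_in_polygon(pt, ring):
    """Ray-casting test: count edge crossings to the right of the point."""
    x, y = pt
    ring = list(ring)
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_cloud(ring, spacing):
    """Grid points (cell centers) that fall inside a polygon ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    points = []
    y = min(ys) + spacing / 2
    while y < max(ys):
        x = min(xs) + spacing / 2
        while x < max(xs):
            if point_in_polygon((x, y), ring):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


def spread_cost(points, cost):
    """Give every point an equal share of the project cost: (x, y, share)."""
    if not points:
        return []
    share = cost / len(points)
    return [(x, y, share) for x, y in points]
```

The resulting (x, y, share) rows could be written to a CSV and loaded into kepler.gl, using the cost share as the heatmap weight. Polygons smaller than the grid spacing produce no points, so a fallback such as the centroid would be needed for those.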

Handle multi-word street names

There are regular expressions set up to handle single-word street names (S BLACKSTONE AVE), but they currently don't detect multi-word street names (S DR MARTIN LUTHER KING JR DR). We need to modify the regex to detect these cases. Other functions inside the address_processing module may need to be tweaked to accept the changes.
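A possible shape for the regex: match the directional prefix, then one or more name words lazily, then an optional suffix anchored at the end of the string. Because the name repetition is lazy, the engine stops adding words as soon as the remainder is exactly a known suffix, so a trailing DR is taken as the suffix even when DR also appears inside the name. This is a standalone sketch, not the current address_processing code; the suffix list and the parse_street name are assumptions.

```python
import re

# Suffix abbreviations to recognize; this list is an assumption, extend as needed.
SUFFIXES = r"AVE|AV|ST|DR|BLVD|RD|PL|PKWY|CT|LN|TER|HWY|WAY"

STREET_RE = re.compile(
    rf"^(?P<direction>[NSEW])\s+"                    # directional prefix
    rf"(?P<name>[A-Z0-9'.]+(?:\s+[A-Z0-9'.]+)*?)"   # one or more words, lazily
    rf"(?:\s+(?P<suffix>{SUFFIXES}))?$"              # optional trailing suffix
)


def parse_street(text):
    """Split e.g. 'S DR MARTIN LUTHER KING JR DR' into (direction, name, suffix).

    Returns None if the text does not start with a one-letter direction.
    """
    normalized = " ".join(text.upper().split())  # collapse repeated spaces
    m = STREET_RE.match(normalized)
    if m is None:
        return None
    return m.group("direction"), m.group("name"), m.group("suffix")
```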

Create a geo-coding pipeline

Write a Python function that takes an address and returns GPS coordinates.

Use geopy and the Nominatim API?
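A minimal version of that function, calling Nominatim's search endpoint directly with the standard library rather than through geopy, might look like the sketch below. The fetch parameter is a hypothetical hook so the parsing can be tested without network access, and the User-Agent string is a placeholder: Nominatim's usage policy asks for an identifying User-Agent and no more than one request per second, so batch runs should sleep between calls.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "ward-wise-data-analysis (contact: replace-me@example.org)"  # placeholder


def fetch_json(url):
    """GET a URL with an identifying User-Agent and decode the JSON body."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def geocode(address, fetch=fetch_json):
    """Return (latitude, longitude) for an address, or None if not found."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    results = fetch(f"{NOMINATIM_URL}?{query}")
    if not results:
        return None
    # Nominatim returns lat/lon as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])
```

With geopy, the equivalent is roughly Nominatim(user_agent=...).geocode(address), whose result exposes .latitude and .longitude; geopy also ships a RateLimiter helper for batch geocoding.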

Create website to display aggregated data and analysis

Preferably something free. Use GitHub Pages? Leaflet for maps?

Geocoder - improve unique street name handling

We have a custom geocoding setup that falters with some of Chicago's unique street names (e.g., "North Ave", "North Broadway", "Fulton Market", etc.). We either need to add code for these edge cases or switch to a cheap and effective geocoding API.
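As an example of handling these edge cases in code, the heuristic below only treats a leading word as a direction when at least one name word would remain after removing it and any suffix. That keeps "NORTH AVE" as a street named NORTH, while "NORTH BROADWAY" becomes N BROADWAY and "FULTON MARKET" is left as a two-word name. The suffix set and the split_street name are assumptions, and cases the rule gets wrong would still need an explicit override table.

```python
DIRECTIONS = {"N": "N", "S": "S", "E": "E", "W": "W",
              "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
# Suffixes to recognize; this set is an assumption.
SUFFIXES = {"AVE", "AV", "ST", "DR", "BLVD", "RD", "PL", "PKWY", "CT", "LN", "TER", "WAY"}


def split_street(text):
    """Heuristically split a street into (direction, name, suffix).

    A trailing suffix is only removed if another word precedes it, and a
    leading direction word is only removed if a name word would remain.
    """
    tokens = text.upper().split()
    suffix = None
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        suffix = tokens.pop()
    direction = None
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        direction = DIRECTIONS[tokens.pop(0)]
    return direction, " ".join(tokens), suffix
```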

Set up a Pelias Docker image

https://pelias.io/

This can be used to set up a local geocoding API.

Geocoder - improve handling of alley location format

Alley location text tends to produce geometry that stretches across half the city. We need to fix the initial geocoding and implement a script to find poorly geocoded alleys and redo them.

The main issue is that we look for all intersections between the streets that bound the alley, and most nominally parallel streets are not perfectly parallel, so they actually cross at some point in the distance. We need to determine which streets run North-South and which run East-West, and then only find the intersections between N-S and E-W streets.
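For the script that finds poorly geocoded alleys, one simple check is the size of each geometry: an alley bounded by a handful of streets should span about a block, so a bounding-box diagonal far beyond that points to a bad intersection. The sketch computes the diagonal with the haversine formula on lon/lat coordinates. The 800 m default threshold, the input format (feature ID mapped to a list of (lon, lat) pairs), and the function names are assumptions to tune against real data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def extent_m(coords):
    """Diagonal of the lon/lat bounding box of a geometry, in meters."""
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return haversine_m(min(lons), min(lats), max(lons), max(lats))


def flag_bad_alleys(features, max_extent_m=800):
    """IDs of alley geometries whose extent is implausibly large."""
    return [fid for fid, coords in features.items() if extent_m(coords) > max_extent_m]
```

The flagged IDs can then be re-geocoded with the N-S/E-W pairing rule described above.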

Scrape data from Chicago Aldermanic menu program PDFs

Source: the "Previous Aldermanic Menu Program Books by Year" section of the CIP Archive page.

Yes, someone has already asked the city if this data is available in a CSV file. It is not.

Write Python functions to convert the text in the PDFs to a CSV format. You can do OCR with pytesseract, but you might be able to get the text directly out of the PDF file using PyMuPDF or another library. The PDFs for different years have different formats.

Double-check that the raw data is not available on the Open Data Portal or anywhere else
Create functions to get raw text from PDFs
Create functions to convert the raw text into structured data for the different PDF formats
- 2019+
- 2017-2018
- 2012-2016
#1
Convert project location descriptions into GeoJSON objects (Ex: "ON N MAPLEWOOD AVE FROM W BELDEN AV (2300 N) TO W MEDILL AV (2334 N)" should be a line between those two spots)
Process data into CSVs
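For the location-description step, a first pass could parse the "ON ... FROM ... TO ..." pattern with a regex, build the two intersection strings to geocode, and join the resulting endpoints into a LineString. The sketch below only handles that one pattern; the function names are hypothetical, and a straight line between the endpoints is a simplification (clipping the actual street centerline would follow curves better).

```python
import re

LOCATION_RE = re.compile(
    r"^ON\s+(?P<on>.+?)\s+"
    r"FROM\s+(?P<from_street>.+?)\s+\((?P<from_num>\d+)\s+(?P<from_dir>[NSEW])\)\s+"
    r"TO\s+(?P<to_street>.+?)\s+\((?P<to_num>\d+)\s+(?P<to_dir>[NSEW])\)$"
)


def parse_location(text):
    """Parse 'ON <street> FROM <street> (<num> <dir>) TO <street> (<num> <dir>)'.

    Returns None for descriptions that don't follow this pattern.
    """
    m = LOCATION_RE.match(" ".join(text.split()))
    if m is None:
        return None
    d = m.groupdict()
    return {
        "on": d["on"],
        "from": (d["from_street"], int(d["from_num"]), d["from_dir"]),
        "to": (d["to_street"], int(d["to_num"]), d["to_dir"]),
    }


def endpoint_queries(parsed):
    """Intersection strings for both ends, to pass to whichever geocoder we use."""
    return (f"{parsed['on']} & {parsed['from'][0]}",
            f"{parsed['on']} & {parsed['to'][0]}")


def to_linestring(start, end):
    """GeoJSON LineString between two geocoded (lon, lat) endpoints."""
    return {"type": "LineString", "coordinates": [list(start), list(end)]}
```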

Create a Chicago Open Data Portal API wrapper

Write a Python module that serves as a wrapper for the Chicago Open Data Portal API. This might need to be dataset by dataset.

The API docs have example code: https://dev.socrata.com/foundry/data.cityofchicago.org/dpkg-upyz
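Because every Socrata dataset is exposed at https://data.cityofchicago.org/resource/<dataset id>.json and accepts the same SoQL query parameters ($where, $limit, $offset, $order, and so on), a single generic client may be enough before writing per-dataset helpers. A minimal standard-library sketch follows. The class name and the fetch hook are hypothetical; the X-App-Token header and ordering by :id while paging follow Socrata's SODA documentation, which is worth re-checking.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://data.cityofchicago.org/resource"


class ChicagoDataPortal:
    """Minimal SODA client; one instance can query any dataset by its ID."""

    def __init__(self, app_token=None, fetch=None):
        self.app_token = app_token
        self.fetch = fetch or self._fetch  # injectable for testing

    def _fetch(self, url):
        headers = {"X-App-Token": self.app_token} if self.app_token else {}
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)

    def url(self, dataset_id, **params):
        """Build a query URL; keyword names become SoQL $-parameters."""
        query = urllib.parse.urlencode(
            {f"${k}": v for k, v in params.items()}, safe="$:")
        return f"{BASE_URL}/{dataset_id}.json" + (f"?{query}" if query else "")

    def get_all(self, dataset_id, page_size=1000, **params):
        """Yield every matching row, paging with $limit and $offset."""
        params.setdefault("order", ":id")  # stable ordering across pages
        offset = 0
        while True:
            rows = self.fetch(self.url(dataset_id, limit=page_size,
                                       offset=offset, **params))
            yield from rows
            if len(rows) < page_size:
                break
            offset += page_size
```

Usage would look like list(ChicagoDataPortal().get_all("dpkg-upyz", where="...")), with the where clause written in SoQL.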

Improving street resurfacing data calculation

The current street resurfacing scripts find the last time a street segment (section of street between intersections) was resurfaced. However, sometimes only a portion of a street segment is resurfaced at a time, which means we might be missing old sections of street because they're part of the same segment as a recently resurfaced section of street.

Changes to make:

1. Write a function that takes a street address, interpolates its position along a street segment, and splits that street segment into two street segments.
2. Update the street resurfacing script so that, for a given resurfacing row, the street segments in which the resurfacing starts or ends are split using the function from step 1 (segments contained in the middle of the resurfacing can be kept whole). We should add a buffer so that if resurfacing starts or ends within a few addresses of the start or end of a street segment, we just use the whole street segment (otherwise we'll end up with a lot of tiny geometry).
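The splitting function from the first step could be sketched as follows, assuming each segment carries its polyline plus the address numbers at its first and last vertex, and that addresses increase linearly along the line (a simplification; real address ranges are not perfectly uniform). The buffer is expressed in address numbers, and its default of 20 is an arbitrary placeholder.

```python
import math


def split_segment(coords, from_addr, to_addr, address, buffer_addresses=20):
    """Split a street segment at the point where `address` falls.

    coords: list of (x, y) vertices; from_addr/to_addr: address numbers at
    the first and last vertex. Returns [whole] or [first_part, second_part].
    """
    coords = [tuple(c) for c in coords]
    lo, hi = sorted((from_addr, to_addr))
    # Near an end (or outside the range): keep the whole segment to avoid
    # producing tiny slivers of geometry.
    if address - lo <= buffer_addresses or hi - address <= buffer_addresses:
        return [coords]
    fraction = (address - from_addr) / (to_addr - from_addr)
    lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    target = fraction * sum(lengths)
    walked = 0.0
    for i, seg in enumerate(lengths):
        if walked + seg >= target:
            t = (target - walked) / seg if seg else 0.0
            a, b = coords[i], coords[i + 1]
            cut = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            return [coords[: i + 1] + [cut], [cut] + coords[i + 1:]]
        walked += seg
    return [coords]
```

In the resurfacing script, this would only be applied to the segments where a resurfacing row starts or ends; segments fully inside the resurfaced range stay whole.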
