data-analysis's People

Contributors

harrybrisson, kollerbud, s-sajid-ali, smacmullan

data-analysis's Issues

Tool for businesses to track customers' mode of transport

Businesses don't know which modes of transport their customers use, and there are no good tools to track this. This data matters because it drives decisions about how street space is allocated among pedestrians, cyclists, and parked vehicles.
This should be started as a new repo.

Convert geocoded CSV data into geoJSON

John Ruf geocoded data from 2022 back to 2005. We'd like to get this into a geoJSON format.
https://github.com/JohnCRuf/alderman_machine/tree/master/tasks/data_geocode_menu/output

He has an ID column on each row. He generates those IDs as part of his scraping, so each project has a unique ID. I believe the geocoded data can have multiple rows with the same ID, because he split location text that lists multiple locations into multiple rows.

  • Write scripts that turn the csv in John's data_geocode_menu/output folder (link above) into geoJSON
  • Get his repo running and re-scrape the data. He has a cleaning script (the older data is less consistent in its location text formats). We need to run that too, and I think we need to make some minor changes so that we don't drop columns.
  • Join geoJSON and re-scraped data
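A minimal sketch of the first step, assuming John's CSV has columns named `id`, `latitude`, and `longitude` (hypothetical names; check the actual headers in the output folder). Rows that share an ID are merged into one MultiPoint feature, since a project whose location text was split across several rows should still be a single feature.

```python
import csv
import io
import json

# Hypothetical column names; check the actual headers in data_geocode_menu/output.
ID_COL, LAT_COL, LON_COL = "id", "latitude", "longitude"


def csv_to_geojson(csv_file) -> dict:
    """Group geocoded rows by project ID into a GeoJSON FeatureCollection.

    Rows sharing an ID (one project split into several locations) become one
    MultiPoint feature; the other columns of the first row become properties.
    """
    projects = {}  # dicts keep insertion order, so features follow the CSV order
    for row in csv.DictReader(csv_file):
        try:
            lon, lat = float(row[LON_COL]), float(row[LAT_COL])
        except (TypeError, ValueError):
            continue  # skip rows with blank or non-numeric coordinates
        project = projects.setdefault(row[ID_COL], {
            "properties": {k: v for k, v in row.items() if k not in (LAT_COL, LON_COL)},
            "coordinates": [],
        })
        project["coordinates"].append([lon, lat])  # GeoJSON order is [lon, lat]

    features = []
    for project in projects.values():
        coords = project["coordinates"]
        if len(coords) == 1:
            geometry = {"type": "Point", "coordinates": coords[0]}
        else:
            geometry = {"type": "MultiPoint", "coordinates": coords}
        features.append({"type": "Feature", "geometry": geometry,
                         "properties": project["properties"]})
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = io.StringIO(
        "id,latitude,longitude,cost\n"
        "A1,41.90,-87.60,100\n"
        "A1,41.91,-87.61,100\n"
        "B2,41.80,-87.70,50\n"
    )
    print(json.dumps(csv_to_geojson(sample), indent=2))
```

Keeping every non-coordinate column as a feature property should make the later join with the re-scraped data a plain lookup on the ID.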

Create list of data sources

Possibly in a Google Doc? There are data sets on the Chicago Open Data Portal to sort through, and some agencies have data on their websites that is not in the portal.

Set up the repo environment

  • Set up git branch structure
  • Implement a Python virtual environment
  • Write docs for installing virtual environment modules
  • Implement a secrets file
  • Put folder structure in place for modules

Geocoder - improve street intersection processing

Most of the ward spending address data is in the format "N Streetname Ave & W Otherstreetname Ave & N Western Ave & W Belmont Ave". We represent these as a polygon by finding the intersection of the street centerlines, but sometimes two north/south streets intersect way off in the distance and it produces strange results.
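One possible guard, sketched under the assumption that each street centerline can be approximated by a straight line through two known points (a simplification; real centerlines are polylines): intersect every pair of streets, discard crossings that land farther than a chosen distance from the median corner, and order what remains into a closed ring. The names `corner_ring` and `max_dist` are illustrative, not existing code in the repo.

```python
import math
from itertools import combinations
from statistics import median
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # two points on a street centerline, treated as a straight line


def line_intersection(a: Line, b: Line) -> Optional[Point]:
    """Intersection of two infinite lines, or None if they are parallel."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def corner_ring(lines: List[Line], max_dist: float) -> List[Point]:
    """Intersect every pair of centerlines and return the nearby corners as a closed ring.

    Crossings farther than max_dist from the median corner are dropped as
    spurious (e.g. two north/south streets that meet far off in the distance).
    """
    corners = []
    for a, b in combinations(lines, 2):
        p = line_intersection(a, b)
        if p is not None:
            corners.append(p)
    if not corners:
        return []
    mid = (median(x for x, _ in corners), median(y for _, y in corners))
    kept = [p for p in corners if math.dist(p, mid) <= max_dist]
    if not kept:
        return []
    # Order the remaining corners counter-clockwise around their centroid.
    cx = sum(x for x, _ in kept) / len(kept)
    cy = sum(y for _, y in kept) / len(kept)
    kept.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return kept + kept[:1]  # repeat the first corner to close the ring
```

With two nearly parallel north/south streets and two east/west streets, the four real corners survive and the far-off crossing of the north/south pair is filtered out. Choosing `max_dist` is a judgment call that depends on block size and the coordinate units.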

Get relative pathing working

Scripts and tests import the installed package, which means we have to reinstall the package every time we make a change. Python's relative imports use leading periods to locate a module relative to the current package, so we can use the files directly.
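A self-contained sketch of what a relative import looks like. It writes a throwaway package to a temporary directory and imports a module that pulls in its sibling with a leading period. The names `demo_pkg`, `geocoder`, `normalize`, and `prepare` are hypothetical; only `address_processing` echoes a module name mentioned elsewhere in these issues.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package layout, written to a temp dir so the example is self-contained.
FILES = {
    "__init__.py": "",
    "address_processing.py": textwrap.dedent("""
        def normalize(street):
            return " ".join(street.upper().split())
    """),
    "geocoder.py": textwrap.dedent("""
        # Leading period: import address_processing from this same package,
        # wherever the package happens to live on disk.
        from .address_processing import normalize

        def prepare(street):
            return normalize(street)
    """),
}

with tempfile.TemporaryDirectory() as root:
    package_dir = Path(root) / "demo_pkg"
    package_dir.mkdir()
    for filename, source in FILES.items():
        (package_dir / filename).write_text(source)
    sys.path.insert(0, root)  # plays the role of running from the repo root
    importlib.invalidate_caches()
    try:
        geocoder = importlib.import_module("demo_pkg.geocoder")
    finally:
        sys.path.remove(root)

print(geocoder.prepare("  s   blackstone ave "))  # S BLACKSTONE AVE
```

Relative imports only resolve inside a package: running a module as `python -m package.module` from the repo root works, while running the same file directly as `python package/module.py` raises `ImportError` ("attempted relative import with no known parent package").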

Create heatmap of ward spending

Take project spending with multiple locations (multiple points, a street segment, or a polygon) and turn it into points with the project cost evenly distributed among them.
Lines -> line of points
Polygons -> point cloud

Take the generated point data and map it as a heatmap in kepler.gl.
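A rough sketch of both conversions in plain Python, assuming geometries arrive as lists of projected (x, y) coordinates (feet or meters, not raw lon/lat, so that `spacing` means the same distance everywhere) and that `spacing` is a tuning parameter we pick. None of these function names exist in the repo yet. Each output point carries an equal share of the project cost, which should work as a weight column for the heatmap.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Weighted = Tuple[float, float, float]  # (x, y, share of project cost)


def interpolate(line: List[Point], distance: float) -> Point:
    """Point at `distance` along a polyline, clamped to its last vertex."""
    for a, b in zip(line, line[1:]):
        length = math.dist(a, b)
        if 0 < length and distance <= length:
            f = distance / length
            return (a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
        distance -= length
    return line[-1]


def line_to_points(line: List[Point], cost: float, spacing: float) -> List[Weighted]:
    """Lines -> line of points: evenly spaced samples (ends included), equal cost each."""
    total = sum(math.dist(a, b) for a, b in zip(line, line[1:]))
    n = max(2, math.floor(total / spacing) + 1)
    points = [interpolate(line, total * i / (n - 1)) for i in range(n)]
    return [(x, y, cost / n) for x, y in points]


def point_in_polygon(p: Point, ring: List[Point]) -> bool:
    """Even-odd ray-casting test; the ring may or may not repeat its first vertex."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygon_to_points(ring: List[Point], cost: float, spacing: float) -> List[Weighted]:
    """Polygons -> point cloud: grid-cell centers inside the polygon, equal cost each."""
    xs, ys = [x for x, _ in ring], [y for _, y in ring]
    nx = max(1, math.ceil((max(xs) - min(xs)) / spacing))
    ny = max(1, math.ceil((max(ys) - min(ys)) / spacing))
    grid = [(min(xs) + (i + 0.5) * spacing, min(ys) + (j + 0.5) * spacing)
            for j in range(ny) for i in range(nx)]
    points = [p for p in grid if point_in_polygon(p, ring)]
    if not points:  # polygon smaller than one grid cell: use the vertex average
        points = [(sum(xs) / len(xs), sum(ys) / len(ys))]
    return [(x, y, cost / len(points)) for x, y in points]
```

A regular grid keeps the point cloud deterministic, so re-running the script produces the same heatmap; random sampling inside the polygon would be a reasonable alternative.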

Handle multi-word street names

There are regex statements set up to handle single word streets (S BLACKSTONE AVE), but they currently don't detect multi-word street names (S DR MARTIN LUTHER KING JR DR). We need to modify the regex to detect these cases. Other functions inside the address_processing module may need to be tweaked to accept the changes.
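The existing regex in address_processing isn't shown here, so this is a standalone sketch of one pattern that accepts multi-word names. It assumes upper-case input of the form direction, name words, optional suffix, with a hypothetical suffix list. The name group is lazy and the pattern must match the whole string, so the suffix group only claims the final word: the leading "DR" in "DR MARTIN LUTHER KING JR" stays part of the name.

```python
import re

# Hypothetical suffix list; extend it with the suffixes that actually appear in the data.
SUFFIXES = ["AVE", "AV", "ST", "DR", "BLVD", "RD", "PL", "PKWY", "CT", "LN", "TER", "HWY"]

STREET_RE = re.compile(
    r"(?P<direction>[NSEW])\s+"                           # direction prefix
    r"(?P<name>.+?)"                                      # one or more words, lazy
    r"(?:\s+(?P<suffix>" + "|".join(SUFFIXES) + r"))?"    # optional final suffix
)


def parse_street(text: str):
    """Split street text into direction, (possibly multi-word) name, and suffix."""
    normalized = " ".join(text.upper().split())
    match = STREET_RE.fullmatch(normalized)
    return match.groupdict() if match else None


for street in ["S BLACKSTONE AVE", "S DR MARTIN LUTHER KING JR DR", "N BROADWAY"]:
    print(parse_street(street))
```

Because the suffix is optional, names with no suffix such as "N BROADWAY" still parse, with `suffix` set to `None`.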

Geocoder - improve unique street name handling

We have a custom geocoding setup that falters with some of Chicago's unique street names (e.g., "North Ave", "North Broadway", "Fulton Market", etc.). We either need to add code for these edge cases or switch to a cheap and effective geocoding API.
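If we go the edge-case route, one low-effort option is to look the normalized text up in a small hand-maintained exception table before applying the generic rules. The table entries and the simplified generic rule below are illustrative assumptions, not the repo's current logic. The point is that "NORTH" in "North Ave" is the street name, not a direction prefix, and "MARKET" in "Fulton Market" is part of the name, not a suffix.

```python
# Words a naive parser reads as a direction prefix.
DIRECTION_WORDS = {"N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
                   "E": "E", "EAST": "E", "W": "W", "WEST": "W"}

# Hypothetical exception table keyed by normalized text; values are
# (direction, name, suffix), with None where the text doesn't say.
SPECIAL_STREETS = {
    "NORTH AVE": (None, "NORTH", "AVE"),
    "FULTON MARKET": (None, "FULTON MARKET", None),
    "W FULTON MARKET": ("W", "FULTON MARKET", None),
}


def parse_street_with_exceptions(text: str):
    key = " ".join(text.upper().split())
    if not key:
        return None
    if key in SPECIAL_STREETS:
        direction, name, suffix = SPECIAL_STREETS[key]  # exceptions win over the rules
    else:
        # Generic rule: optional direction word, then name words, then a suffix word.
        words = key.split()
        direction = DIRECTION_WORDS.get(words[0]) if len(words) > 1 else None
        if direction:
            words = words[1:]
        suffix = words[-1] if len(words) > 1 else None
        name = " ".join(words[:-1] if suffix else words)
    return {"direction": direction, "name": name, "suffix": suffix}
```

Without the table, the generic rule would read "North Ave" as direction N with the name "AVE", which is exactly the kind of failure described above.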

Geocoder - improve handling of alley location format

Alley location text tends to produce geometry that stretches across half the city. We need to fix the initial geocoding and implement a script to find poorly geocoded alleys and redo them.

The main issue is that we look for all intersections between the streets that bound the alley, and most nominally parallel streets do cross at some point in the distance. We need to determine which streets run North-South and which run East-West, and then only find the intersections between N-S and E-W streets.
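A sketch of that fix, assuming (per Chicago's addressing convention) that an N or S prefix marks a north-south street and an E or W prefix marks an east-west one, and that a hypothetical `centerlines` lookup maps each street's text to two points on its centerline. Only N-S and E-W pairs are intersected, so two streets with the same orientation are never crossed with each other.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # two points on a centerline, treated as a straight line


def line_intersection(a: Line, b: Line) -> Optional[Point]:
    """Intersection of two infinite lines, or None if they are parallel."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def split_by_orientation(location_text: str) -> Tuple[List[str], List[str]]:
    """N/S-prefixed streets run north-south; E/W-prefixed streets run east-west."""
    ns, ew = [], []
    for street in (part.strip() for part in location_text.split("&")):
        if not street:
            continue
        prefix = street.split()[0].upper()
        if prefix in ("N", "S"):
            ns.append(street)
        elif prefix in ("E", "W"):
            ew.append(street)
        # Streets without a direction prefix are skipped in this sketch.
    return ns, ew


def alley_corners(location_text: str, centerlines: Dict[str, Line]) -> List[Point]:
    """Intersect only N-S streets with E-W streets, never two same-orientation streets."""
    ns, ew = split_by_orientation(location_text)
    corners = []
    for ns_street, ew_street in product(ns, ew):
        p = line_intersection(centerlines[ns_street], centerlines[ew_street])
        if p is not None:
            corners.append(p)
    return corners
```

With two N-S and two E-W streets this yields the four corners of the block around the alley. The distant crossing of two slightly converging N-S streets is never computed, so it cannot stretch the geometry across the city.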

Scrape data from Chicago Aldermanic menu program PDFs

Source: the "Previous Aldermanic Menu Program Books by Year" section of the CIP Archive.

Yes, someone has already asked the city if this data is available in a CSV file. It is not.

Write Python functions to convert the text in the PDFs to CSV format. You can do OCR with pytesseract, but you might be able to get the text directly out of the PDF files using PyMuPDF or another library. The PDFs for different years have different formats.

  • Double check that the raw data is not available on the Open Data Portal or anywhere else
  • Create functions to get raw text from PDFs
  • Create functions to convert the raw text into structured data for the different PDF formats
    • 2019+
    • 2017-2018
    • 2012-2016
  • #1
  • Convert project location descriptions into GeoJSON objects (Ex: "ON N MAPLEWOOD AVE FROM W BELDEN AV (2300 N) TO W MEDILL AV (2334 N)" should be a line between those two spots)
  • Process data into CSVs
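For the GeoJSON conversion item, the example description can be pulled apart with a regular expression. This sketch assumes the numbers in parentheses are grid addresses along the ON street (so the line runs from 2300 N to 2334 N on N MAPLEWOOD AVE), and it takes a `geocode` callable as a hypothetical stand-in for whatever geocoder the repo uses. It only covers this one format; the other PDF layouts would need their own patterns.

```python
import re

# Pattern for one location format from the menu PDFs; other formats need their own.
SEGMENT_RE = re.compile(
    r"ON (?P<street>.+?) FROM (?P<from_street>.+?) \((?P<from_num>\d+) (?P<from_dir>[NSEW])\)"
    r" TO (?P<to_street>.+?) \((?P<to_num>\d+) (?P<to_dir>[NSEW])\)"
)


def segment_to_geojson(text, geocode):
    """Turn an 'ON X FROM Y (n) TO Z (m)' description into a GeoJSON LineString feature.

    `geocode` is a hypothetical callable mapping (street, house_number, direction)
    to (lon, lat). Returns None when the text doesn't match this format.
    """
    m = SEGMENT_RE.fullmatch(" ".join(text.upper().split()))
    if m is None:
        return None
    start = geocode(m["street"], int(m["from_num"]), m["from_dir"])
    end = geocode(m["street"], int(m["to_num"]), m["to_dir"])
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [list(start), list(end)]},
        "properties": {"description": text, "street": m["street"],
                       "from_street": m["from_street"], "to_street": m["to_street"]},
    }
```

Returning `None` for non-matching text makes it easy to count how many descriptions each pattern misses before moving on to the next format.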

Improving street resurfacing data calculation

The current street resurfacing scripts find the last time a street segment (section of street between intersections) was resurfaced. However, sometimes only a portion of a street segment is resurfaced at a time, which means we might be missing old sections of street because they're part of the same segment as a recently resurfaced section of street.

Changes to make:

  1. Write a function that takes a street address, interpolates the distance along a street segment, and splits that street segment into two street segments.
  2. Update street resurfacing script so, for a given resurfacing row, the street segments that the resurfacing starts or ends in are split using the function from step 1 (segments contained in the middle of the resurfacing can be kept whole). We should add a buffer so if resurfacing starts/ends within a few addresses of the start/end of a street segment, we just use the whole street segment (otherwise we'll end up with a bunch of tiny geometry data).
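Step 1 could look roughly like this sketch, which assumes house numbers are spread linearly along a segment between its from/to address range and that geometries are polylines of projected (x, y) coordinates. The default `buffer` of 10 house numbers is a placeholder for "a few addresses", and both function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def split_at_fraction(line: List[Point], fraction: float) -> Tuple[List[Point], List[Point]]:
    """Split a polyline at `fraction` (0..1) of its total length."""
    lengths = [math.dist(a, b) for a, b in zip(line, line[1:])]
    target = fraction * sum(lengths)
    for i, length in enumerate(lengths):
        if 0 < length and target <= length:
            f = target / length
            a, b = line[i], line[i + 1]
            cut = (a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
            return line[: i + 1] + [cut], [cut] + line[i + 1:]
        target -= length
    return line, [line[-1]]  # fraction at (or past) the end


def split_segment_at_address(line: List[Point], from_addr: int, to_addr: int,
                             address: int, buffer: int = 10) -> List[List[Point]]:
    """Split a street segment at an address, or keep it whole near its ends.

    The address is assumed to vary linearly from from_addr to to_addr along
    the segment. Within `buffer` house numbers of either end (or outside the
    range), the whole segment is returned to avoid tiny sliver geometries.
    """
    low, high = sorted((from_addr, to_addr))
    if address - low <= buffer or high - address <= buffer:
        return [line]
    fraction = (address - from_addr) / (to_addr - from_addr)
    first, second = split_at_fraction(line, fraction)
    return [first, second]
```

Segments in the middle of a resurfacing run would skip this function entirely, as step 2 describes; only the segments where a resurfacing starts or ends get split.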
