scrapdviz's Introduction

ScrAPD

Extract data from APD news site.

ScrAPD is a small utility designed to help organizations retrieve traffic fatality data in a friendly manner.

Installation

ScrAPD requires Python 3.7+ to work.

pip install scrapd

Quickstart

Collect all the data as CSV:

scrapd --format csv

By default, scrapd does not display anything until it is done collecting the data. To get feedback about the progress, enable logging by adding the -v option to the command. Multiple -v options increase the verbosity, up to a maximum of 3 (-vvv):

scrapd -v --format csv

To save the results to a file, use the shell redirection:

scrapd -v --format csv > results.csv

Note

The logs are written to stderr and will not appear in the result file generated by the redirection. To include them, append 2>&1 after the redirection.

Examples

Retrieve the traffic fatalities that happened between January 15th 2019 and January 18th 2019, and output the results in json:

scrapd --from "Jan 15 2019" --to "Jan 18 2019" --format json

[
  {
    "case": "19-0161105",
    "crash": 2,
    "date": "2019-01-16",
    "fatalities": [
      {
        "age": 58,
        "dob": "1960-02-15",
        "ethnicity": "White",
        "first": "Ann",
        "gender": "Female",
        "generation": "",
        "last": "Bottenfield-Seago",
        "middle": ""
      }
    ],
    "latitude": 0.0,
    "link": "http://austintexas.gov/news/traffic-fatality-2-3",
    "location": "West William Cannon Drive and Ridge Oak Road",
    "longitude": 0.0,
    "notes": "The preliminary investigation shows that the grey, 2003 Volkwagen Jetta being driven by Ann Bottenfield-Seago failed to yield at a stop sign while attempting to turn westbound on to West William Cannon Drive from Ridge Oak Road. The Jetta collided with a black, 2017 Chevrolet truck that was eastbound in the inside lane of West William Cannon Drive. Bottenfield-Seago was pronounced deceased at the scene. The passenger in the Jetta and the driver of the truck were both transported to a local hospital with non-life threatening injuries. No charges are expected to be filed.",
    "time": "15:42:00"
  },
  {
    "case": "19-0150158",
    "crash": 1,
    "date": "2019-01-15",
    "fatalities": [
      {
        "age": 31,
        "dob": "1987-07-09",
        "ethnicity": "White",
        "first": "David",
        "gender": "Male",
        "generation": "",
        "last": "Sell",
        "middle": "Hilburn"
      }
    ],
    "latitude": 0.0,
    "link": "http://austintexas.gov/news/traffic-fatality-1-4",
    "location": "10500 block of N IH 35 SB",
    "longitude": 0.0,
    "notes": "The preliminary investigation shows that a 2000 Peterbilt semi truck was travelling southbound in the center lane on IH 35 when it struck pedestrian David Sell. The driver stopped as soon as it was possible to do so and remained on scene. He reported not seeing the pedestrian prior to impact given that it was still dark at the time of the crash. Sell was pronounced deceased at the scene at 6:24 a.m. No charges are expected to be filed.",
    "time": "06:20:00"
  }
]

Run the same search but output the results as CSV:

scrapd --from "Jan 15 2019" --to "Jan 18 2019" --format csv


crash,case,date,time,location,first,middle,last,generation,ethnicity,gender,dob,age,link,notes
2,19-0161105,2019-01-16,15:42:00,West William Cannon Drive and Ridge Oak Road,Ann,,Bottenfield-Seago,,White,Female,1960-02-15,58,http://austintexas.gov/news/traffic-fatality-2-3,"The preliminary investigation shows that the grey, 2003 Volkwagen Jetta being driven by Ann Bottenfield-Seago failed to yield at a stop sign while attempting to turn westbound on to West William Cannon Drive from Ridge Oak Road. The Jetta collided with a black, 2017 Chevrolet truck that was eastbound in the inside lane of West William Cannon Drive. Bottenfield-Seago was pronounced deceased at the scene. The passenger in the Jetta and the driver of the truck were both transported to a local hospital with non-life threatening injuries. No charges are expected to be filed."
1,19-0150158,2019-01-15,06:20:00,10500 block of N IH 35 SB,David,Hilburn,Sell,,White,Male,1987-07-09,31,http://austintexas.gov/news/traffic-fatality-1-4,The preliminary investigation shows that a 2000 Peterbilt semi truck was travelling southbound in the center lane on IH 35 when it struck pedestrian David Sell. The driver stopped as soon as it was possible to do so and remained on scene. He reported not seeing the pedestrian prior to impact given that it was still dark at the time of the crash. Sell was pronounced deceased at the scene at 6:24 a.m. No charges are expected to be filed.

Retrieve all the traffic fatalities from 2019 (as of Jan 20th 2019) in json, with logging enabled to follow the progress:

scrapd -v --from "1 1 2019" --to "Jan 20 2019" --format json

Fetching page 1...
Fetching page 2...
Fetching page 3...
Fetching page 4...
Fetching page 5...
Fetching page 6...
Fetching page 7...
Fetching page 8...
Fetching page 9...
Fetching page 10...
Fetching page 11...
Total: 3
[
  {
    "case": "19-0081623",
    "crash": 3,
    "date": "2019-01-08",
    "fatalities": [
      {
        "age": 15,
        "dob": "2003-02-18",
        "ethnicity": "Hispanic",
        "first": "Jesus",
        "gender": "Male",
        "generation": "",
        "last": "Servantez",
        "middle": ""
      }
    ],
    "latitude": 0.0,
    "link": "http://austintexas.gov/news/traffic-fatality-3-4",
    "location": "3600 block of South Capital of Texas Highway SB",
    "longitude": 0.0,
    "notes": "The preliminary investigation shows that the driver of a silver, 2018 KIA was traveling in the center lane of the 3600 block of South Capital of Texas Highway SB when the car collided with Jesus Servantez, a pedestrian in the roadway. The driver remained at the scene and told investigators that he did not see Servantez prior to the crash.\n\n\tJesus Servantez was transported to Saint David\u2019s South Austin Medical Center where he succumbed to his injuries on January 21, 2019. No charges are expected to be filed.",
    "time": "21:37:00"
  },
  {
    "case": "19-0161105",
    "crash": 2,
    "date": "2019-01-16",
    "fatalities": [
      {
        "age": 58,
        "dob": "1960-02-15",
        "ethnicity": "White",
        "first": "Ann",
        "gender": "Female",
        "generation": "",
        "last": "Bottenfield-Seago",
        "middle": ""
      }
    ],
    "latitude": 0.0,
    "link": "http://austintexas.gov/news/traffic-fatality-2-3",
    "location": "West William Cannon Drive and Ridge Oak Road",
    "longitude": 0.0,
    "notes": "The preliminary investigation shows that the grey, 2003 Volkwagen Jetta being driven by Ann Bottenfield-Seago failed to yield at a stop sign while attempting to turn westbound on to West William Cannon Drive from Ridge Oak Road. The Jetta collided with a black, 2017 Chevrolet truck that was eastbound in the inside lane of West William Cannon Drive. Bottenfield-Seago was pronounced deceased at the scene. The passenger in the Jetta and the driver of the truck were both transported to a local hospital with non-life threatening injuries. No charges are expected to be filed.",
    "time": "15:42:00"
  },
  {
    "case": "19-0150158",
    "crash": 1,
    "date": "2019-01-15",
    "fatalities": [
      {
        "age": 31,
        "dob": "1987-07-09",
        "ethnicity": "White",
        "first": "David",
        "gender": "Male",
        "generation": "",
        "last": "Sell",
        "middle": "Hilburn"
      }
    ],
    "latitude": 0.0,
    "link": "http://austintexas.gov/news/traffic-fatality-1-4",
    "location": "10500 block of N IH 35 SB",
    "longitude": 0.0,
    "notes": "The preliminary investigation shows that a 2000 Peterbilt semi truck was travelling southbound in the center lane on IH 35 when it struck pedestrian David Sell. The driver stopped as soon as it was possible to do so and remained on scene. He reported not seeing the pedestrian prior to impact given that it was still dark at the time of the crash. Sell was pronounced deceased at the scene at 6:24 a.m. No charges are expected to be filed.",
    "time": "06:20:00"
  }
]

The "Case" field is assigned by the Austin Police Department, while the "ID" field is a key assigned by ScrAPD to label a separate record for each deceased person.
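As a minimal sketch of how the JSON output shown above might be consumed from Python, the snippet below flattens each crash into one row per deceased person. It assumes the structure seen in the examples (a list of crashes, each with a nested "fatalities" list). The function name flatten_fatalities and the trimmed sample are illustrative and not part of ScrAPD.

```python
import json


def flatten_fatalities(crashes):
    """Return one flat dict per deceased person, copying crash-level fields."""
    rows = []
    for crash in crashes:
        # Crash-level fields are everything except the nested list.
        shared = {k: v for k, v in crash.items() if k != "fatalities"}
        for person in crash.get("fatalities", []):
            rows.append({**shared, **person})
    return rows


# Trimmed sample mirroring the structure of the output above.
sample = json.loads("""
[
  {"case": "19-0150158", "crash": 1, "date": "2019-01-15",
   "fatalities": [{"age": 31, "gender": "Male", "ethnicity": "White"}]}
]
""")
rows = flatten_fatalities(sample)
print(rows[0]["case"], rows[0]["age"])
```

In practice the input would come from a file produced with `scrapd --format json > results.json` and loaded with `json.load`.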

Speed and accuracy

ScrAPD executes all the requests asynchronously, which makes it fast.

It parses the information using both the text of the report itself and the Twitter tweet stored in the page metadata. Combining these two methods provides a high degree of confidence in the parsing and allows us to reach a success rate above 95%.

Some statistics:

  • 99% of entries detected
  • 95% of entries correctly parsed
    • 5% of entries only partially parsed
    • 0 complete failures
  • processing time: ~2m00s for ~140 entries

Who uses ScrAPD?

The Austin Pedestrian Advisory Council used ScrAPD to compile a detailed presentation of the status of the traffic deaths in Austin, TX.

The following organizations also use ScrAPD:

See more at http://docs.scrapd.org/who-is-using-us/

scrapdviz's People

Contributors

andrewgibson27, bahtou, dependabot[bot], dontehightower, jarrmill, rgreinho, snyk-bot, tjdev7

scrapdviz's Issues

Update age distribution groups

Issue Type

  • Feature request

Current Behavior

Currently there are up to 8 different age groups, which is too many.

Expected Behavior

A pie chart should not have more than 6 groups.

Possible Solution

Better age grouping:

  • 00-17 (kids)
  • 18-24 (students + young professionals)
  • 25-49 (families with kids)
  • 50-64 (families without kids)
  • 65+ (retirees)
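The grouping proposed above could be sketched as a simple lookup. This is only an illustration of the logic in Python (the dashboard itself would implement it in its own front-end code), and the names AGE_GROUPS and age_group are hypothetical.

```python
# Lower bound of each proposed group, highest first, paired with its label.
AGE_GROUPS = [
    (65, "65+"),
    (50, "50-64"),
    (25, "25-49"),
    (18, "18-24"),
    (0, "00-17"),
]


def age_group(age):
    """Map an age in years to one of the five proposed groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for lower, label in AGE_GROUPS:
        if age >= lower:
            return label


print(age_group(58))
```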

Generate report

As a user I must be able to generate reports for a given period of time, with or without graphs.

The first 2 use cases are:

  1. Monthly reports
  2. Yearly retrospective

The reports are similar, but the user must be able to:

  • Select a certain period of time
  • Select which graphs to export (or None)
  • Generate a PDF

Add date filtering on the client side

Issue Type

  • Feature request

Current Behavior

The filtering is not yet implemented.

Expected Behavior

Connect the date picker to the function retrieving the data in order to limit the data to the range between 2 dates.

Display the data in a grid

The data must be retrieved and displayed in a tabular manner.

This must simulate what the users do when they parse the data manually and populate their spreadsheet.

One success criterion is to completely remove the need to create a spreadsheet.

Does not show current month by default

Issue Type

  • Bug report

Current Behavior

When browsing to https://viz.scrapd.org, it shows the data from April 2019, even though we are in May (at the date this issue was created).

Expected Behavior

The dashboard should always display the data of the current month by default.

Possible Solution

Ideas:

  • Check that momentjs initializes the date picker correctly.
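For reference, the expected default range is the first through the last day of the current month. The sketch below shows that computation in Python purely to illustrate the logic; the dashboard relies on momentjs per the issue, and the function name month_range is hypothetical.

```python
import calendar
from datetime import date


def month_range(today):
    """Return the first and last day of the month containing `today`."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(today.year, today.month)[1]
    return today.replace(day=1), today.replace(day=last_day)


start, end = month_range(date(2019, 5, 14))
print(start.isoformat(), end.isoformat())
```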

Table data should be sorted

Issue Type

  • Feature request

Current Behavior

The data in the table is published in the order it was inserted into the data set.

Expected Behavior

The table should at least be sorted by date.

Potentially, we could add the option to sort by some other relevant columns, as shown in the example.
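A minimal sketch of the date ordering, written in Python for illustration and assuming records shaped like the ScrAPD output (ISO "date" and "time" strings). The function name sort_by_date and the newest-first default are assumptions.

```python
def sort_by_date(records, newest_first=True):
    """Sort records on their ISO "date" and "time" strings."""
    # ISO 8601 strings sort chronologically when compared as text.
    return sorted(
        records,
        key=lambda r: (r["date"], r["time"]),
        reverse=newest_first,
    )


records = [
    {"case": "19-0081623", "date": "2019-01-08", "time": "21:37:00"},
    {"case": "19-0161105", "date": "2019-01-16", "time": "15:42:00"},
    {"case": "19-0150158", "date": "2019-01-15", "time": "06:20:00"},
]
print([r["case"] for r in sort_by_date(records)])
```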

Add a "time of the day" distribution graph

Issue Type

  • Feature request

Current Behavior

N/A

Expected Behavior

As a user I want to be able to determine at what time during the day the accidents are more likely to happen. For instance, is it during morning rush hour or evening rush hour? After work, or at night?

Possible Solution

  • Add a pie chart
  • Find a way to create meaningful buckets
    • 630a-930a: morning rush hour
    • 400p-700p: evening rush hour
    • Find something similar for the rest of the day
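The two buckets listed above could be sketched as follows. This is an illustration in Python only. It assumes half-open windows (the start is included, the end is excluded) and lumps the rest of the day into a placeholder "other" bucket, since the issue leaves those buckets undefined. The function name time_bucket is hypothetical.

```python
from datetime import time


def time_bucket(t):
    """Classify a time of day using the two rush-hour windows listed above."""
    if time(6, 30) <= t < time(9, 30):
        return "morning rush hour"
    if time(16, 0) <= t < time(19, 0):
        return "evening rush hour"
    # The remaining buckets are not defined by the issue yet.
    return "other"


print(time_bucket(time.fromisoformat("08:15:00")))
```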

Deployment fails when there is nothing to deploy

Issue Type

  • Bug report

Current Behavior

Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

Expected Behavior

If there is nothing to do, the pipeline should simply stop, not fail.
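One possible guard, sketched in Python under the assumption that the failure comes from committing a clean working tree: check `git status --porcelain` first and stop early when it prints nothing. The helper names below are hypothetical, and the actual pipeline may be structured differently.

```python
import subprocess


def has_changes(porcelain_output):
    """True when `git status --porcelain` reported at least one change."""
    return bool(porcelain_output.strip())


def working_tree_status(repo_dir="."):
    """Run `git status --porcelain` and return what it printed."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def should_deploy(repo_dir="."):
    """Decide whether the deployment step has anything to commit."""
    return has_changes(working_tree_status(repo_dir))
```

A deployment script could call should_deploy() and exit with status 0 when it returns False, so the pipeline stops without being marked as failed.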

Fix deployment

Issue Type

  • Bug report

Current Behavior

The project does not automatically deploy.

Expected Behavior

When merging to master the project should automatically deploy. Once we are confident with the deployment pipeline, it will be restricted to tags only.

Possible Solution

Use the CIRCLECI environment variable to define the remote. If it is defined, the remote will be origin, otherwise it should be upstream.
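The selection rule described above can be sketched as below. It is shown in Python for illustration only; the function name deploy_remote is hypothetical, and the real deployment script may live in a shell or Node script instead.

```python
import os


def deploy_remote(environ=None):
    """Pick the git remote: "origin" on CircleCI, "upstream" elsewhere."""
    env = os.environ if environ is None else environ
    # The issue only requires the variable to be defined.
    return "origin" if "CIRCLECI" in env else "upstream"


print(deploy_remote({"CIRCLECI": "true"}))
```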

Add date picker shortcuts

Issue Type

  • Feature request

Current Behavior

There are currently only shortcuts for the months.

Expected Behavior

We need to have shortcuts for the current and the past year as they are part of the most common workflows.

Map clusters don't render when using all the data.

Issue Type

  • Bug report

Current Behavior

If I choose to access all the data in the date picker, the map does not show any clusters. However, it works fine if I select the data year by year.

Expected Behavior

We should be able to visualize all the markers on the map.

Create a generic pie chart component

Issue Type

  • Feature request

Current Behavior

Each pie chart is a copy/paste of another, except for the grouping parameter (gender, ethnicity)

Expected Behavior

Make a generic ScrAPD pie chart. Since only the data and title change, the grouping property (gender, ethnicity, etc.) could be passed as a parameter.

Name it ScrapdDistributionGraph.
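The data side of such a generic component reduces to counting records per value of the chosen property. The sketch below illustrates this in Python only; the function name distribution and the "Unknown" fallback label are assumptions, and the component itself would be written in the dashboard's front-end code.

```python
from collections import Counter


def distribution(records, prop):
    """Count records per value of `prop`, e.g. "gender" or "ethnicity"."""
    # Missing or empty values are grouped under a placeholder label.
    return Counter(r.get(prop) or "Unknown" for r in records)


people = [
    {"gender": "Female", "ethnicity": "White"},
    {"gender": "Male", "ethnicity": "White"},
    {"gender": "Male", "ethnicity": "Hispanic"},
]
print(distribution(people, "gender"))
```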

unable to `npm run build` successfully: missing icon file

Issue Type

  • Bug report

Current Behavior

npm run build launches the build process but fails.

Expected Behavior

npm run build should finish successfully.

Possible Solution

Add missing icon?

Steps to Reproduce

  1. run npm run build on the command line

Fatality counter actually counts crashes

Issue Type

  • Bug report

Current Behavior

The fatality counter actually counts the number of crashes, not the actual number of fatalities as we do not have this information in the current data set.

Expected Behavior

Replace the text to show "Crash(es)" instead of "Fatality(ies)".

Update date picker presets

Issue Type

  • Feature request

Current Behavior

Currently we have last and current year/month, as well as all.

Expected Behavior

It would be more intuitive for the year to have the 2 or 4 digit name, especially as we are working on importing data since 2013.

Possible Solution

2013 2014 2015 2016 2017 2018 2019 last_month current_month all
13 14 15 16 17 18 19 last_month current_month all
'13 '14 '15 '16 '17 '18 '19 last_month current_month all

Add date filtering

The user must be able to retrieve the data for a specific period of time. Therefore they must be able to provide a start date and an end date.
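A minimal sketch of the filtering logic, in Python for illustration, assuming records carry an ISO "date" string as in the ScrAPD output and that both bounds are inclusive. The function name filter_by_date is hypothetical.

```python
from datetime import date


def filter_by_date(records, start, end):
    """Keep records whose ISO "date" field falls within [start, end]."""
    return [
        r for r in records
        if start <= date.fromisoformat(r["date"]) <= end
    ]


crashes = [
    {"case": "19-0081623", "date": "2019-01-08"},
    {"case": "19-0150158", "date": "2019-01-15"},
    {"case": "19-0161105", "date": "2019-01-16"},
]
selected = filter_by_date(crashes, date(2019, 1, 15), date(2019, 1, 18))
print([r["case"] for r in selected])
```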

Loads the current month by default

Issue Type

  • Feature request

Current Behavior

Currently, the user needs to pick the dates to be able to visualize some data.

Expected Behavior

The page should load with the data of the current month.

Add "all" filter in date picker

Issue Type

  • Feature request

Current Behavior

There is currently no way to select all the data quickly from the date picker.

Expected Behavior

Add an "all" button as a quick filter in the date picker.

Download dataset

Issue Type

  • Feature request

Current Behavior

There is currently no way to download the data set from ScrAPDviz.

Expected Behavior

Give the user the ability to download the part of the data set they are viewing. The available formats should include at least CSV and JSON.

Add clipboard buttons to copy the table content in CSV

Issue Type

  • Feature request

Current Behavior

N/A

Expected Behavior

Add a button which would copy the content of the table into the clipboard. The content has to be in CSV format to be able to be inserted in a spreadsheet.
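The conversion step could look like the sketch below, shown in Python to illustrate the expected CSV shape. Fields containing commas are quoted so that a spreadsheet splits the columns correctly. The function name to_csv and the column selection are assumptions, and copying to the clipboard is left to the front-end.

```python
import csv
import io


def to_csv(rows, columns):
    """Serialize `rows` (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    # Keys not listed in `columns` are ignored instead of raising an error.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = [{"case": "19-0150158", "location": "N IH 35 SB, 10500 block", "crash": 1}]
text = to_csv(table, ["case", "location"])
print(text)
```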

Add a no fatality for X days counter

Issue Type

  • Feature request

Current Behavior

This feature does not exist.

Expected Behavior

Add a counter showing how many days passed without a fatality occurring.

Under the counter, the shortest and longest streak within the selected period should be displayed as well.
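The counter values could be computed as sketched below. This Python illustration assumes a "streak" is the number of days between two consecutive dates on which a fatal crash occurred; the issue does not define it precisely. The function name fatality_free_stats is hypothetical.

```python
from datetime import date


def fatality_free_stats(dates, today):
    """Return (days since last fatality, shortest gap, longest gap) in days."""
    ordered = sorted(set(dates))
    if not ordered:
        return None, None, None
    current = (today - ordered[-1]).days
    # Gaps between consecutive crash dates within the selected period.
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return current, None, None
    return current, min(gaps), max(gaps)


crash_dates = [date(2019, 1, 8), date(2019, 1, 15), date(2019, 1, 16)]
stats = fatality_free_stats(crash_dates, today=date(2019, 1, 20))
print(stats)
```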

Hide the table in mobile mode

Issue Type

  • Feature request

Current Behavior

Can the table be mobile friendly? The minimum size for the table to be displayed correctly is 1024px + 24px for the frame of the application, thus 1048px are needed. It also has a medium style which may help a little.

Expected Behavior

  • We should support devices from 320px.
  • Mobile mode: 320px -> 600px
  • Desktop mode: > 601px

Possible Solution

We could hide the table if the size of the screen is < 1048px.

Antd CSS is not exported with the build

Issue Type

  • Bug report

Current Behavior

The static build works well, but the antd CSS is not included.

Expected Behavior

The static build should look exactly like the node version.

Add percentages to the Graph

Issue Type

  • Feature request

Current Behavior

The graphs only display the labels.

Expected Behavior

The percentage value should be displayed to help the user make more sense of the data he/she is looking at.

The absolute value could be shown too, if it does not clutter the graph.

Possible Solution

Probably use sublabels to display this information.
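The percentage shown next to each label is each group's share of the total. The sketch below illustrates this in Python only; the function name percentages and the rounding to one decimal are assumptions.

```python
def percentages(counts):
    """Convert per-label counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = percentages({"Male": 3, "Female": 1})
print(shares)
```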

Flexbox layout

Issue Type

  • Feature request

Current Behavior

Graphs are displayed using inline-block.

Expected Behavior

  • Use flexbox to arrange the graphs
  • The boxes should be 280px wide to match the size of the date picker
  • There should be some space in between the graphs

Fatality counter cannot display more than 2 digit numbers

Issue Type

  • Bug report

Current Behavior

If the number of fatalities goes above 99, the display breaks.

Expected Behavior

We should be able to display a 3 digit fatality count.

Possible Solution

Use emotion to adjust the font size based on the number of digits to display.

Create a logo

The project needs a logo.

The idea would be to do something similar to what was done for RYR: the logos must be similar but different. The same idea must come from the logos, but they must have their own identities.

Create graph component for "Crash causes"

Issue Type

  • Feature request

Current Behavior

There is no graph showing this information.

Expected Behavior

Following the model of the other graph components, we want a graph showing the percentage of each crash cause. For instance "50% ran a light/stop, 30% because of speeding, 20% because of impairment".

Graph labels are hard to read

Issue Type

  • Feature request

Current Behavior

The graph colors, font, and font color make the labels hard to read.

Expected Behavior

The labels should be quickly readable.

Possible Solution

Add a background to the labels to control the contrast.

Add Contact + Social media link in the footer

Issue Type

  • Feature request

Current Behavior

There is currently no way to get this information from the dashboard.

Expected Behavior

As a user I should have a way to contact the project owner(s) at "[email protected]" and a way to find the code on GitHub.

Possible Solution

Add the appropriate icons+links in the footer.

Order graph legend alphabetically

Issue Type

  • Feature request

Current Behavior

The legend is generated on the fly based on the information found in the data set matching a specific time frame.

Expected Behavior

As a user I expect the legend to be consistent between different time frames.

Possible Solution

Sort the categories alphabetically.

Link the table data to the case webpage

Issue Type

  • Feature request

Current Behavior

There is no link between the table and the APD reports unless you dig through the raw data.

Expected Behavior

As a user I want the ability to click on a case number and be redirected to the associated APD report.

For instance, if I click on the case number 19-0961200, I want to be sent to https://austintexas.gov/news/traffic-fatality-17-4

Possible Solution

antd offers the ability to define a function to render a cell: https://ant.design/components/table/#components-table-demo-basic

Create the "Archives" view

Issue Type

  • Feature request

Current Behavior

Archives are currently not handled, but the pipeline is being built for it.

Expected Behavior

The first version should be a simple view which will set the foundation to display more information in the future.

  • Add a menu where the user can access the current data set, and the archived data set. Use the menu from Antd. NextJS also has the component.
  • Display the date picker
  • DO NOT display the table
  • Show only the following graphs
    • Year to year comparison
    • Month distribution
    • Weekday distribution
    • Time distribution
  • Add the fatality counter
  • Add the map

Do not modify the redux store or anything else for this version.

Deploy only when tagging

Issue Type

  • Feature request

Current Behavior

Every time something gets merged into master, we redeploy the site.

Expected Behavior

Now that we are confident in the deployment pipeline, we should only deploy tagged versions.
