dssg / census-communities-usa
Mapping and analyzing local business data from the Census Bureau.
@JoeGermuska pointed me to this presentation:
Most local business reporting on employment focuses on net numbers because that’s all that has been available until recently. Now reporters can analyze the tremendous churn of workers in and out of new and old firms in various sectors of their local economies using LED data. Learn how to use browser-based tools to analyze and visualize this data for a state or metro audience and how to extract a local slice for more analysis.
using MoSQL
census-communities-usa doesn't make a lot of sense to me considering we are serving up origin-destination data for workers across the country.
Suggestions
As we discovered last night, when we push the data from the CSV into MongoDB, it needs to be Unicode (or at least something that pymongo can trivially coerce to Unicode). The python-unicodecsv module got us halfway there, but to take it the rest of the way I also needed to declare the incoming encoding. After a bit of trial and error, the proper encoding, at least for the Arizona geographic crosswalk table, appears to be latin-1.
That means that in a few hours we'll have Alabama, Arkansas, Alaska and Arizona loaded.
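A minimal sketch of the decoding step described above (the file name and columns are hypothetical; Python 3's stdlib csv handles the encoding declaration directly, which is the same role python-unicodecsv played on Python 2):

```python
# Declare latin-1 on the incoming file so rows arrive as proper
# Unicode before handing them to pymongo.
import csv
import io

# Hypothetical crosswalk rows, encoded as latin-1 on disk.
raw = 'tabblk2010,cty\n040130001001000,Maricopa Coünty\n'.encode('latin-1')

with io.TextIOWrapper(io.BytesIO(raw), encoding='latin-1') as f:
    rows = list(csv.DictReader(f))

# Each row is now a plain dict of str, safe to insert with pymongo, e.g.:
#   client.census.xwalk.insert_many(rows)
print(rows[0]['cty'])  # Maricopa Coünty
```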
When a user clicks on a tract, I'd like to show the inflows and outflows, similar to this: http://www.forbes.com/special-report/2011/migration.html
Example call/response:
http://ec2-54-212-141-93.us-west-2.compute.amazonaws.com/tract-origin-destination/17031010600
[{
  "17031010600": {
    "traveling-from": {
      "17031010601": 5,
      "17031010602": 3,
      "17031010603": 17,
      "17031010604": 43
    },
    "traveling-to": {
      "17031010608": 1,
      "17031010609": 90,
      "17031010610": 4,
      "17031010611": 12
    }
  }
}]
Relates to #17
@evz doable?
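A rough sketch of how such an endpoint might look, assuming app.py uses Flask; the in-memory `OD` dict here is a stand-in for the real database lookup:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stub data keyed by tract FIPS; in practice this comes from the DB.
OD = {
    '17031010600': {
        'traveling-from': {'17031010601': 5, '17031010602': 3},
        'traveling-to': {'17031010608': 1, '17031010609': 90},
    }
}

@app.route('/tract-origin-destination/<tract>')
def tract_od(tract):
    # Return the same shape as the example response above.
    return jsonify([{tract: OD.get(tract, {})}])
```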
For the homepage of Chicago Breadwinners, I'd like to show a choropleth of jobs per tract in the Chicago CBSA.
This might be a heavy call, so we could limit it to just the current year and just the S000 value. Oh ... and cache it too!
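The per-tract totals could come from a single grouped query. A sketch using sqlite3 as a stand-in for the real database (the table and column names are assumptions based on the LODES layout):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE od (w_tract TEXT, year INTEGER, s000 INTEGER)')
conn.executemany('INSERT INTO od VALUES (?, ?, ?)', [
    ('17031010600', 2011, 40),
    ('17031010600', 2011, 12),
    ('17031010601', 2011, 7),
    ('17031010600', 2010, 99),  # older year, excluded by the filter
])

# Total S000 jobs per work tract, limited to a single year.
rows = conn.execute(
    'SELECT w_tract, SUM(s000) FROM od '
    'WHERE year = ? GROUP BY w_tract ORDER BY w_tract',
    (2011,),
).fetchall()
# rows -> [('17031010600', 52), ('17031010601', 7)]
```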
Based on @evz's testing, Postgres seems to be a much better option than Mongo. Consequently, we should convert the remaining parts of web/app.py
to run on Postgres and update the documentation. (Parts of the API already use psycopg2/Postgres.)
Also, should we use an ORM (e.g., SQLAlchemy)?
Thoughts? Feel free to debate this.
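If we do go the SQLAlchemy route, a model might look something like this; table and column names are assumptions, and sqlite stands in for Postgres here:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class TractOD(Base):
    __tablename__ = 'tract_od'  # hypothetical table name
    id = Column(Integer, primary_key=True)
    h_tract = Column(String(11), index=True)  # home census tract
    w_tract = Column(String(11), index=True)  # work census tract
    s000 = Column(Integer)                    # total jobs

engine = create_engine('sqlite://')  # swap for a postgresql:// URL in production
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(TractOD(h_tract='17031010600', w_tract='17031010601', s000=5))
    session.commit()
    total = session.query(TractOD).filter_by(h_tract='17031010600').count()
```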
Based on the conversations and use cases so far, this endpoint described by @evz will be useful:
http://ec2-54-212-141-93.us-west-2.compute.amazonaws.com/tract-average/
[{
  "2010": {
    "17031010600": {
      "S000": 10809,
      "SE01": 2037,
      "SE02": 4472,
      "SE03": 4300
    }
  }
},
{
  "2011": {
    "17031010600": {
      "S000": 12191,
      "SE01": 2681,
      "SE02": 4561,
      "SE03": 4949
    }
  }
}]
Eventually, it would be nice to optionally include the tract boundary GeoJSON, perhaps with a flag like boundary=1
in the query string. @evz, how hard would it be to add that?
Relates to #17
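One way the boundary=1 flag could work, sketched with a hypothetical `BOUNDARIES` lookup standing in for the real boundary table:

```python
# Hypothetical lookup of tract GeoJSON geometries; in practice this
# would come from the geographic crosswalk / boundary data.
BOUNDARIES = {
    '17031010600': {
        'type': 'Polygon',
        'coordinates': [[[-87.60, 41.90], [-87.59, 41.90],
                         [-87.59, 41.91], [-87.60, 41.90]]],
    }
}

def tract_payload(tract, averages, boundary=False):
    """Build the response dict, attaching geometry only when asked."""
    payload = {tract: averages}
    if boundary:
        payload['boundary'] = BOUNDARIES.get(tract)
    return payload

out = tract_payload('17031010600', {'S000': 10809}, boundary=True)
```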
OK, I swear I started this last night...
Anyway, Derek and I were talking about questions we have about the data and thought it would be smart to start compiling a list, so that the next time we have someone from the Census Bureau on the phone we can try to get some answers. I'll start:
• Is there any way of accounting for telecommuting in the origin destination data?
• What do the job types (primary, private, private primary, federal, and federal primary) actually refer to?
• How reliable is the geocoding? If a worker works at a McDonald's, is their job located at that restaurant or at the corporate headquarters?
• If the company that a worker works for is in, for example, the retail industry but their job could be described as a job in technology, what gets reported in these data?
Response time is generally pretty poor. Once the process of fixing the length of the tract codes is finished, we should probably go ahead and set up Varnish (or something similar) to start caching the API responses, since they really won't ever change.
It would seem to me that, since the current version of Mongo supports storing and indexing polygon data, it would be feasible to download, parse, and load all that data along with all the geographic crosswalk data. You can get shapefiles grouped by state here and there are a few python libraries that will translate those into GeoJSON that we can put into MongoDB.
According to the docs linked in the README, it would seem that you could build a script that fetches the data one file at a time and loads it into Mongo, instead of attempting to load it all in one go. That would make this project far easier for others to stand up on their own, rather than relying on a 53GB file and a particular Mongo endpoint in order to work.
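A sketch of that one-file-at-a-time approach; the URL pattern below is an assumption modeled on the LODES download layout, so verify it against the docs in the README before relying on it:

```python
STATES = ['al', 'ak', 'az', 'ar']  # extend to all states

def lodes_urls(states, year=2011):
    # Assumed base URL for LODES downloads; check the README's docs.
    base = 'https://lehd.ces.census.gov/data/lodes/LODES7'
    for st in states:
        # One origin-destination file per state, e.g.
        # .../al/od/al_od_main_JT00_2011.csv.gz
        yield '{0}/{1}/od/{1}_od_main_JT00_{2}.csv.gz'.format(base, st, year)

# Each URL could then be downloaded, decompressed, and loaded into
# Mongo before moving on to the next state.
urls = list(lodes_urls(STATES))
```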
Started a simple project to map the distribution of income by home location using the SE01, SE02, and SE03 data fields