dssg / census-communities-usa
Mapping and analyzing local business data from the Census Bureau.
@JoeGermuska pointed me to this presentation:
Most local business reporting on employment focuses on net numbers because that’s all that has been available until recently. Now reporters can analyze the tremendous churn of workers in and out of new and old firms in various sectors of their local economies using LED data. Learn how to use browser-based tools to analyze and visualize this data for a state or metro audience and how to extract a local slice for more analysis.
using MoSQL
census-communities-usa doesn't make a lot of sense to me considering we are serving up origin-destination data for workers across the country.
Suggestions
As we discovered last night, when we push the data from the CSV into MongoDB, it needs to be Unicode (or at least something that pymongo can trivially coerce to Unicode). The python-unicodecsv module got us halfway there, but to take it the rest of the way I also needed to declare the incoming encoding. After a bit of trial and error, the proper encoding, at least for the Arizona geographic crosswalk table, appears to be latin-1.
That means that in a few hours we'll have Alabama, Arkansas, Alaska and Arizona loaded.
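A minimal sketch of the decoding step described above (the file name and columns are hypothetical; Python 3's stdlib csv handles the encoding declaration directly, which is the same role python-unicodecsv played on Python 2):

```python
# Declare latin-1 on the incoming file so rows arrive as proper
# Unicode before handing them to pymongo.
import csv
import io

# Hypothetical crosswalk rows, encoded as latin-1 on disk.
raw = 'tabblk2010,cty\n040130001001000,Maricopa Coünty\n'.encode('latin-1')

with io.TextIOWrapper(io.BytesIO(raw), encoding='latin-1') as f:
    rows = list(csv.DictReader(f))

# Each row is now a plain dict of str, safe to insert with pymongo, e.g.:
#   client.census.xwalk.insert_many(rows)
print(rows[0]['cty'])  # Maricopa Coünty
```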
When a user clicks on a tract, I'd like to show the inflows and outflows, similar to this: http://www.forbes.com/special-report/2011/migration.html
Example call/response:
http://ec2-54-212-141-93.us-west-2.compute.amazonaws.com/tract-origin-destination/17031010600
[{
  "17031010600": {
    "traveling-from": {
      "17031010601": 5,
      "17031010602": 3,
      "17031010603": 17,
      "17031010604": 43
    },
    "traveling-to": {
      "17031010608": 1,
      "17031010609": 90,
      "17031010610": 4,
      "17031010611": 12
    }
  }
}]
Relates to #17
@evz doable?
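A rough sketch of how such an endpoint might look, assuming app.py uses Flask; the in-memory `OD` dict here is a stand-in for the real database lookup:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stub data keyed by tract FIPS; in practice this comes from the DB.
OD = {
    '17031010600': {
        'traveling-from': {'17031010601': 5, '17031010602': 3},
        'traveling-to': {'17031010608': 1, '17031010609': 90},
    }
}

@app.route('/tract-origin-destination/<tract>')
def tract_od(tract):
    # Return the same shape as the example response above.
    return jsonify([{tract: OD.get(tract, {})}])
```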
For the homepage of Chicago Breadwinners, I'd like to show a choropleth of jobs per tract in the Chicago CBSA.
This might be a heavy call, so we could limit it to just the current year and just the S000 value. Oh ... and cache it too!
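The per-tract totals could come from a single grouped query. A sketch using sqlite3 as a stand-in for the real database (the table and column names are assumptions based on the LODES layout):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE od (w_tract TEXT, year INTEGER, s000 INTEGER)')
conn.executemany('INSERT INTO od VALUES (?, ?, ?)', [
    ('17031010600', 2011, 40),
    ('17031010600', 2011, 12),
    ('17031010601', 2011, 7),
    ('17031010600', 2010, 99),  # older year, excluded by the filter
])

# Total S000 jobs per work tract, limited to a single year.
rows = conn.execute(
    'SELECT w_tract, SUM(s000) FROM od '
    'WHERE year = ? GROUP BY w_tract ORDER BY w_tract',
    (2011,),
).fetchall()
# rows -> [('17031010600', 52), ('17031010601', 7)]
```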
Based on @evz's testing, Postgres seems to be a much better option than Mongo. Consequently, we should convert the remaining parts of web/app.py
to run on Postgres and update the documentation. (Parts of the API already use psycopg2/Postgres.)
Also, should we use an ORM (e.g., SQLAlchemy)?
Thoughts? Feel free to debate this.
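If we do go the SQLAlchemy route, a model might look something like this; table and column names are assumptions, and sqlite stands in for Postgres here:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class TractOD(Base):
    __tablename__ = 'tract_od'  # hypothetical table name
    id = Column(Integer, primary_key=True)
    h_tract = Column(String(11), index=True)  # home census tract
    w_tract = Column(String(11), index=True)  # work census tract
    s000 = Column(Integer)                    # total jobs

engine = create_engine('sqlite://')  # swap for a postgresql:// URL in production
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(TractOD(h_tract='17031010600', w_tract='17031010601', s000=5))
    session.commit()
    total = session.query(TractOD).filter_by(h_tract='17031010600').count()
```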
Based on the conversations and use cases so far, this endpoint described by @evz will be useful:
http://ec2-54-212-141-93.us-west-2.compute.amazonaws.com/tract-average/
[{
  "2010": {
    "17031010600": {
      "S000": 10809,
      "SE01": 2037,
      "SE02": 4472,
      "SE03": 4300
    }
  }
},
{
  "2011": {
    "17031010600": {
      "S000": 12191,
      "SE01": 2681,
      "SE02": 4561,
      "SE03": 4949
    }
  }
}]
Eventually, it would be nice to optionally include the tract boundary GeoJSON, perhaps with a flag like boundary=1
in the query string. @evz, how hard would it be to add that?
Relates to #17
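One way the boundary=1 flag could work, sketched with a hypothetical `BOUNDARIES` lookup standing in for the real boundary table:

```python
# Hypothetical lookup of tract GeoJSON geometries; in practice this
# would come from the geographic crosswalk / boundary data.
BOUNDARIES = {
    '17031010600': {
        'type': 'Polygon',
        'coordinates': [[[-87.60, 41.90], [-87.59, 41.90],
                         [-87.59, 41.91], [-87.60, 41.90]]],
    }
}

def tract_payload(tract, averages, boundary=False):
    """Build the response dict, attaching geometry only when asked."""
    payload = {tract: averages}
    if boundary:
        payload['boundary'] = BOUNDARIES.get(tract)
    return payload

out = tract_payload('17031010600', {'S000': 10809}, boundary=True)
```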
OK, I swear I started this last night...
Anyway, Derek and I were talking about questions we have about the data and thought it would be smart to start compiling a list, so that the next time we have someone from the Census Bureau on the phone we can try to get some answers. I'll start:
• Is there any way of accounting for telecommuting in the origin destination data?
• What do the job types (primary, private, private primary, federal, and federal primary) actually refer to?
• How reliable is the geocoding? If a worker works at a McDonald's, is their job located at that restaurant or at the corporate headquarters?
• If the company that a worker works for is in, for example, the retail industry but their job could be described as a job in technology, what gets reported in these data?
Response time is generally pretty poor. Once the process of fixing the length of the tract codes is finished, we should probably go ahead and set up Varnish (or something similar) to start caching the API responses, since they really won't ever change.
It would seem to me that, since the current version of Mongo supports storing and indexing polygon data, it would be feasible to download, parse, and load all that data along with all the geographic crosswalk data. You can get shapefiles grouped by state here and there are a few python libraries that will translate those into GeoJSON that we can put into MongoDB.
According to the docs linked in the README, it would seem that you could build a script that fetches the data one file at a time and loads it into Mongo, instead of attempting to load it all in one go. That would make this project far easier for others to stand up on their own, rather than relying on a 53GB file and a particular Mongo endpoint in order to work.
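A sketch of that one-file-at-a-time approach; the URL pattern below is an assumption modeled on the LODES download layout, so verify it against the docs in the README before relying on it:

```python
STATES = ['al', 'ak', 'az', 'ar']  # extend to all states

def lodes_urls(states, year=2011):
    # Assumed base URL for LODES downloads; check the README's docs.
    base = 'https://lehd.ces.census.gov/data/lodes/LODES7'
    for st in states:
        # One origin-destination file per state, e.g.
        # .../al/od/al_od_main_JT00_2011.csv.gz
        yield '{0}/{1}/od/{1}_od_main_JT00_{2}.csv.gz'.format(base, st, year)

# Each URL could then be downloaded, decompressed, and loaded into
# Mongo before moving on to the next state.
urls = list(lodes_urls(STATES))
```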
Started a simple project to map the distribution of income by home location using the SE01, SE02, and SE03 data fields