datausa-tracker's Issues
Opioids data
- ingest
- schema
- test
acs null values
In a nutshell, as of 2016 there are now a few "magic" values that stand for null: -666666666, -222222222, -999999999, -555555555, -333333333
... but we'll need to make sure to null those values out in the dataframe so that sums don't get messed up. You might want to use na_values when loading the dataframe in pandas, or df.replace afterwards.
Unfortunately, it looks like I missed this one, so all datasets will have to be rerun or manually fixed in the db.
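A minimal sketch of the two options mentioned above, assuming the data is loaded with pandas. The column names and the two-row sample are hypothetical; only the sentinel values come from this issue.

```python
import io
import pandas as pd

# Sentinel values listed in this issue that ACS uses in place of null.
ACS_NULL_SENTINELS = [-666666666, -222222222, -999999999, -555555555, -333333333]

# Hypothetical two-row extract; real ACS files have many more columns.
raw_csv = io.StringIO(
    "geo,estimate\n"
    "geo_a,41483\n"
    "geo_b,-666666666\n"
)

# Option 1: treat the sentinels as missing while loading.
df = pd.read_csv(raw_csv, na_values=ACS_NULL_SENTINELS)

# Option 2: null them out in a dataframe that is already loaded.
loaded = pd.DataFrame({"estimate": [41483, -666666666]})
cleaned = loaded.replace(ACS_NULL_SENTINELS, float("nan"))

# Sums skip NaN by default, so the sentinel no longer distorts the total.
total = df["estimate"].sum()
```

Either way the sentinel rows become NaN, so aggregations ignore them instead of adding a large negative number.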
add error_for_measure annotations to pums
IPEDS Retention Rates Cube
- data
- schema
- test
health insurance ACS tables for deloitte
Process B27010 using https://www.kff.org/other/state-indicator/total-population/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D to do some manual roll-up
- B25048
- B25052 - can we do as percentages?
- B28002 - listed as a number; can we convert to percentages?
- B25092
- C25074
- B27010 if we can roll it up
- B27004 - any way we can convert to percentages?
- B27007 - can we convert to percentages?
- B27006 - can we convert to percentages?
[x] all etl done, just waiting on customizing schema
Defaults Data
Dartmouth Data
- Ingest Dartmouth Atlas of Health Data
- Create Schema
Add RCA to PUMS Cubes
- Population RCAs
- Wage RCAs
IPEDS Tuition Tables
- Create Pipeline
- Ingest
- Schema
New Health Data
Freight Data
- Ingest
- Schema
IPEDS Degrees Awarded Cube
- data
- schema
- test
IPEDS Instructional Salaries
- Data
- Schema
connect university coordinates to geoservice api
Convert O*NET dataset to cube architecture
- Ingest ONET Data
- Create Schema
Setup geoservice-api
Stand-up basic geoservice-api for computing related geographies.
BEA I/O Data
- Ingest use data
- IO Code Hierarchy
- Create schema
IPEDS University Student Financial Aid
- Data
- Schema
add annotation to hide exact MOE values by default in pums cubes
The exact values for PUMS MOEs can be time-consuming to compute, so by default they should be hidden unless explicitly requested.
Tuition by CIP
We will need a separate cube for tuition by CIP
- data
- schema
Geographies name update
Todo:
- Rename places that are duplicate within a state. Use a script.
- use namelsad for counties, manually edited on server
- use namelsad for counties, change in acs-core mondrian schema_shared.py
As part of the "pluck" bug in Mondrian Rest, we found that there are lots of duplicate names that weren't handled well in the data and in Mondrian.
Report for the weird Mondrian Rest bug:
What's happening is a combination of an implicit Mondrian cube server behavior and a lack of error handling in Mondrian Rest.
In the counties dim, there are counties in the same state which have the same short name, e.g. "Roanoke City" and "Roanoke County" both become "Roanoke". They have different IDs (geoid). However, Mondrian doesn't like duplicate names, so it decides that there's only one dimension level member named "Roanoke" and simply takes the last Roanoke it sees.
This translates into an error in Mondrian Rest because the ID key for the first "Roanoke" still exists, even though the associated dimension level member has been silently squashed.
When formatting the default json to something else, MRest uses a field called cell_keys, which holds all the ID keys. MRest then maps each ID key to a dimension level member, but because of Mondrian's behavior, that dim level member no longer exists for the duplicate name. Ruby happily assigns a NilClass instead of the expected mapping. Then in the next step, metadata is plucked out of each dim level member, and when it comes time to do this with the NilClass element, an exception occurs.
Solution:
- step one, need to make sure there are no duplicate names in dimensions, even if ID keys are distinct. (I'll have something up tomorrow morning, it's an easy fix)
- step two, to help recognize this in the future, I'm going to catch that exception at the pluck and give a nice error message explaining that the cause might be the above.
cc @jspeiser @dave
(I've done step one locally, that's how I know it's the issue. But I need to coordinate a few things to make sure it's a permanent fix tomorrow morning)
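Step one can be sketched roughly as follows. This is an illustration in Python, not the fix that was actually applied and not Mondrian Rest code (which is Ruby). The record layout, the helper names, and the placeholder IDs are hypothetical. Falling back to namelsad for colliding names follows the todo list in this issue.

```python
from collections import defaultdict

def find_duplicate_names(members):
    """Return (state, name) pairs that are shared by more than one distinct ID."""
    ids_by_name = defaultdict(set)
    for m in members:
        ids_by_name[(m["state"], m["name"])].add(m["geoid"])
    return {key: sorted(ids) for key, ids in ids_by_name.items() if len(ids) > 1}

def disambiguate(members):
    """Swap in the longer namelsad wherever the short name collides."""
    dupes = find_duplicate_names(members)
    fixed = []
    for m in members:
        collides = (m["state"], m["name"]) in dupes
        fixed.append({**m, "name": m["namelsad"] if collides else m["name"]})
    return fixed

# Hypothetical rows modeled on the Roanoke example; the IDs are placeholders.
counties = [
    {"geoid": "id-1", "state": "VA", "name": "Roanoke", "namelsad": "Roanoke City"},
    {"geoid": "id-2", "state": "VA", "name": "Roanoke", "namelsad": "Roanoke County"},
    {"geoid": "id-3", "state": "VA", "name": "Fairfax", "namelsad": "Fairfax County"},
]

duplicates = find_duplicate_names(counties)
renamed = disambiguate(counties)
```

Running a check like find_duplicate_names before loading a dimension table would surface the collision up front, instead of as a nil error at the pluck.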
BLS Growth Data
- BLS Industry Growth
- BLS Occupation Growth
Census-geo: incorporate county & place renames into app
For now we've included ad-hoc renames of namelsad to satisfy Mondrian's name uniqueness requirements for places, but moving forward we'll likely want to do this at the shapefile ingest phase.
IPEDS Graduation Rates Cube
Economic Census Data
IPEDS Admissions Data
- Ingest
- Schema
ingest university coordinates to geospatial database
Census-geo: puma name updates
Add a name column to the pumas geospatial table that converts "NYC-Manhattan Community District 8--Upper East Side PUMA" to "Upper East Side PUMA"
- update postgres pumas table
- update scripts
- update monetdb
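One way the name conversion could work, assuming the short name is always the text after the last "--" in the full PUMA name. That rule is inferred from the single example in this issue and may not hold for every PUMA. The function name is hypothetical.

```python
def short_puma_name(full_name: str) -> str:
    """Keep only the text after the last '--'; names without '--' pass through."""
    return full_name.rsplit("--", 1)[-1].strip()

example = short_puma_name(
    "NYC-Manhattan Community District 8--Upper East Side PUMA"
)
```

The single hyphen in "NYC-Manhattan" is left alone because only the double hyphen is treated as a separator.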
IPEDS IC Living Expenses
SFA Aid by Income Level
- Data
- Schema
BLS CES Data
- Ingest
- Schema
generate university similarity data
disable hideMemberIf parentName for industries/occupations
Disable hideMemberIf="IfParentsName" on the deeper levels.
County Health Rankings Data
- Ingest
- Schema
- Build map for years/source
(Note from @hwchen... map is built for years/source, but check back in because I don't know final format)
IPEDS Enrollment Data Cube
- ingest
- schema
ACS language 2013-2014
etl to csv done, just need to upload and update table.
Create University dimension table
IPEDS Non Instructional Salaries
- Create Pipeline
- Ingest
- Schema
PUMS SOC to BLS SOC crosswalk update
Several BLS SOC codes in the new ONET data are missing mappings to similar PUMS codes. Need to update the crosswalk.
Federal Spending Data
- Ingest
- Schema
- Update with Financial Assistance data
IPEDS Graduation Rates by Timeframe
IPEDS Living SFA YUA
- Ingest
- Cube Schema
IPEDS Endowment Quintiles Cube
ACS
Normal tables
- B01002
- B01003
- B03002
- B05001
- B05004
- B05006
- B06001
- B08006
- B08013
- B08014
- B08136
- B08301
- B08303
- B16001 - needs data upload after rerun
- B17001 - needs data upload after rerun
- B19001
- B19013
- B21002
- B24010 - name overlap mistake, needs rerun (ygso)
- B24011 - schema plural bug
- B24012 - schema plural bug
- B24030 - name overlap mistake, needs rerun (ygsi)
- B24031 - schema plural bug
- B24032 - schema plural bug
- B25003
- B25075
- B25077
- B25102
- B27001
- C24010 - name overlap mistake, needs rerun (ygso)
- C24030 - name overlap mistake, needs rerun (ygsi)
health insurance agg
- B27002 - B27009 - csv done, needs upload and schema frag
race/ethnicity
- B17001A - still running...
- B19013A - still running...
Checkmark means schema and data in place; there may still be bugs though!
- Update all to 2016 (have to add acs/ to fetch url)
- for schema generator, add moe to measures
IPEDS Expenses Cube
- data
- schema
county, place and puma names
IPEDS Financials Cube
- data
- schema
- test
PUMS NAICS to BEA I/O Code Mapping
Need to provide a mapping that, given a PUMS NAICS code, returns the closest matching IO code.
create git repo for datausa-mondrian and update monetdb jdbc driver
Create CIP dimension table