datausa-tracker's Issues

Geographies name update

Todo:

  • Rename places that are duplicate within a state. Use a script.
  • use namelsad for counties, manually edited on server
  • use namelsad for counties, change in acs-core mondrian schema_shared.py
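A minimal sketch of what the renaming script in the first Todo item might look like. The record layout (`geoid`, `state`, `name`, `namelsad`), the IDs, and the fallback of appending the geoid are assumptions for illustration, not the actual script:

```python
from collections import Counter


def disambiguate_names(places):
    """Return {geoid: display_name} with no duplicate names inside a state."""
    # Count how often each short name occurs within each state.
    counts = Counter((p["state"], p["name"]) for p in places)
    result = {}
    seen = set()
    for p in places:
        key = (p["state"], p["name"])
        # Keep the short name when it is unique, otherwise fall back to namelsad.
        display = p["name"] if counts[key] == 1 else p["namelsad"]
        # Last resort (assumption): if the chosen name still collides, append the geoid.
        if (p["state"], display) in seen:
            display = f'{display} ({p["geoid"]})'
        seen.add((p["state"], display))
        result[p["geoid"]] = display
    return result


# Hypothetical rows; the IDs are placeholders, not real geoids.
places = [
    {"geoid": "id-1", "state": "VA", "name": "Roanoke", "namelsad": "Roanoke County"},
    {"geoid": "id-2", "state": "VA", "name": "Roanoke", "namelsad": "Roanoke city"},
    {"geoid": "id-3", "state": "VA", "name": "Fairfax", "namelsad": "Fairfax County"},
]
names = disambiguate_names(places)
for geoid, display in names.items():
    print(geoid, display)
```

Names that are already unique within a state are left alone, so only the colliding rows change.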

While looking into the pluck "bug" in Mondrian Rest, found out that there are lots of duplicate names that weren't handled well in the data and in Mondrian.

Report for the weird Mondrian Rest bug:

What's happening is a combination of an implicit Mondrian cube server behavior and a lack of error handling in Mondrian Rest.

In the counties dim, there are counties in the same state which have the same short name, e.g. "Roanoke City" and "Roanoke County" both become "Roanoke". Each has a different ID (geoid). However, Mondrian doesn't like duplicate names, so it decides that there's only one dimension level member with the name "Roanoke" and simply takes the last Roanoke it sees.

This translates into an error in Mondrian Rest because the ID key for the first "Roanoke" still exists, even though the associated dimension level member has been silently squashed.

When formatting the default JSON to something else, MRest uses a field called cell_keys, which holds all the ID keys. MRest then maps each ID key to a dimension level member, but because of Mondrian's behavior, that dim level member no longer exists for the duplicate name. Ruby happily assigns a NilClass instead of the expected mapping. Then in the next step, metadata is plucked out of each dim level member, and when it comes time to do this with the NilClass element, an exception occurs.

Solution:

  • step one, need to make sure there are no duplicate names in dimensions, even if ID keys are distinct. (I'll have something up tomorrow morning, it's an easy fix)
  • step two, to help recognize this in the future, I'm going to catch that exception at the pluck and give a nice error message explaining that the cause might be the above.
    cc @jspeiser @dave
    (I've done step one locally, that's how I know it's the issue. But I need to coordinate a few things to make sure it's a permanent fix tomorrow morning)
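The failure mode and the step-two guard can be illustrated with a small Python sketch. This is not Mondrian Rest code (which is Ruby); the function names, the error class, and the member layout are hypothetical and only mimic the behavior described in the report:

```python
class MissingMemberError(Exception):
    """Raised when an ID key has no surviving dimension level member."""


def build_member_index(members):
    # Mimic the squashing: members are keyed by name, so a later member
    # with the same name overwrites the earlier one.
    by_name = {}
    for member in members:
        by_name[member["name"]] = member
    # Only the surviving members can be found by ID afterwards.
    return {member["id"]: member for member in by_name.values()}


def pluck_names(cell_keys, index):
    names = []
    for key in cell_keys:
        member = index.get(key)
        if member is None:
            # Step two: fail with an explanation instead of a bare nil error.
            raise MissingMemberError(
                f"No dimension level member for key {key!r}; "
                "the dimension may contain duplicate member names."
            )
        names.append(member["name"])
    return names


# Two counties with distinct IDs (placeholders) but the same short name.
members = [
    {"id": "id-1", "name": "Roanoke"},
    {"id": "id-2", "name": "Roanoke"},
]
index = build_member_index(members)
try:
    pluck_names(["id-1", "id-2"], index)
except MissingMemberError as err:
    print(err)
```

The key for the first "Roanoke" is still requested but has no member behind it, which is the point where the original code hit the nil value.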

ACS

Normal tables

  • B01002
  • B01003
  • B03002
  • B05001
  • B05004
  • B05006
  • B06001
  • B08006
  • B08013
  • B08014
  • B08136
  • B08301
  • B08303
  • B16001 - needs data upload after rerun
  • B17001 - needs data upload after rerun
  • B19001
  • B19013
  • B21002
  • B24010 - name overlap mistake, needs rerun (ygso)
  • B24011 - schema plural bug
  • B24012 - schema plural bug
  • B24030 - name overlap mistake, needs rerun (ygsi)
  • B24031 - schema plural bug
  • B24032 - schema plural bug
  • B25003
  • B25075
  • B25077
  • B25102
  • B27001
  • C24010 - name overlap mistake, needs rerun (ygso)
  • C24030 - name overlap mistake, needs rerun (ygsi)

health insurance agg

  • B27002 - B27009 - csv done, needs upload and schema frag

race/ethnicity

  • B17001A - still running...
  • B19013A - still running...

Checkmark means schema and data in place; there may still be bugs though!

  • Update all to 2016 (have to add acs/ to fetch url)

  • for schema generator, add moe to measures

County Health Rankings Data

  • Ingest
  • Schema
  • Build map for years/source

(Note from @hwchen... map is built for years/source, but check back in because I don't know final format)

Tuition by CIP

We will need a separate cube for tuition by CIP

  • data
  • schema

BEA I/O Data

  • Ingest use data
  • IO Code Hierarchy
  • Create schema

health insurance ACS tables for deloitte

Process B27010 using https://www.kff.org/other/state-indicator/total-population/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D to do some manual roll-up

  • B25048
  • B25052 - can we do as percentages?
  • B28002 - listed as a number; can we convert to percentages?
  • B25092
  • C25074
  • B27010 if we can roll it up
  • B27004 - any way we can convert to percentages?
  • B27007 - can we convert to percentages?
  • B27006 - can we convert to percentages?

[x] all etl done, just waiting on customizing schema
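For the "convert to percentages" questions above, one possible approach is to divide each category count by the table's total. A minimal sketch, assuming each row carries a total alongside the category counts; the key names are hypothetical and do not come from the actual ACS columns:

```python
def to_percentages(counts, total_key="total"):
    """Convert category counts to percentages of the row total."""
    total = counts[total_key]
    if not total:
        # Avoid dividing by zero or by a missing total.
        return {k: None for k in counts if k != total_key}
    return {
        k: round(100.0 * v / total, 1)
        for k, v in counts.items()
        if k != total_key
    }


# Hypothetical row: 200 households, 150 with an internet subscription.
row = {"total": 200, "with_internet": 150, "no_internet": 50}
print(to_percentages(row))
```

Margins of error would need their own treatment and are not handled here.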

Census-geo; puma name updates

Add name column to pumas geospatial table that converts NYC-Manhattan Community District 8--Upper East Side PUMA to Upper East Side PUMA

  • update postgres pumas table
  • update scripts
  • update monetdb
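A minimal sketch of the name conversion, assuming the short name is always the text after the last "--" in the full PUMA name. That rule is inferred from the single example above and may not hold for every PUMA:

```python
def short_puma_name(full_name):
    # Keep only the text after the last "--"; names without "--" are returned unchanged.
    return full_name.rsplit("--", 1)[-1].strip()


full = "NYC-Manhattan Community District 8--Upper East Side PUMA"
print(short_puma_name(full))
```

The same rule would need to be applied in each of the three places listed above so the names stay consistent.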

acs null values

In a nutshell, as of 2016 there are now a few "magic" values that equal null: -666666666, -222222222, -999999999, -555555555, -333333333 ... we'll need to make sure to null those values out in the dataframe so that sums don't get messed up ... you might want to use na_values when loading the dataframe in pandas, or df.replace

Unfortunately, looks like I missed this one. So all datasets will have to be rerun, or manually fixed in the db.
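A minimal standard-library sketch of nulling out the magic values before summing. The CSV layout and the column name are hypothetical; in pandas the same list of values could be passed as `na_values` to `read_csv`, as suggested above:

```python
import csv
import io

# The "magic" null values listed in this issue.
ACS_NULL_SENTINELS = {-666666666, -222222222, -999999999, -555555555, -333333333}


def clean_value(raw):
    """Parse a raw cell, returning None for blanks and ACS null sentinels."""
    raw = raw.strip()
    if raw == "":
        return None
    value = float(raw)
    return None if value in ACS_NULL_SENTINELS else value


def load_column(csv_text, column):
    reader = csv.DictReader(io.StringIO(csv_text))
    return [clean_value(row[column]) for row in reader]


# Hypothetical extract with one sentinel value.
csv_text = "geoid,estimate\na,50000\nb,-666666666\nc,60000\n"
values = load_column(csv_text, "estimate")
total = sum(v for v in values if v is not None)
print(values, total)
```

Without the cleaning step, the sentinel would be added into the sum and drag the total far below zero.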
