
censusapi's Introduction

Hello — I’m a data reporter who investigates how the decisions made by people in power affect our lives using statistical analysis, graphics, and traditional reporting techniques.

I built and maintain censusapi, an R package that makes it easy to grab U.S. Census Bureau data programmatically.

I’m available for full-time or contract data journalism, programming, and analysis jobs. See more and reach out on my website.

censusapi's People

Contributors

hrecht, mtreg

censusapi's Issues

Vignette code chunks not evaluating in pkgdown build

Examples in the Getting Started vignette that require an API key are not run on CRAN, using the NOT_CRAN convention established in other documentation.


{r, echo = FALSE}
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(purl = NOT_CRAN)

For example, this code chunk does not evaluate in the pkgdown documentation build:


{r, purl = NOT_CRAN, eval = NOT_CRAN}
getCensus(name = "timeseries/healthins/sahie",
	vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
	region = "us:*", time = 2015)

As a result, the pkgdown build of the vignette currently shows no results for these chunks. When compiled locally with knitr, the chunks evaluate as intended.

error retrieving 2016 ACS

I get an error when retrieving 2016 5-year ACS estimates (code and error are below).

I believe the issue is that the API URL for the 2016 data is slightly different from that of prior years. For example, the URLs for one variable from one state in 2015 and 2016 are, respectively:
https://api.census.gov/data/2015/acs5?get=NAME,B02001_001E&for=tract:*&in=state:04
https://api.census.gov/data/2016/acs/acs5?get=NAME,B02001_001E&for=tract:*&in=state:04

test2016 <- getCensus(name="acs5", vintage=2016, vars="B02001_001E", region="tract:*", regionin="state:06")

Error in apiParse(req) : API response is not JSON
Error message: Error report HTTP Status 404 - /data/2016/acs5type Status reportmessage /data/2016/acs5description The requested resource is not available.
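
To make the path difference concrete, here is a minimal sketch in base R that rebuilds the two URLs quoted above. The helper `build_acs5_url` is hypothetical and is not part of censusapi; the rule that vintages from 2016 onward use `acs/acs5` is an assumption drawn only from those two URLs.

```r
# Hypothetical helper (not a censusapi function): builds a request URL
# following the pattern of the two URLs quoted in this issue.
build_acs5_url <- function(vintage, vars, region, regionin) {
  # Assumption: 2016 and later use "acs/acs5"; earlier vintages use "acs5".
  endpoint <- if (vintage >= 2016) "acs/acs5" else "acs5"
  paste0(
    "https://api.census.gov/data/", vintage, "/", endpoint,
    "?get=", paste(vars, collapse = ","),
    "&for=", region,
    "&in=", regionin
  )
}

url_2015 <- build_acs5_url(2015, c("NAME", "B02001_001E"), "tract:*", "state:04")
url_2016 <- build_acs5_url(2016, c("NAME", "B02001_001E"), "tract:*", "state:04")
print(url_2015)
print(url_2016)
```

If this reading is right, passing the longer endpoint name for the 2016 vintage may be worth trying, though that is untested here.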

Handle APIs that don't allow a `for` geography argument

A few API endpoints do not allow using geography and break if you try to use one, including setting for=us:*.

The error message that these calls return is There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience.

Almost all API uses are for endpoints that require geography, so I don't think it should necessarily be removed as a required argument. Perhaps this could be handled internally - if the endpoint name is in this list, don't pass for. Maybe.

I will also raise with the Census team to see if this can be fixed, or at least given a more useful error message.
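
The internal handling floated above could look roughly like the following sketch. Everything in it is an assumption: `no_geography_endpoints` holds a placeholder name rather than the real list of affected endpoints, and `build_query` is a hypothetical helper, not the package's code.

```r
# Placeholder list: the real set of affected endpoints is not reproduced here.
no_geography_endpoints <- c("example/endpoint/without/geography")

# Hypothetical helper: assemble query parameters, leaving out `for`
# when the endpoint is known not to accept a geography.
build_query <- function(name, vars, region = NULL) {
  query <- list(get = paste(vars, collapse = ","))
  if (!(name %in% no_geography_endpoints) && !is.null(region)) {
    query[["for"]] <- region
  }
  query
}

with_geo <- build_query("acs/acs5", c("NAME", "B01001_001E"), region = "us:*")
without_geo <- build_query("example/endpoint/without/geography", "NAME", region = "us:*")
str(with_geo)
str(without_geo)
```

Keeping `region` as an argument while dropping it silently for listed endpoints keeps the common case unchanged, at the cost of maintaining the list by hand.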

Error: Invalid 'for' argument

I am trying to download population data for OR based on block group geography using the following code:

data1990 <- getCensus(name="sf1", vintage=1990, key="REMOVED", vars="P0010001", region="block+group:*", regionin="state:41")

I get an "error: invalid 'for' argument" message. I'm assuming this has something to do with the fact that I am trying to access 1990 Census data (which has given me problems in my other attempts), but I am unsure. Are you able to access this data at the block group level?

SAHIE data with getCensus() not generating all age/sex/race groups

sahie16 <- getCensus(name = "timeseries/healthins/sahie", vars = c("NAME", "AGE_DESC", "RACE_DESC", "SEX_DESC", "IPRCAT", "IPR_DESC", "PCTUI_PT", "PCTUI_LB90", "PCTUI_UB90", "PCTUI_MOE"), region = "county:*", regionin = "state:39", time = 2016)

This does not generate all RACE or AGE or SEX categories.

Instead, I get

table(sahie16$AGE_DESC)
Under 65 years
528

table(sahie16$RACE_DESC)
All Races
528

table(sahie16$SEX_DESC)
Both Sexes
528

What am I doing wrong in asking for specific variables but all categories?

Bug report: getCensus not pulling > 2015 vintages for components.

Describe the bug
I call the getCensus function on pep/components but it only returns through period 8 (2015) although the data is available through 2017 at least.

To Reproduce
pep_com <- getCensus(name="pep/components", # This is the Estimates datafile
vintage = 2017, # Vintage year is set to the variable set above
vars = "PERIOD", # gathering these variables
region="COUNTY", # at the geography of COUNTY
regionin="state:13" # within the state of Georgia
)

table(pep_com$PERIOD)
1 2 3 4 5 6 7 8
159 159 159 159 159 159 159 159

Expected behavior
Should return period 1-10, which is 2017.

R session information:

  • censusapi package version: 0.6.0
  • R version: "R version 3.6.0 (2019-04-26)"

Additional context
I contacted the Census and they said that it must be a bug in the package.

Ensure cleanly formatted column names are returned

In rare cases, data are returned with periods in the column names as a result of spaces in the json key. See for example (thanks to Andrew Tran):

fl_sd_pop <-  getCensus(name="acs5", 
                        vintage=2015,
                        vars=c("NAME", "B01003_001E"), 
                        region="school district (unified)",
                        regionin="state:12")

returns a column named fl_sd_pop$school.district..unified.

getCensus should clean up column names by collapsing doubled periods, removing trailing ones, and turning the remaining periods into underscores. The cleaned-up column name in this example would be school_district_unified.
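
A minimal base R sketch of that cleanup follows. `clean_colnames` is a hypothetical name, not an existing censusapi function, and using `make.names()` to reproduce the dotted name is an assumption about where the periods come from.

```r
# Hypothetical helper: tidy column names that contain stray periods.
clean_colnames <- function(x) {
  x <- gsub("\\.+", ".", x)         # collapse runs of periods into one
  x <- gsub("^\\.|\\.$", "", x)     # drop a leading or trailing period
  gsub(".", "_", x, fixed = TRUE)   # turn remaining periods into underscores
}

# make.names() replaces spaces and parentheses with periods, which yields
# the same dotted name reported in this issue.
raw_name <- make.names("school district (unified)")
print(raw_name)
print(clean_colnames(raw_name))

# Names without periods pass through unchanged.
print(clean_colnames("B01003_001E"))
```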

Update Decennial census examples for upcoming API endpoint changes

The Census Bureau is renaming some of the Decennial Census API endpoints at the end of August (per email).

This package will have no problems retrieving data using the new names. However, the examples should be updated to reflect the new names around that date. Decennial Census examples should be removed from material that is in the next CRAN release in order to avoid confusion.

International Trade: Error in apiParse(req)

Just an FYI that I'm not a professional programmer/researcher and this is my first time using R to pull Census data.

I'm trying to work around the fact that the Census API isn't designed for pulling full datasets. One of the macro teams at the DOE is interested in collecting and consolidating international trade data by HS and NAICS from the Census.

Linked is one kind of dataset I was trying to pull from: https://api.census.gov/data/timeseries/intltrade/exports/hs

I talked briefly to a Census supervisor for macro analysis and read through the API guide. Then, I tried to set up an API call in R using the censusapi pkg. Got the following:

Error in apiParse(req) : The Census Bureau returned the following error message: There was an error while running your query. We've logged the error and we'll correct it ASAP. Sorry for the inconvenience.

What I was trying to do was limit the API call to only the export parameters by string and a handful of int type parameters that fall under a certain category like air, shipping, etc. The Census supervisor advised that I merge calls and filter out the summary lines by including the following: SUMMARY_LVL2=HSCYCSDTRP, COMM_LVL=HS10, and SUMMARY_LVL=DET

I'm not sure if the issue is I'm calling too many variables at once or something else.

Below is what I'm doing based off the tutorial:

x<-c("censusapi", "data.table")
require(x)
lapply(x, require, character.only = TRUE)
apis <- listCensusApis()
View(apis)

example <- listCensusMetadata(name = "timeseries/intltrade/exports/hs", type = "variables")
head(example)

HS.Mon.Exports <- getCensus(name = "timeseries/intltrade/exports/hs",
vars = c("AIR_VAL_MO", "AIR_VAL_YR",	"AIR_WGT_MO", "AIR_WGT_YR", "SUMMARY_LVL=DET",
"QTY_1_YR_FLAG",	"DIST_NAME",	"YEAR",	"CTY_NAME",	"COMM_LVL=HS10", 
"E_COMMODITY_SDESC",	 "DF",	"MONTH", "CTY_CODE",	"LAST_UPDATE", "DISTRICT",
"E_COMMODITY_LDESC",	"QTY_1_MO_FLAG",	"SUMMARY_LVL2=HSCYCSDTRP",	"E_COMMODITY",
"UNIT_QY2", "UNIT_QY1",	"QTY_2_MO_FLAG", "QTY_2_YR_FLAG",
"E_COMMODITY","UNIT_QY1"), region = "us:*") 
head(HS.Mon.Exports)

Let me know if what I'm saying is confusing or unclear. Thank you for your time!

Install fails due to problem with DESCRIPTION file

It looks like removing the e-mail address from the DESCRIPTION file causes an error during install tests. Relevant output is below:

* installing *source* package 'censusapi' ...
Error : Invalid DESCRIPTION file

Authors@R field gives no person with maintainer role, valid email
address and non-empty name.

See section 'The DESCRIPTION file' in the 'Writing R Extensions'
manual.

ERROR: installing package DESCRIPTION failed for package 'censusapi'

documentation and use cases

Hi Hannah -

Great wrapper functions for the Census API. These are very useful!

Have you started documentation on some of the common variables and geographies through example use cases, data dictionaries, or vignettes? Is this something that you could use help with?

Jesse

Issue Calling on Zip Code Tabulation Areas in a County

Hey! This is an amazing tool, but I am running into a snag when looking up zip codes within a given county for the ACS 2015 5-year estimates. When I listed the available geographies, zip code tabulation areas were among them. State data and then Texas county data came back correctly, but when I requested zip codes in Bexar County, errors started popping up. Specifically, the error reads: Error: error: unknown/unsupported geography heirarchy. I've looked everywhere in the documentation but cannot figure out why this error occurs. Note that I have replaced my API key with key="REMOVED".

# County-Level Data
ed25TXcounty <- getCensus(name="acs5", vintage = "2015",
vars=c("NAME", "B15003_001E", "B15003_002E", "B15003_003E", "B15003_004E", "B15003_005E", "B15003_006E", "B15003_007E", "B15003_008E", "B15003_009E", "B15003_010E", "B15003_011E", "B15003_012E", "B15003_013E", "B15003_014E", "B15003_015E", "B15003_016E", "B15003_017E", "B15003_018E", "B15003_019E", "B15003_020E", "B15003_021E", "B15003_022E", "B15003_023E", "B15003_024E", "B15003_025E"), 
region="county:*", regionin="state:48",
key="REMOVED")
# Bexar County Data
ed25Bexarcounty <- getCensus(name="acs5", vintage = "2015",
vars=c("NAME", "B15003_001E", "B15003_002E", "B15003_003E", "B15003_004E", "B15003_005E", "B15003_006E", "B15003_007E", "B15003_008E", "B15003_009E", "B15003_010E", "B15003_011E", "B15003_012E", "B15003_013E", "B15003_014E", "B15003_015E", "B15003_016E", "B15003_017E", "B15003_018E", "B15003_019E", "B15003_020E", "B15003_021E", "B15003_022E", "B15003_023E", "B15003_024E", "B15003_025E"), 
region="zip code tabulation area:*", regionin="state:48+county:029", key="REMOVED")

Upcoming Census API changes

Hi, I got an email from the Census today. It seems like there are some pretty major changes coming to how to query their API. The most relevant part of the email was:

Please note that beginning with the release of the 2016 ACS 1-Year estimates on September 14, 2017, the data will only be available in the new format. We will continue to maintain the 2015 1-Year estimates in the original format until October 2, 2017.

We will let you know when the 2015 ACS 5-Year estimates will be available in the new format. Eventually the remainder of the ACS products in the API will be converted to the new format and we will update you when we have a timeline for those releases. Please contact us via email at [email protected] with any questions or concerns you might have.

There was a linked guide to the changes:
https://www.census.gov/content/dam/Census/data/developers/acs/acs-data-variables-guide.pdf

Does this mean there will have to be a rewriting of the functions in this package?

Thanks for all your hard work with censusapi, I use it on a daily basis.

Add groups metadata type to listCensusMetadata

For the new 2010 dec/sf1 API endpoint, group codes are listed in the group metadata field. Concept labels are no longer used. Group labels (formerly concept labels) can be found in the groups metadata file, e.g. https://api.census.gov/data/2010/dec/sf1/groups.json
A listCensusMetadata(type = "groups") option could be added without much work to make this data more easily discoverable, particularly useful for the new 2010 Decennial and ACS formats.

Caveat: for most API endpoints the groups metadata currently contains no content.

Bug report: error getting ACS county-to-county migration flow data for all states

I am trying to get ACS county-to-county migration flow data using censusapi following your example here

I have successfully downloaded ACS migration flow data for 2016 using the following code:

vars <- c("MOVEDIN", "MOVEDOUT", "FULL1_NAME", "FULL2_NAME", "GEOID2")
migr16 <- getCensus(name = "acs/flows", vintage = 2016, vars = vars, region = "county")
However, this does not work for other years than 2016. For 2015 it only works if I add "regionin" for one state:

migr15 <- getCensus(name = "acs/flows", vintage = 2015, vars = vars, region = "county", regionin = "state:01")
As I want to download data for all counties I tried looping it using map_dfr as suggested here:

mystates <- c(paste0("state:", str_pad(1:51, 2, pad = "0")))
migr15 <- map_dfr(mystates, ~ getCensus(name = "acs/flows", vintage = 2015, vars = vars, region = "county", regionin = .x))
I also tried other options as suggested in the post. None of them worked when using censusapi. I would highly appreciate your help!

Censusapi package version: 0.6.0
R version: 3.5.1
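
One detail worth checking in the loop above: state FIPS codes are not the contiguous run 01 to 51. The codes 03, 07, 14, 43, and 52 are not assigned to any state, and the valid codes run up to 56. Whether this explains the failure reported here is an assumption, but a list built as in the sketch below at least avoids requesting states that do not exist.

```r
# FIPS codes for the 50 states plus DC. Within 1:56, the codes
# 03, 07, 14, 43 and 52 are not assigned to any state.
state_fips <- sprintf("%02d", setdiff(1:56, c(3, 7, 14, 43, 52)))
mystates <- paste0("state:", state_fips)

print(length(mystates))
print(head(mystates))
```

The resulting `mystates` vector can be passed to the same `map_dfr()` call in place of the one built with `str_pad(1:51, ...)`.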

Add ability to include miscellaneous parameters

Thanks for your great package!
Would it be possible to add miscellaneous parameters to API requests? The County Business Patterns API requires additional parameters in order to report data by employment size. See the example below:

  • No Employment Sizes
    https://api.census.gov/data/2008/cbp?get=YEAR,EMPSZES_TTL,EMP,ESTAB,PAYANN,GEO_TTL&for=state:01

  • Adding "&EMPSZES=*" Provides Employment Sizes
    https://api.census.gov/data/2008/cbp?get=YEAR,EMPSZES_TTL,EMP,ESTAB,PAYANN,GEO_TTL&for=state:01&EMPSZES=*
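
One way such pass-through parameters could work is sketched below in base R. `build_census_url` is a hypothetical helper, not part of censusapi; it simply appends any extra named arguments to the query string and reproduces the second URL above.

```r
# Hypothetical helper: extra named arguments passed via `...` are appended
# to the query string as additional parameters.
build_census_url <- function(path, vars, region, ...) {
  extra <- list(...)
  url <- paste0(
    "https://api.census.gov/data/", path,
    "?get=", paste(vars, collapse = ","),
    "&for=", region
  )
  if (length(extra) > 0) {
    extra_params <- paste0(names(extra), "=", unlist(extra), collapse = "&")
    url <- paste0(url, "&", extra_params)
  }
  url
}

cbp_vars <- c("YEAR", "EMPSZES_TTL", "EMP", "ESTAB", "PAYANN", "GEO_TTL")
url_plain <- build_census_url("2008/cbp", cbp_vars, "state:01")
url_sizes <- build_census_url("2008/cbp", cbp_vars, "state:01", EMPSZES = "*")
print(url_plain)
print(url_sizes)
```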

Error in listCensusMetadata

I'm having an issue using the listCensusMetadata command when trying to access ACS data tables. The command works fine with the SAHIE or SAIPE data sets. I get the same error regardless of ACS1 or ACS5 and regardless of which vintage I use (e.g. 2012 or 2015). I also have the most current version of censusapi.

acs_var <- listCensusMetadata(name = "acs/acs5", vintage = 2016, type = "variables")
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Getting acs subject tables

Does censusapi support getting acs subject tables? For example, I have a client who would like to import all data in table S1902 for the state of Michigan into R. You can see the data in American Fact Finder here: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_16_5YR_S2703&prodType=table.

I wasn't sure if censusapi supports importing subject tables. I wrote this code, attempting to import all the data in the above AFF page into R:

getCensus(name = "acs/acs5",
          vintage = 2016, 
          vars = "S1902", 
          region = "state:26")

I got this error

Error in apiCheck(req) : 
  The Census Bureau returned the following error message:
 error: error: unknown variable 'S1902'

Any advice would be appreciated.

versioning

In the next release, use proper package versioning.

Examples for every API

The most frequent questions and issues I receive about this package relate to how to use a specific API. To that end, I've created a new list of examples vignette.

It's very bare bones right now - 1 to 3 examples from most of the APIs with little text. Suggestions/PRs for how to improve this page are welcome!

In addition, if you have projects you've done using censusapi with code online, a usage example page with links would be cool for the future.

error: unknown/unsupported geography heirarchy when querying data for all ZCTA5s

Hey there, the following code...

getCensus(
     name = "sf1",
     vars = c("P0010001"),
     region = "zip code tabulation area:*",
     vintage = 2010,
     key = Sys.getenv("CENSUS_KEY")
 )

...throws this error...
Error: error: unknown/unsupported geography heirarchy.

Any idea what I need to fix here so I can get 2010 population data for every ZCTA5? Replacing "zip code tabulation area:*" with "county:*" works fine, by the way...

Error 204 with Population Estimates

I'm trying to pull in data on places using the pep/population API call. I have updated censusapi to the latest version but the error message doesn't display much info. I'm not actually sure if this is an error with censusapi or something on the Census side of things.

df.pop_cities2 <- getCensus(name = "pep/population",
    vintage = 2018,
    vars = c("GEONAME", "DATE_CODE", "DATE_DESC", "POP"),
    region = "place:*",
    regionin = "state:46")

And here is the error code I get

Error: 204, no content was returned. See ?listCensusMetadata to learn more about valid API options.

I tried it with multiple states and get the same error. Any thoughts? Am I doing this wrong somehow? Thanks.

Refine numeric column handling

Detecting numeric columns is messy: the APIs return all data as strings. getCensus currently types columns with numbers in the column name as numeric. This works well for the ACS and decennial Census APIs but needs improvement for several reasons:

  • a few string variables contain numbers in their name (e.g. fage4 in the timeseries/bds/firms API), so the current approach coerces them to NA
  • some numeric variables have letter-only variable names (e.g. in the SAHIE API)

Theoretically, it should be possible to use the variable API endpoints to get type information (the approach of Python lib census https://github.com/datamade/census) but that field doesn't uniformly exist and is often wrong. I've raised this issue with the Census dev team and hope they'll correctly type data in the future.

Until then, this package needs typing improvement particularly for the timeseries APIs.
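
As one possible direction, typing could be based on the values rather than the column names. The sketch below is an illustration under stated assumptions, not the package's approach: `type_columns` is a hypothetical helper, the sample values (including those for fage4) are made up, and the rule that strings with leading zeros stay character is a design choice meant to protect codes such as state FIPS.

```r
# Hypothetical helper: convert a character column to numeric only when
# every non-missing value parses as a number.
type_columns <- function(df) {
  for (col in names(df)) {
    values <- df[[col]]
    if (!is.character(values)) next
    # Keep identifier-like strings such as "01" so leading zeros survive.
    if (any(grepl("^0[0-9]", values))) next
    parsed <- suppressWarnings(as.numeric(values))
    # A value that was present but failed to parse blocks the conversion.
    if (!any(is.na(parsed) & !is.na(values))) {
      df[[col]] <- parsed
    }
  }
  df
}

# Made-up sample data for illustration only.
example <- data.frame(
  state = c("01", "02"),
  fage4 = c("a) 0", "b) 1"),
  PCTUI_PT = c("10.5", "12.1"),
  stringsAsFactors = FALSE
)
typed <- type_columns(example)
str(typed)
```

A value-based check like this would still misclassify numeric-looking codes without leading zeros, so it reduces the problem rather than solving it.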

pep apis connection error

For most of the APIs I can get the list of variables, geographies, etc. However, for the "pep" APIs I get a connection error.

listCensusMetadata(name="pep/subcty", type = "variables")
Error in open.connection(con, "rb") : HTTP error 404

Is there a way of fixing this?

Thank you!

New 2010 hierarchy for blocks breaks getCensus example

I'm not sure if this is related to the API changes already covered in #42 but I just noticed that the required hierarchy for blocks from the 2010 sf1 is now state --> county --> tract --> block, rather than the previous state --> county --> block. As a result, the example in the getCensus docs showing the use of that second hierarchy for 2010 sf1 no longer works; it has to be structured like the example given for the 2000 sf1.

library(censusapi)

# example from getCensus
data_no_tract <- getCensus(name = "dec/sf1", vintage = 2010,
                           vars = c("P001001", "H010001"),
                           region = "block:*", regionin = "state:36+county:027")
#> Error in apiCheck(req): The Census Bureau returned the following error message:
#>  error: unknown/unsupported geography heirarchy

data_with_tract <- getCensus(name = "dec/sf1", vintage = 2010,
                             vars = c("P001001", "H010001"),
                             region = "block:*", regionin = "state:36+county:027+tract:010000")
head(data_with_tract)
#>   state county  tract block P001001 H010001
#> 1    36    027 010000  1000      31      31
#> 2    36    027 010000  1011      17      17
#> 3    36    027 010000  1028      41      41
#> 4    36    027 010000  1001       0       0
#> 5    36    027 010000  1031       0       0
#> 6    36    027 010000  1002       4       4

Appreciate this package!
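
For readers assembling the nested hierarchy by hand, here is a small base R sketch that builds the `regionin` string used in the working call above. `build_regionin` is a hypothetical helper, and the zero-padded widths (2 digits for state, 3 for county, 6 for tract) are assumptions taken from the codes shown in this issue.

```r
# Hypothetical helper: build a regionin string with zero-padded codes.
build_regionin <- function(state, county, tract = NULL) {
  parts <- c(
    paste0("state:", sprintf("%02d", as.integer(state))),
    paste0("county:", sprintf("%03d", as.integer(county)))
  )
  if (!is.null(tract)) {
    parts <- c(parts, paste0("tract:", sprintf("%06d", as.integer(tract))))
  }
  paste(parts, collapse = "+")
}

print(build_regionin(36, 27))
print(build_regionin(36, 27, 10000))
```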

Add more robust error messaging for API key fails

The getCensus function fails behind work proxies. Error messages should make clearer what is happening, which would make the problem easier to troubleshoot and fix.
It currently returns:
"Error: lexical error: invalid char in json text." However, it's really a proxy issue.

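A clearer failure could come from checking the response before parsing it. The sketch below uses base R only and is an assumption about how such a check might look: `check_json_response` is a hypothetical helper, and the wording of its message is illustrative.

```r
# Hypothetical helper: stop with a descriptive message when a response
# does not look like JSON, instead of failing later inside the JSON parser.
check_json_response <- function(body, content_type) {
  looks_like_json <- grepl("json", content_type, ignore.case = TRUE) &&
    grepl("^\\s*[\\[{]", body, perl = TRUE)
  if (!looks_like_json) {
    stop(
      "The response was not JSON (content type: ", content_type, "). ",
      "A proxy or firewall may have intercepted the request.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A JSON body passes the check.
ok <- check_json_response('[["NAME"],["Alabama"]]', "application/json;charset=utf-8")

# An HTML page, such as a proxy login screen, triggers the clearer error.
msg <- tryCatch(
  check_json_response("<html>Proxy login</html>", "text/html"),
  error = function(e) conditionMessage(e)
)
print(msg)
```
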
Add examples in help files for finer-scale data (e.g., block)

This is a great package - thanks for your great work on it!

Particularly for folks new to the US Census API, it would be helpful to have some more examples, in particular for getting data at scales as fine as the tract or block level. I've included an example below for Census blocks in case you'd like to include it, and I could also open a pull request if you prefer. Just let me know.

data2010 <- getCensus(name="sf1", 
vintage=2010,
key=censuskey, 
vars=c('PLACE','P0010001', 'P0030001', 'BLKGRP'), 
region="block:*", regionin='state:36+county:27')
