ramhiser / noncensus
U.S. Census Region and Demographic Data
License: Other
I see that much of this package hasn't been changed for several months and yet it is still ahead of CRAN. Is this development version still under development or is it intended to be updated on CRAN anytime soon?
The U.S. Census Bureau has divided the country into collections of counties having a concentration of population. Details are given on Wikipedia.
Data at census.gov.
As discussed in this PDF from the U.S. Census Bureau, the U.S. can be thought of as a hierarchy of regions. These include:
This is a catch-all issue intended to capture possible features.
The data for every U.S. can be obtained directly from this text file. The U.S. Census Bureau provides a description here.
shiny_choro
example(shiny_choro)
and explore_counties()
shiny_choro
to a standalone data set
For a reference, see: http://www.census.gov/geo/reference/zctas.html
Add in argument to let people merge on state abbrev, iso codes or un_codes
Currently, the Shiny app launched by shiny_choro is too slow when Shiny is initialized. After the app is loaded, the speed of the app is reasonable. This suggests that the bottleneck is either in shiny_choro() or in the Shiny app's global.r file.
Example to replicate behavior:
library(noncensus)
example(shiny_choro)
The BLS moves counties between metro areas in some years, but it seems that the data set here is just a snapshot of the county assignments at some unspecified year. If that's correct, I think you should document which year it is or extend the data by adding a year column. In the latter case, I think the package would be very useful to a lot of folks working with Census data. Thanks!
Hadley Wickham provided a ggplot2 solution to Revolution's Choropleth Map R Challenge. An SO post also provides updated code. Hadley's solution determines the county boundaries in terms of latitude and longitude.
In general, having county boundaries would make plotting with ggplot2 easier. Boundaries are given in ggplot2::map_data("county"). However, map_data does not include a FIPS code, so the mapping is a bit challenging.
For instance, this almost works:
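library(dplyr)  # provides inner_join()
# NOTE: county_df is not defined in the original post; it is presumably a
# data frame derived from ggplot2::map_data("county") with a "county" column.
counties$county <- tolower(gsub(" County$", "", counties$county_name))
county_boundaries <- inner_join(counties, county_df, by="county")
head(county_boundaries)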
county_name state.x state_fips county_fips fips_class CSA CBSA population county lat group
1 Autauga County AL 01 001 H1 <NA> 33860 54571 autauga 32.34920 1
2 Autauga County AL 01 001 H1 <NA> 33860 54571 autauga 32.35493 1
3 Autauga County AL 01 001 H1 <NA> 33860 54571 autauga 32.36639 1
4 Autauga County AL 01 001 H1 <NA> 33860 54571 autauga 32.37785 1
5 Autauga County AL 01 001 H1 <NA> 33860 54571 autauga 32.38357 1
6 Autauga County AL 01 001 H1 <NA> 33860 54571 autauga 32.37785 1
state.y long order
1 AL -86.50517 1
2 AL -86.53382 2
3 AL -86.54527 3
4 AL -86.55673 4
5 AL -86.57966 5
6 AL -86.59111 6
However, not every county is mapped cleanly:
# Should be nrow(counties) - # of county equivalents
> nlevels(factor(county_boundaries$county))
[1] 1716
# Should be 50
> nlevels(factor(county_boundaries$state.x))
[1] 47
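One possible contributor to the mismatch is that the join above uses the county name alone, so same-named counties in different states are cross-matched (hence the state.x and state.y columns). A minimal sketch of joining on state and county together, using small made-up stand-ins for counties and county_df (the coordinates are placeholders, not real boundary data):

```r
# Toy stand-ins (hypothetical) for the two tables in the post.
counties <- data.frame(
  county_name = c("Autauga County", "Washington County", "Washington County"),
  state       = c("AL", "AL", "AR"),
  stringsAsFactors = FALSE
)
county_df <- data.frame(
  county = c("autauga", "washington", "washington"),
  state  = c("AL", "AL", "AR"),
  long   = c(-86.5, -88.2, -94.2),  # placeholder coordinates
  lat    = c(32.3, 31.4, 36.0),
  stringsAsFactors = FALSE
)

# Same name normalization as in the post.
counties$county <- tolower(gsub(" County$", "", counties$county_name))

# Joining on the name alone cross-matches the two Washington counties.
by_name <- merge(counties, county_df, by = "county")

# Joining on state and name keeps one match per county.
by_both <- merge(counties, county_df, by = c("state", "county"))

nrow(by_name)
nrow(by_both)
```

County equivalents whose names do not end in " County" (parishes, boroughs, independent cities) would still need their own normalization; this sketch does not address them.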
Currently, noncensus munges much of the data provided in raw form from census.gov. The Census Bureau recently introduced a data API that requires an API key.
Functions should be written to query the API and download the data in the appropriate format. These data can then be transformed into a standard format (e.g., CSV, TopoJSON, GeoJSON) for consumption.
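As a rough illustration of what such a query function might build, here is a sketch that only assembles a request URL and makes no network call. The helper name census_query_url, the 2010 decennial endpoint, and the variable P001001 are assumptions chosen for the example, not something the package defines; check them against the Census API documentation before relying on them.

```r
# Hypothetical helper: assembles a Census API query URL (no request is sent).
# The default endpoint is an assumption for illustration.
census_query_url <- function(vars, geography, key,
                             base = "https://api.census.gov/data/2010/dec/sf1") {
  paste0(
    base,
    "?get=", paste(vars, collapse = ","),  # comma-separated variable list
    "&for=", geography,                    # e.g. "state:*" for all states
    "&key=", key                           # the user's API key
  )
}

url <- census_query_url(c("P001001", "NAME"), "state:*", key = "YOUR_KEY")
print(url)
```

The response is JSON, so a parser such as jsonlite::fromJSON could turn it into a data frame before writing it out as CSV or another standard format.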
and other data as found.
After the Shiny app is launched via shiny_choro, the rendering after a dropdown has been selected needs to be sped up. Currently, this delay can take 2-3 seconds and should be closer to instantaneous. It's possible that the bottleneck is the leaflet package, but it's unclear at the moment.
Example to replicate behavior:
library(noncensus)
example(shiny_choro)
Zip codes needed for counties and MSAs
When running the example in shiny_choro, the following error is thrown:
Error in (1 - h) * qs[i] : non-numeric argument to binary operator
After a bit of debugging, I traced the problem to the line that reads:
cuts <- unique(quantile(df$fill, seq(0, 1, 1/5)))
Notice that df$fill is a character vector:
Browse[2]> head(df$fill)
[1] "0 - 4" "10 - 14" "15 - 19" "20 - 24" "25 - 29" "30 - 34"
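One way to avoid the error would be to compute quantile breaks only when the fill column is numeric and to fall back to the distinct labels otherwise. The sketch below assumes that behavior is acceptable; compute_cuts is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: quantile breaks for numeric fills,
# distinct labels for categorical (character or factor) fills.
compute_cuts <- function(x, n = 5) {
  if (is.numeric(x)) {
    unique(quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE))
  } else {
    sort(unique(as.character(x)))
  }
}

num_cuts <- compute_cuts(c(1, 2, 3, 4, 5, 6))
cat_cuts <- compute_cuts(c("0 - 4", "10 - 14", "15 - 19", "0 - 4"))
print(num_cuts)
print(cat_cuts)
```

Alternatively, the example call itself may simply be passing the age-group column where a numeric column was expected, in which case the fix belongs in the example rather than in the cut logic.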
The vignette should list and briefly describe each data set within the package.
Quote from an email from Dan Irons:
In the “states” dataset, state populations appear to have been misassigned. For example, the populations of Alaska and Alabama have been transposed, likewise the populations of Arkansas and Alabama, and so on with each pair of states on down the list (although I haven’t checked every row). I haven’t looked at the package source to determine where the errors are introduced, and of course it is entirely possible that the errors were inherited from the data sources that you took the data from.
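The email does not identify the cause. One pattern that produces pairwise swaps like this is attaching a population column by row position when one table is sorted by abbreviation and the other by state name. The sketch below illustrates that guess only, using four states and 2010 Census counts as illustrative values; it is not taken from the package source.

```r
# Sorted by abbreviation.
states <- data.frame(
  state = c("AK", "AL", "AR", "AZ"),
  name  = c("Alaska", "Alabama", "Arkansas", "Arizona"),
  stringsAsFactors = FALSE
)
# Sorted by state name (illustrative 2010 counts).
pop <- data.frame(
  name       = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  population = c(4779736, 710231, 6392017, 2915918),
  stringsAsFactors = FALSE
)

# Positional assignment silently pairs rows by order: Alaska gets Alabama's count.
wrong <- states
wrong$population <- pop$population

# A keyed merge pairs rows by state name instead.
right <- merge(states, pop, by = "name")

print(wrong)
print(right)
```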
Based on latest commit in shiny_map branch...
The population_age data frame is not visible within the example after it's loaded. Wrong environment?
> library(noncensus)
> example(shiny_choro)
shny_c> data(population_age, package="noncensus")
shny_c> shiny_choro(population_age, fill = "age_group", categories = "population",
shny_c+ palette = "Purples", background = "Grey")
Loading required package: shiny
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: leaflet
Error in match(x, table, nomatch = 0L) (from Rex5c4a2853a311#8) :
object 'population_age' not found
>
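To illustrate the "wrong environment" hypothesis: data() loads an object into a specific environment (its envir argument), and code that later looks the object up by name elsewhere will not find it. The sketch below uses the built-in mtcars data set as a stand-in, and lookup_by_name is a hypothetical helper; it does not reproduce the Shiny app itself.

```r
# Load a data set into a private environment instead of the global one.
e <- new.env()
data("mtcars", package = "datasets", envir = e)
found_in_e <- exists("mtcars", envir = e, inherits = FALSE)

# Hypothetical stand-in for code that resolves a data set by name.
lookup_by_name <- function(name, envir) get(name, envir = envir)

# Works when given the environment the data was loaded into.
in_e <- lookup_by_name("mtcars", e)

# Fails with an "object not found" error when the lookup happens elsewhere.
missing_elsewhere <- tryCatch(
  lookup_by_name("mtcars", emptyenv()),
  error = function(err) conditionMessage(err)
)
print(missing_elsewhere)
```

If this is the cause, passing the data frame itself to the app's server code, rather than relying on a lookup by name, would sidestep the problem.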
data/population_age.rda to /inst/data_scripts/
data/quick_facts.rds to /inst/data_scripts/
data/quick_facts.rds to *.rda/*.RData to be loaded by data()
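For the last item, data() reads .rda/.RData files but not .rds files, so the conversion amounts to reading the single object and saving it under its name. A minimal sketch using temporary files and a placeholder data frame in place of the real quick_facts:

```r
# Placeholder object and temporary paths (not the package's real files).
quick_facts <- data.frame(state = c("AL", "AK"), value = c(1, 2))
rds_path <- tempfile(fileext = ".rds")
rda_path <- tempfile(fileext = ".rda")
saveRDS(quick_facts, rds_path)

# Conversion: an .rds file holds one unnamed object; an .rda file stores named objects.
quick_facts <- readRDS(rds_path)
save(quick_facts, file = rda_path)

# Check: load() restores the object under its saved name.
e <- new.env()
loaded <- load(rda_path, envir = e)
print(loaded)
```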