idblr / ndi

Compute various geospatial neighborhood deprivation indices

License: Other

R 97.95% Rez 2.05%

census census-api census-data deprivation deprivation-stats geospatial geospatial-data principal-component-analysis r r-package

ndi's Introduction

ndi: Neighborhood Deprivation Indices

Date repository last updated: July 06, 2024

Overview

The ndi package is a suite of R functions to compute various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered 'spatial' because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are 'aspatial' because they only consider the value within each census geography. Two types of aspatial NDI are available: (1) based on Messer et al. (2006) and (2) based on Andrews et al. (2020) and Slotman et al. (2022) who use variables chosen by Roux and Mair (2010). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward) pulled by the tidycensus package. Using data from the ACS-5 (2005-2009 onward), the ndi package can also compute the (1) spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011), (2) spatial Educational Isolation Index (EI) based on Bravo et al. (2021), (3) aspatial Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) and Krieger et al. (2016), (4) aspatial racial/ethnic Dissimilarity Index (DI) based on Duncan & Duncan (1955), (5) aspatial income or racial/ethnic Atkinson Index (AI) based on Atkinson (1970), (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954), (7) aspatial racial/ethnic Correlation Ratio (V) based on Bell (1954) and White (1986), (8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1938) and Sudano et al. (2013), (9) aspatial racial/ethnic Local Exposure and Isolation (LEx/Is) metric based on Bemanian & Beyer (2017), and (10) aspatial racial/ethnic Delta (DEL) based on Hoover (1941) and Duncan et al. (1961; LC:60007089). Also using data from the ACS-5 (2005-2009 onward), the ndi package can retrieve the aspatial Gini Index based on Gini (1921).
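
To make the spatial/aspatial distinction concrete, here is a minimal base-R sketch on three made-up tracts. The counts, the adjacency matrix `W`, and the equal weight each tract gives itself are all assumptions for illustration; the weighting schemes in the cited papers and in the package functions may differ.

```r
# Toy data: subgroup count and total population in three hypothetical tracts
subgroup <- c(10, 50, 90)
total <- c(100, 100, 100)

# Aspatial: each tract uses only its own values
aspatial <- subgroup / total

# Hypothetical adjacency matrix (tract 2 borders tracts 1 and 3);
# the diagonal of 1 lets each tract count itself (an assumption)
W <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 1, 1),
  nrow = 3,
  byrow = TRUE
)

# Spatial: pool the counts of each tract and its neighbors before dividing
spatial <- as.vector(W %*% subgroup) / as.vector(W %*% total)

data.frame(tract = 1:3, aspatial = aspatial, spatial = spatial)
```

In this toy case the spatial values for the two outer tracts are pulled toward their shared neighbor, while the aspatial values are unaffected by adjacency.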

Installation

To install the release version from CRAN:

install.packages('ndi')

To install the development version from GitHub:

devtools::install_github('idblr/ndi')

Available functions

Function	Description
`anthopolos`	Compute the spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011)
`atkinson`	Compute the aspatial Atkinson Index (AI) based on Atkinson (1970)
`bell`	Compute the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954)
`bemanian_beyer`	Compute the aspatial racial/ethnic Local Exposure and Isolation (LEx/Is) metric based on Bemanian & Beyer (2017)
`bravo`	Compute the spatial Educational Isolation Index (EI) based on Bravo et al. (2021)
`duncan`	Compute the aspatial racial/ethnic Dissimilarity Index (DI) based on Duncan & Duncan (1955)
`gini`	Retrieve the aspatial Gini Index based on Gini (1921)
`hoover`	Compute the aspatial racial/ethnic Delta (DEL) based on Hoover (1941) and Duncan et al. (1961; LC:60007089).
`krieger`	Compute the aspatial Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) and Krieger et al. (2016)
`messer`	Compute the aspatial Neighborhood Deprivation Index (NDI) based on Messer et al. (2006)
`powell_wiley`	Compute the aspatial Neighborhood Deprivation Index (NDI) based on Andrews et al. (2020) and Slotman et al. (2022) with variables chosen by Roux and Mair (2010)
`sudano`	Compute the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1938) and Sudano et al. (2013)
`white`	Compute the aspatial racial/ethnic Correlation Ratio (V) based on Bell (1954) and White (1986)

The repository also includes the code to create the project hexagon sticker.

Available sample dataset

Data	Description
`DCtracts2020`	A sample data set containing information about U.S. Census American Community Survey 5-year estimate data for the District of Columbia census tracts (2020). The data are obtained from the tidycensus package and formatted for the `messer()` and `powell_wiley()` functions input.

Author

Ian D. Buller - Social & Scientific Systems, Inc., a DLH Corporation Holding Company, Bethesda, Maryland (current) - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original) - GitHub - ORCID

See also the list of contributors who participated in this package, including:

Jacob Englert - Biostatistics and Bioinformatics Doctoral Program, Laney Graduate School, Emory University, Atlanta, Georgia - GitHub
Jessica Gleason - Epidemiology Branch, Division of Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland - ORCID
Chris Prener - Real World Evidence Center of Excellence, Pfizer, Inc. - GitHub - ORCID
Davis Vaughan - Posit - GitHub - ORCID

Thank you to those who suggested additional metrics, including:

David Berrigan - Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Rockville, Maryland - ORCID
Symielle Gaston - Social and Environmental Determinants of Health Equity Group, Epidemiology Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina - ORCID
Jessica Madrigal - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland - ORCID

Getting Started

Step 1: Obtain a unique access key from the U.S. Census Bureau. Follow this link to obtain one.
Step 2: Specify your access key in one of two ways when using the anthopolos(), atkinson(), bell(), bemanian_beyer(), bravo(), duncan(), gini(), hoover(), krieger(), messer(), powell_wiley(), sudano(), or white() functions: pass it through each function's internal key argument, or register it once beforehand with the census_api_key() function from the tidycensus package (see an example below).
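
As a sketch of the two options in Step 2, the key can be read from the CENSUS_API_KEY environment variable and supplied either way. Storing the key in that variable is an assumption about your setup; the calls are skipped when no key is set or the packages are not installed.

```r
# Read the key from an environment variable (assumed to be CENSUS_API_KEY)
census_key <- Sys.getenv('CENSUS_API_KEY')
can_run <- nzchar(census_key) &&
  requireNamespace('ndi', quietly = TRUE) &&
  requireNamespace('tidycensus', quietly = TRUE)

if (can_run) {
  # Option 1: pass the key through the 'key' argument of an ndi function
  messer2020DC <- ndi::messer(state = 'DC', year = 2020, key = census_key)

  # Option 2: register the key for the session, then call without 'key'
  tidycensus::census_api_key(census_key)
  messer2020DC <- ndi::messer(state = 'DC', year = 2020)
}
```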

Usage

# ------------------ #
# Necessary packages #
# ------------------ #

library(ndi)
library(dplyr) # provides %>% and left_join() used below
library(ggplot2)
library(sf) # a dependency for the 'ndi' package
library(tidycensus) # a dependency for the 'ndi' package
library(tigris)

# -------- #
# Settings #
# -------- #

## Access Key for census data download
### Obtain one at http://api.census.gov/data/key_signup.html
census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. CENSUS API

# ---------------------- #
# Calculate NDI (Messer) #
# ---------------------- #

# Compute the NDI (Messer) values (2016-2020 5-year ACS) for Washington, D.C. census tracts
messer2020DC <- messer(state = 'DC', year = 2020)

# ------------------------------ #
# Outputs from messer() function #
# ------------------------------ #

# A tibble containing the identification, geographic name, NDI (Messer) values, NDI (Messer) 
# quartiles, and raw census characteristics for each tract
messer2020DC$ndi

# The results from the principal component analysis used to compute the NDI (Messer) values
messer2020DC$pca

# A tibble containing a breakdown of the missingness of the census characteristics 
# used to compute the NDI (Messer) values
messer2020DC$missing

# -------------------------------------- #
# Visualize the messer() function output #
# -------------------------------------- #

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the NDI (Messer) values to the census tract geometry
DC2020messer <- tract2020DC %>%
  left_join(messer2020DC$ndi, by = 'GEOID')

# Visualize the NDI (Messer) values (2016-2020 5-year ACS) for Washington, D.C. census tracts

## Continuous Index
ggplot() +
  geom_sf(
    data = DC2020messer,
    aes(fill = NDI),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Neighborhood Deprivation Index\nContinuous (Messer, non-imputed)',
    subtitle = 'Washington, D.C. tracts as the referent'
  )

## Categorical Index (Quartiles)
### Rename '9-NDI not avail' level as NA for plotting
DC2020messer$NDIQuartNA <-
  factor(
    replace(
      as.character(DC2020messer$NDIQuart),
      DC2020messer$NDIQuart == '9-NDI not avail',
      NA
    ),
    c(levels(DC2020messer$NDIQuart)[-5], NA)
  )

ggplot() +
  geom_sf(
    data = DC2020messer,
    aes(fill = NDIQuartNA),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_d(
    guide = guide_legend(reverse = TRUE),
    na.value = 'grey50'
  ) +
  labs(
    fill = 'Index (Categorical)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Neighborhood Deprivation Index\nQuartiles (Messer, non-imputed)',
    subtitle = 'Washington, D.C. tracts as the referent'
  )

# ---------------------------- #
# Calculate NDI (Powell-Wiley) #
# ---------------------------- #

# Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for
# Washington, D.C. census tracts
powell_wiley2020DC <- powell_wiley(state = 'DC', year = 2020)
# impute missing values
powell_wiley2020DCi <- powell_wiley(state = 'DC', year = 2020, imp = TRUE)

# ------------------------------------ #
# Outputs from powell_wiley() function #
# ------------------------------------ #

# A tibble containing the identification, geographic name, NDI (Powell-Wiley) value, and 
# raw census characteristics for each tract
powell_wiley2020DC$ndi

# The results from the principal component analysis used to 
# compute the NDI (Powell-Wiley) values
powell_wiley2020DC$pca

# A tibble containing a breakdown of the missingness of the census characteristics used to 
# compute the NDI (Powell-Wiley) values
powell_wiley2020DC$missing

# -------------------------------------------- #
# Visualize the powell_wiley() function output #
# -------------------------------------------- #

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the NDI (powell_wiley) values to the census tract geometry
DC2020powell_wiley <- tract2020DC %>%
  left_join(powell_wiley2020DC$ndi, by = 'GEOID')
DC2020powell_wiley <- DC2020powell_wiley %>%
  left_join(powell_wiley2020DCi$ndi, by = 'GEOID')

# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for
# Washington, D.C. census tracts

## Non-imputed missing tracts (Continuous)
ggplot() +
  geom_sf(
    data = DC2020powell_wiley,
    aes(fill = NDI.x),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Neighborhood Deprivation Index\nContinuous (Powell-Wiley, non-imputed)',
    subtitle = 'Washington, D.C. tracts as the referent'
  )

## Non-imputed missing tracts (Categorical quintiles)
### Rename '9-NDI not avail' level as NA for plotting
DC2020powell_wiley$NDIQuintNA.x <- factor(
  replace(
    as.character(DC2020powell_wiley$NDIQuint.x),
    DC2020powell_wiley$NDIQuint.x == '9-NDI not avail',
    NA
  ),
  c(levels(DC2020powell_wiley$NDIQuint.x)[-6], NA)
)
  

ggplot() +
  geom_sf(
    data = DC2020powell_wiley,
    aes(fill = NDIQuintNA.x),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_d(
    guide = guide_legend(reverse = TRUE),
    na.value = 'grey50'
  ) +
  labs(
    fill = 'Index (Categorical)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, non-imputed)',
    subtitle = 'Washington, D.C. tracts as the referent'
  )

## Imputed missing tracts (Continuous)
ggplot() +
  geom_sf(
    data = DC2020powell_wiley,
    aes(fill = NDI.y),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Neighborhood Deprivation Index\nContinuous (Powell-Wiley, imputed)',
    subtitle = 'Washington, D.C. tracts as the referent'
  )

## Imputed missing tracts (Categorical quintiles)
### Rename '9-NDI not avail' level as NA for plotting
DC2020powell_wiley$NDIQuintNA.y <- factor(
  replace(
    as.character(DC2020powell_wiley$NDIQuint.y),
    DC2020powell_wiley$NDIQuint.y == '9-NDI not avail',
    NA
  ),
  c(levels(DC2020powell_wiley$NDIQuint.y)[-6], NA)
)
  
ggplot() +
  geom_sf(
    data = DC2020powell_wiley,
    aes(fill = NDIQuintNA.y),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_d(
    guide = guide_legend(reverse = TRUE),
    na.value = 'grey50'
  ) +
  labs(
    fill = 'Index (Categorical)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, imputed)',
    subtitle = 'Washington, D.C. tracts as the referent'
  )

# --------------------------- #
# Compare the two NDI metrics #
# --------------------------- #

# Merge the two NDI metrics (Messer and Powell-Wiley, imputed)
ndi2020DC <- messer2020DC$ndi %>%
  left_join(
    powell_wiley2020DCi$ndi,
    by = 'GEOID',
    suffix = c('.messer', '.powell_wiley')
  )

# Check the correlation of two NDI metrics (Messer & Powell-Wiley, imputed) as continuous values
cor(ndi2020DC$NDI.messer, ndi2020DC$NDI.powell_wiley, use = 'complete.obs') # Pearson's r=0.975

# Check the similarity of the two NDI metrics as categories
# (Messer quartiles vs. Powell-Wiley, imputed, quintiles)
table(ndi2020DC$NDIQuart, ndi2020DC$NDIQuint)

# ---------------------------- #
# Retrieve aspatial Gini Index #
# ---------------------------- #

# Gini Index based on Gini (1921) from the ACS-5
gini2020DC <- gini(state = 'DC', year = 2020)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the Gini Index values to the census tract geometry
gini2020DC <- tract2020DC %>%
  left_join(gini2020DC$gini, by = 'GEOID')

ggplot() +
  geom_sf(
    data = gini2020DC,
    aes(fill = gini),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Gini Index\nGrey color denotes no data',
    subtitle = 'Washington, D.C. tracts'
  )

# --------------------------------------------------- #
# Compute spatial Racial Isolation Index (Anthopolos) #
# --------------------------------------------------- #

# Racial Isolation Index based on Anthopolos et al. (2011)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
ri2020DC <- anthopolos(state = 'DC', year = 2020, subgroup = 'NHoLB')

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the RI (Anthopolos) values to the census tract geometry
ri2020DC <- tract2020DC %>%
  left_join(ri2020DC$ri, by = 'GEOID')

ggplot() +
  geom_sf(
    data = ri2020DC,
    aes(fill = RI),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Racial Isolation Index\nNot Hispanic or Latino, Black or African American alone (Anthopolos)',
    subtitle = 'Washington, D.C. tracts (not corrected for edge effects)'
  )

# --------------------------------------------------- #
# Compute spatial Educational Isolation Index (Bravo) #
# --------------------------------------------------- #

# Educational Isolation Index based on Bravo et al. (2021)
## Selected subgroup: without four-year college degree
ei2020DC <- bravo(state = 'DC', year = 2020, subgroup = c('LtHS', 'HSGiE', 'SCoAD'))

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the EI (Bravo) values to the census tract geometry
ei2020DC <- tract2020DC %>% 
  left_join(ei2020DC$ei, by = 'GEOID')

ggplot() + 
  geom_sf(
    data = ei2020DC, 
    aes(fill = EI),
    color = 'white'
  ) +
  theme_bw() + 
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Educational Isolation Index\nWithout a four-year college degree (Bravo)',
    subtitle = 'Washington, D.C. tracts (not corrected for edge effects)'
  )

# ----------------------------------------------------------------- #
# Compute aspatial Index of Concentration at the Extremes (Krieger) #
# ----------------------------------------------------------------- #

# Five Indices of Concentration at the Extremes based on Feldman et al. (2015) and 
# Krieger et al. (2016)

ice2020DC <- krieger(state = 'DC', year = 2020)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the ICEs (Krieger) values to the census tract geometry
ice2020DC <- tract2020DC %>%
  left_join(ice2020DC$ice, by = 'GEOID')

# Plot ICE for Income
ggplot() +
  geom_sf(
    data = ice2020DC,
    aes(fill = ICE_inc),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Index of Concentration at the Extremes\nIncome (Krieger)',
    subtitle = '80th income percentile vs. 20th income percentile'
  )

# Plot ICE for Education
ggplot() +
  geom_sf(
    data = ice2020DC,
    aes(fill = ICE_edu),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Index of Concentration at the Extremes\nEducation (Krieger)',
    subtitle = 'less than high school vs. four-year college degree or more'
  )

# Plot ICE for Race/Ethnicity
ggplot() +
  geom_sf(
    data = ice2020DC,
    aes(fill = ICE_rewb),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)',
    subtitle = 'white non-Hispanic vs. black non-Hispanic'
  )

# Plot ICE for Income and Race/Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. 
## black (including Hispanic) in 20th income percentile
ggplot() +
  geom_sf(
    data = ice2020DC,
    aes(fill = ICE_wbinc),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Index of Concentration at the Extremes\nIncome and race/ethnicity combined (Krieger)',
    subtitle = paste0(
      'white non-Hispanic in 80th income percentile vs.\n',
      'black (incl. Hispanic) in 20th inc. percentile'
    )
  )

# Plot ICE for Income and Race/Ethnicity Combined
## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
ggplot() +
  geom_sf(
    data = ice2020DC,
    aes(fill = ICE_wpcinc),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_gradient2(
    low = '#998ec3',
    mid = '#f7f7f7',
    high = '#f1a340',
    limits = c(-1, 1)
  ) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Index of Concentration at the Extremes\nIncome and race/ethnicity combined (Krieger)',
    subtitle = paste0(
      'white non-Hispanic in 80th income percentile vs.\n',
      'white non-Hispanic in 20th income percentile'
    )
  )

# -------------------------------------------------------------------- #
# Compute aspatial racial/ethnic Dissimilarity Index (Duncan & Duncan) #
# -------------------------------------------------------------------- #

# Dissimilarity Index based on Duncan & Duncan (1955)
## Selected subgroup comparison: Not Hispanic or Latino, Black or African American alone
## Selected subgroup reference: Not Hispanic or Latino, white alone
## Selected large geography: census tract
## Selected small geography: census block group
di2020DC <- duncan(
  geo_large = 'tract',
  geo_small = 'block group',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB',
  subgroup_ref = 'NHoLW'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the DI (Duncan & Duncan) values to the census tract geometry
di2020DC <- tract2020DC %>%
  left_join(di2020DC$di, by = 'GEOID')

ggplot() +
  geom_sf(
    data = di2020DC,
    aes(fill = DI),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Dissimilarity Index (Duncan & Duncan)\nWashington, D.C. census block groups to tracts',
    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
  )

# -------------------------------------------------------- #
# Compute aspatial racial/ethnic Atkinson Index (Atkinson) #
# -------------------------------------------------------- #

# Atkinson Index based on Atkinson (1970)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
## Selected large geography: census tract
## Selected small geography: census block group
## Default epsilon (0.5 or over- and under-representation contribute equally)
ai2020DC <- atkinson(
  geo_large = 'tract',
  geo_small = 'block group',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the AI (Atkinson) values to the census tract geometry
ai2020DC <- tract2020DC %>%
  left_join(ai2020DC$ai, by = 'GEOID')

ggplot() +
  geom_sf(
    data = ai2020DC,
    aes(fill = AI),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Atkinson Index (Atkinson)\nWashington, D.C. census block groups to tracts',
    subtitle = expression(paste('Black non-Hispanic (', epsilon, ' = 0.5)'))
  )

# ----------------------------------------------------- #
# Compute aspatial racial/ethnic Isolation Index (Bell) #
# ----------------------------------------------------- #

# Isolation Index based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
## Selected interaction subgroup: Not Hispanic or Latino, white alone
## Selected large geography: census tract
## Selected small geography: census block group
ii2020DC <- bell(
  geo_large = 'tract',
  geo_small = 'block group',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB',
  subgroup_ixn = 'NHoLW'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the II (Bell) values to the census tract geometry
ii2020DC <- tract2020DC %>%
  left_join(ii2020DC$ii, by = 'GEOID')

ggplot() +
  geom_sf(
    data = ii2020DC,
    aes(fill = II),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Isolation Index (Bell)\nWashington, D.C. census block groups to tracts',
    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
  )

# -------------------------------------------------------- #
# Compute aspatial racial/ethnic Correlation Ratio (White) #
# -------------------------------------------------------- #

# Correlation Ratio based on Bell (1954) and White (1986)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
## Selected large geography: census tract
## Selected small geography: census block group
v2020DC <- white(
  geo_large = 'tract',
  geo_small = 'block group',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the V (White) values to the census tract geometry
v2020DC <- tract2020DC %>%
  left_join(v2020DC$v, by = 'GEOID')

ggplot() +
  geom_sf(
    data = v2020DC,
    aes(fill = V),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Correlation Ratio (White)\nWashington, D.C. census block groups to tracts',
    subtitle = 'Black non-Hispanic'
  )

# --------------------------------------------------------- #
# Compute aspatial racial/ethnic Location Quotient (Sudano) #
# --------------------------------------------------------- #

# Location Quotient based on Merton (1938) and Sudano et al. (2013)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
## Selected large geography: state
## Selected small geography: census tract
lq2020DC <- sudano(
  geo_large = 'state',
  geo_small = 'tract',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the LQ (Sudano) values to the census tract geometry
lq2020DC <- tract2020DC %>%
  left_join(lq2020DC$lq, by = 'GEOID')

ggplot() +
  geom_sf(
    data = lq2020DC,
    aes(fill = LQ),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    "Location Quotient (Sudano)\nWashington, D.C. census tracts vs. 'state'",
    subtitle = 'Black non-Hispanic'
  )

# ------------------------------------------------------------------------------------- #
# Compute aspatial racial/ethnic Local Exposure and Isolation (Bemanian & Beyer) metric #
# ------------------------------------------------------------------------------------- #

# Local Exposure and Isolation metric based on Bemanian & Beyer (2017)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
## Selected interaction subgroup: Not Hispanic or Latino, white alone
## Selected large geography: state
## Selected small geography: census tract
lexis2020DC <- bemanian_beyer(
  geo_large = 'state',
  geo_small = 'tract',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB',
  subgroup_ixn = 'NHoLW'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the LEx/Is (Bemanian & Beyer) values to the census tract geometry
lexis2020DC <- tract2020DC %>%
  left_join(lexis2020DC$lexis, by = 'GEOID')

ggplot() +
  geom_sf(
    data = lexis2020DC,
    aes(fill = LExIs),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c() +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Local Exposure and Isolation (Bemanian & Beyer) metric\nWashington, D.C. census tracts to state',
    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
  )

# --------------------------------------------- #
# Compute aspatial racial/ethnic Delta (Hoover) #
# --------------------------------------------- #

# Delta based on Hoover (1941) and Duncan et al. (1961)
## Selected subgroup: Not Hispanic or Latino, Black or African American alone
## Selected large geography: census tract
## Selected small geography: census block group
del2020DC <- hoover(
  geo_large = 'tract',
  geo_small = 'block group',
  state = 'DC',
  year = 2020,
  subgroup = 'NHoLB'
)

# Obtain the 2020 census tracts from the 'tigris' package
tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

# Join the DEL (Hoover) values to the census tract geometry
del2020DC <- tract2020DC %>% 
  left_join(del2020DC$del, by = 'GEOID')

ggplot() +
  geom_sf(
    data = del2020DC,
    aes(fill = DEL),
    color = 'white'
  ) +
  theme_bw() +
  scale_fill_viridis_c(limits = c(0, 1)) +
  labs(
    fill = 'Index (Continuous)',
    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
  ) +
  ggtitle(
    'Delta (Hoover)\nWashington, D.C. census block groups to tracts',
    subtitle = 'Black non-Hispanic'
  )
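
To make a few of the quantities mapped above more concrete, the following base-R sketch computes three of the aspatial formulas on made-up counts. All numbers and variable names are hypothetical, and the formulas are the standard textbook definitions; the package functions may treat missing values, geographies, and edge cases differently.

```r
# Dissimilarity Index (Duncan & Duncan): half the summed absolute difference
# between each small area's share of the two subgroups within a large area
black <- c(10, 40, 50)
white <- c(60, 30, 10)
di <- 0.5 * sum(abs(black / sum(black) - white / sum(white)))

# Location Quotient: a small area's subgroup share relative to the
# subgroup share of the large area that contains it
subgroup <- c(10, 40, 50)
total <- c(100, 100, 50)
lq <- (subgroup / total) / (sum(subgroup) / sum(total))

# Index of Concentration at the Extremes: (privileged - deprived) / total,
# bounded between -1 and 1
n_high <- c(30, 10)
n_low <- c(10, 30)
n_total <- c(100, 100)
ice <- (n_high - n_low) / n_total

di
lq
ice
```

A DI of 0 would mean the two subgroups are distributed identically across the small areas, an LQ above 1 marks over-representation relative to the large area, and the sign of ICE shows which extreme dominates.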

Funding

This package was originally developed while the author was a postdoctoral fellow supported by the Cancer Prevention Fellowship Program at the National Cancer Institute. Any modifications since December 05, 2022 were made while the author was an employee of Social & Scientific Systems, Inc., a DLH Corporation Holding Company.

Acknowledgments

The messer() function functionalizes the code found in Hruska et al. (2022), available on an OSF repository, but adds the percent with income less than $30K to the computation, following Messer et al. (2006). The messer() function also allows for the computation of NDI (Messer) for each year from 2010 to 2020 (the years for which the U.S. census characteristics are available to date). No companion code to compute NDI (Powell-Wiley) was included in Andrews et al. (2020) or Slotman et al. (2022), but the package author worked directly with the authors of the latter manuscript to replicate their SAS code in R for the powell_wiley() function. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in Andrews et al. (2020) and Slotman et al. (2022) because the two studies used different statistical platforms (i.e., SPSS and SAS, respectively) that intrinsically calculate the principal component analysis differently from R. The internal function to calculate the Atkinson Index is based on the Atkinson() function in the DescTools package.
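
As a generic illustration of how a PCA-based index is derived, and of one convention that has to be fixed by hand, here is a minimal sketch using base R's prcomp() on simulated data. The variable names, category labels, and simulated values are hypothetical, and this is not the procedure used inside messer() or powell_wiley().

```r
set.seed(42)
n <- 200

# Simulate a latent 'deprivation' signal and four noisy indicators of it
latent <- rnorm(n)
toy <- data.frame(
  pct_poverty = latent + rnorm(n, sd = 0.5),
  pct_unemployed = latent + rnorm(n, sd = 0.5),
  pct_no_hs = latent + rnorm(n, sd = 0.5),
  pct_crowded = latent + rnorm(n, sd = 0.5)
)

# PCA on standardized variables; the first component serves as the index
pca <- prcomp(toy, center = TRUE, scale. = TRUE)
score <- pca$x[, 1]

# The sign of a principal component is arbitrary, so orient it explicitly
if (cor(score, toy$pct_poverty) < 0) score <- -score

# Quartiles of the continuous score (labels are made up for this sketch)
quart <- cut(
  score,
  breaks = quantile(score, probs = seq(0, 1, 0.25)),
  include.lowest = TRUE,
  labels = c('1-Least', '2-BelowAvg', '3-AboveAvg', '4-Most')
)
table(quart)
```

Because component signs and scalings are conventions rather than properties of the data, scores computed in different software are best compared by correlation and not value by value.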

When citing this package for publication, please follow:

citation('ndi')

Questions? Feedback?

For questions about the package, please contact the maintainer Dr. Ian D. Buller or submit a new issue. Confirmation of the computation, feedback, and feature collaboration are welcome, especially from the authors of the references cited above.

ndi's Issues

NDIQuint Sorting Question with Powell-Wiley

hey @idblr! I've been doing more testing and found something curious that I'm hoping you can straighten out for me.

df <- ndi::powell_wiley(geo = "county", state = "MO", year = 2020, round_output = FALSE)[[1]]
df <- dplyr::arrange(df, NDI)

On rows 69 and 70 of the resulting df object, Barry County (29009) gets a score of 0.24683105 while Sullivan County (29211) gets a score of 0.26129508. However, in NDIQuint, Barry County is given 4-AboveAvg deprivation while Sullivan is given 3-Average deprivation.

This is possible because the log of the total population is factored into the ranking process, correct?
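
A quick way to surface cases like this is to sort by NDI and flag adjacent rows whose category code decreases. The sketch below uses only the two rows quoted above as toy data and assumes each NDIQuint label starts with its numeric code; it does not call the package.

```r
# Toy data built from the two counties quoted above
toy <- data.frame(
  GEOID = c('29009', '29211'),
  NDI = c(0.24683105, 0.26129508),
  NDIQuint = c('4-AboveAvg deprivation', '3-Average deprivation')
)

# Sort by the continuous score
toy <- toy[order(toy$NDI), ]

# Extract the leading numeric code of each category label
rank_code <- as.integer(substr(toy$NDIQuint, 1, 1))

# Positions where the next row has a lower category despite a higher NDI
inversions <- which(diff(rank_code) < 0)

# Show both rows of every inverted pair
toy[sort(unique(c(inversions, inversions + 1))), ]
```

Swapping in the full `df` from the call above would list every such pair for the state.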

ndi version 0.1.3 reverse dependency check failure when CENSUS_API_KEY not ""

00check.log
testthat.Rout.zip

If Sys.getenv("CENSUS_API_KEY") != "", the tests are not skipped, and are not silent:

> nzchar(Sys.getenv("CENSUS_API_KEY"))
[1] TRUE
> anthopolos(state = "DC", year = 2020, subgroup = c("NHoLB", "HoLB"))
  |======================================================================| 100%
$ri
# A tibble: 206 × 8
   GEOID       state                county      tract     RI Total…¹ NHoLB  HoLB
   <chr>       <chr>                <chr>       <chr>  <dbl>   <dbl> <dbl> <dbl>
 1 11001000101 District of Columbia District o… 1.01  0.0390    1250     0     0
 2 11001000102 District of Columbia District o… 1.02  0.0413    3318    34     0
 3 11001000201 District of Columbia District o… 2.01  0.0457    3972   239     8
 4 11001000202 District of Columbia District o… 2.02  0.0371    4665   131    11
 5 11001000300 District of Columbia District o… 3     0.0536    6504   178     0
 6 11001000400 District of Columbia District o… 4     0.0495    1481    32     0
 7 11001000501 District of Columbia District o… 5.01  0.101     3343   233     0
 8 11001000502 District of Columbia District o… 5.02  0.0616    3580   150    20
 9 11001000600 District of Columbia District o… 6     0.0749    4942   411     0
10 11001000702 District of Columbia District o… 7.02  0.0763    2971   335     0
# … with 196 more rows, and abbreviated variable name ¹TotalPop
# ℹ Use `print(n = ...)` to see more rows

$missing
# A tibble: 3 × 4
  variable total n_missing percent_missing
  <chr>    <int>     <int> <chr>          
1 HoLB       206         0 0 %            
2 NHoLB      206         0 0 %            
3 TotalPop   206         0 0 %            

> bravo(state = "DC", year = 2009, subgroup = c("LtHS", "HSGiE"))
  |======================================================================| 100%
$ei
# A tibble: 188 × 24
   GEOID   state county tract     EI Total…¹  mNSC mNt4G m5t6G m7t8G   m9G  m10G
   <chr>   <chr> <chr>  <chr>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 110010… Dist… Distr… 1     0.0524    3882     0     0     0     0     0     0
 2 110010… Dist… Distr… 2.01  0.0522     127     0     0     0     0     0     0
 3 110010… Dist… Distr… 2.02  0.0442    2371     0     0     0     0     0     0
 4 110010… Dist… Distr… 3     0.0732    3563     0     0     0     0     0     0
 5 110010… Dist… Distr… 4     0.0832    1099     0     0     0     0     0     0
 6 110010… Dist… Distr… 5.01  0.0809    2426     0     0     0     0     0     0
 7 110010… Dist… Distr… 5.02  0.0942    2471     0     0     7     0     0     0
 8 110010… Dist… Distr… 6     0.104     4436    10     0     8    37     0   146
 9 110010… Dist… Distr… 7.01  0.114     3782     0     0     0     0    26     0
10 110010… Dist… Distr… 7.02  0.0805    2237     0     0     0    33     0     0
# … with 178 more rows, 12 more variables: m11G <dbl>, m12GND <dbl>,
#   mHSGGEDoA <dbl>, fNSC <dbl>, fNt4G <dbl>, f5t6G <dbl>, f7t8G <dbl>,
#   f9G <dbl>, f10G <dbl>, f11G <dbl>, f12GND <dbl>, fHSGGEDoA <dbl>, and
#   abbreviated variable name ¹TotalPop
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

$missing
# A tibble: 19 × 4
   variable  total n_missing percent_missing
   <chr>     <int>     <int> <chr>          
 1 f10G        188         0 0 %            
 2 f11G        188         0 0 %            
 3 f12GND      188         0 0 %            
 4 f5t6G       188         0 0 %            
 5 f7t8G       188         0 0 %            
 6 f9G         188         0 0 %            
 7 fHSGGEDoA   188         0 0 %            
 8 fNSC        188         0 0 %            
 9 fNt4G       188         0 0 %            
10 m10G        188         0 0 %            
11 m11G        188         0 0 %            
12 m12GND      188         0 0 %            
13 m5t6G       188         0 0 %            
14 m7t8G       188         0 0 %            
15 m9G         188         0 0 %            
16 mHSGGEDoA   188         0 0 %            
17 mNSC        188         0 0 %            
18 mNt4G       188         0 0 %            
19 TotalPop    188         0 0 %

Krieger function not found

Hi,
While checking the code, I see that the krieger function is not working. How can I resolve this issue?

Error in krieger(state = "TX", year = 2020) :
could not find function "krieger"

Thanks
Rasel
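In R, a "could not find function" error generally means either that the package has not been attached (or the call is not namespace-qualified) or that the installed version does not export a function with that name. The helper below is a hypothetical sketch for checking the second case; has_export() is not part of ndi, and it is demonstrated on the stats package so that it runs without ndi installed.

```r
# Hypothetical helper (not part of ndi): does an installed package export
# a function with this name? Returns NA if the package is not installed.
has_export <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA)
  }
  fn %in% getNamespaceExports(pkg)
}

# Demonstration on a package that ships with R
has_export("stats", "median")

# With ndi installed, the analogous check would be:
#   has_export("ndi", "krieger")
#   utils::packageVersion("ndi")
```

If the check returns TRUE, calling the function as ndi::krieger(...) or attaching the package with library(ndi) first should resolve the error; if it returns FALSE, the installed version may simply not include the function.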

Significant Digits

One additional question that popped up in messer() is the rounding that takes place at the end. I'm sure you had a reason for doing it! Just curious about your thoughts on whether that should be optional?
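One way such an option could look, as a sketch only: the powell_wiley() call quoted earlier on this page passes round_output = FALSE, so the same argument name is reused here. The function finalize_scores() and the default of four digits are assumptions for illustration, not the messer() implementation.

```r
# Hypothetical final step of an index computation: rounding is on by default
# but can be switched off by the caller. The digits default is an assumption.
finalize_scores <- function(scores, round_output = TRUE, digits = 4) {
  if (round_output) {
    round(scores, digits = digits)
  } else {
    scores
  }
}

raw <- c(0.24683105, 0.26129508)
finalize_scores(raw)                       # rounded to 4 digits
finalize_scores(raw, round_output = FALSE) # full precision kept
```

Keeping the rounded output as the default would preserve existing behavior while letting downstream users rank or bin on the unrounded values.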

GitHub Actions and NDI on ubuntu-latest (oldrel-2)

Hey @idblr! Thanks for keeping this package kicking - it's such a great contribution! We're about to release a package that has wrappers around a couple of the NDI indices, and noticed that we can't pass R CMD check on GitHub Actions for ubuntu-latest (oldrel-2) and older. Also noticed that your GitHub Actions workflow stops with ubuntu-latest (oldrel-1), and that your DESCRIPTION file suggests R 3.5 should be supported. Any insight on what is going on with older releases?

── R CMD check results ────────────────────────────── deprivateR 0.1.0.9000 ────
Duration: 1.2s

❯ checking package dependencies ... ERROR
Error:   Package required but not available: ‘ndi’
  
  See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’
  manual.

1 error ✖ | 0 warnings ✔ | 0 notes ✔
Error: Error: R CMD check found ERRORs
Execution halted
Error: Process completed with exit code 1.

Refactoring to Allow for Existing Data

Great work on ndi, @idblr!

I'd love to integrate your package into some existing workflows, but would like to be able to pass a data frame of correctly prepped/formatted data to messer() or one of the other functions.

The sociome package has a great workaround that allows this, the calculate_adi() function. Instead of calling the main get_adi() function that downloads the data and calculates ADI, users with pre-existing data can skip the download step by calling calculate_adi() directly.

I'm wondering if you'd be open to a PR that would create (as an example) a calculate_messer() function that messer() would call as a subfunction after data download and prep. I would also export it so that other workflows could call it directly. I'm most interested in this for messer() and powell_wiley(), but would happily write it for anthopolos(), bravo(), and krieger() as well (doesn't make sense to do it for gini()). Thanks for considering!
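The proposed split could be sketched roughly as follows. Everything here is hypothetical: calculate_index_sketch(), the column names, and the choice of a centered and scaled stats::prcomp() first component are assumptions made for illustration and are not the computation that messer() performs.

```r
# Hypothetical "calculate" step that accepts already-prepared data, so a
# download-and-prep wrapper could call it after retrieving ACS variables.
calculate_index_sketch <- function(data, vars) {
  x <- data[, vars, drop = FALSE]
  complete <- stats::complete.cases(x)

  # Assumption: index = first principal component of the scaled variables
  pca <- stats::prcomp(x[complete, , drop = FALSE], center = TRUE, scale. = TRUE)

  # Rows with missing inputs keep NA rather than being dropped
  score <- rep(NA_real_, nrow(data))
  score[complete] <- pca$x[, 1]
  data$index <- score
  data
}

# Simulated, pre-formatted input standing in for user-supplied data
set.seed(1)
prepped <- data.frame(
  GEOID = sprintf("%05d", 1:20),
  pct_a = runif(20),
  pct_b = runif(20),
  pct_c = runif(20)
)

out <- calculate_index_sketch(prepped, c("pct_a", "pct_b", "pct_c"))
head(out)
```

With this layout, users who already hold prepared data call the calculation step directly, while the existing user-facing function would only add the download and formatting in front of it.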
