Giter Site home page Giter Site logo

vkhodygo / mice Goto Github PK

View Code? Open in Web Editor NEW

This project forked from amices/mice

0.0 0.0 0.0 70.04 MB

Multivariate Imputation by Chained Equations

Home Page: https://amices.org/mice/

License: GNU General Public License v2.0

C++ 1.46% R 97.99% TeX 0.55%

mice's Introduction

mice

CRAN_Status_Badge R-CMD-check

The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.

Installation

The mice package can be installed from CRAN as follows:

install.packages("mice")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "amices/mice")

Minimal example

library(mice, warn.conflicts = FALSE)

# show the missing data pattern
md.pattern(nhanes)

Missing data pattern of nhanes data. Blue is observed, red is missing.

Missing data pattern of nhanes data. Blue is observed, red is missing.
#>    age hyp bmi chl   
#> 13   1   1   1   1  0
#> 3    1   1   1   0  1
#> 1    1   1   0   1  1
#> 1    1   0   0   1  2
#> 7    1   0   0   0  3
#>      0   8   9  10 27

The table and the graph summarize where the missing data occur in the nhanes dataset.

# multiple impute the missing values
imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl

# inspect quality of imputations
stripplot(imp, chl, pch = 19, xlab = "Imputation number")

Distribution of chl per imputed data set.

Distribution of chl per imputed data set.

In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing.

# fit complete-data model
fit <- with(imp, lm(chl ~ age + bmi))

# pool and summarize the results
summary(pool(fit))
#>          term estimate std.error statistic    df p.value
#> 1 (Intercept)     9.08     73.09     0.124  4.50  0.9065
#> 2         age    35.23     17.46     2.017  1.36  0.2377
#> 3         bmi     4.69      1.94     2.417 15.25  0.0286

The complete-data is fit to each imputed dataset, and the results are combined to arrive at estimates that properly account for the missing data.

mice 3.0

Version 3.0 represents a major update that implements the following features:

  1. blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm each block was equivalent to one variable, which - of course - is the default; The blocks argument allows mixing univariate imputation method multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;

  2. where: The where argument is a logical matrix of the same size of data that specifies which cells should be imputed. This opens up some new analytic possibilities;

  3. Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analysis from on multiply-imputed data;

  4. formulas: The old form argument has been redesign and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R’s native formula’s.

  5. Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;

  6. Improved numerical algorithms for low-level imputation function. Better handling of duplicate variables.

  7. Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.

See MICE: Multivariate Imputation by Chained Equations for more resources.

I’ll be happy to take feedback and discuss suggestions. Please submit these through Github’s issues facility.

Resources

Books

  1. Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition.. Chapman & Hall/CRC. Boca Raton, FL.

Course materials

  1. Handling Missing Data in R with mice
  2. Statistical Methods for combined data sets

Vignettes

  1. Ad hoc methods and the MICE algorithm
  2. Convergence and pooling
  3. Inspecting how the observed data and missingness are related
  4. Passive imputation and post-processing
  5. Imputing multilevel data
  6. Sensitivity analysis with mice
  7. Generate missing values with ampute
  8. futuremice: Wrapper for parallel MICE imputation through futures

Code from publications

  1. Flexible Imputation of Missing Data. Second edition.

Acknowledgement

The cute mice sticker was designed by Jaden M. Walters. Thanks Jaden!

Code of Conduct

Please note that the mice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

mice's People

Contributors

stefvanbuuren avatar gerkovink avatar prockenschaub avatar thomvolker avatar rianneschouten avatar hanneoberman avatar edoardocostantini avatar vincentarelbundock avatar bfgray3 avatar andland avatar edbonneville avatar bgall avatar vkhodygo avatar mingyang-cai avatar lukaswallrich avatar stephematician avatar shahabjolani avatar kkleinke avatar wibeasley avatar rasel-biswas avatar michaelchirico avatar mmaechler avatar dnzmarcio avatar jeroen avatar hadley avatar clbustos avatar bcjaeger avatar baogorek avatar bquast avatar andrewlawrence avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.