phydev / mice

Multiple imputation by chained equations, implemented from scratch. This is a low-performance implementation meant for pedagogical purposes only.

License: GNU General Public License v3.0

Python 100.00%
missingness data-cleaning data-science mice-algorithm multiple-imputation imputation

MICE - Multiple Imputation by Chained Equations

Multiple imputation by chained equations, implemented from scratch.

Example 1: iris dataset

Load the iris data from sklearn and introduce missing values with the pyampute package:

from sklearn.datasets import load_iris
from pyampute.ampute import MultivariateAmputation

iris = load_iris(as_frame=True, return_X_y=False)["data"]
ma = MultivariateAmputation()
X_amp = ma.fit_transform(iris.to_numpy()) # pyampute requires the input as numpy array
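Before imputing, it can help to check how much data is actually missing. The snippet below is a minimal sketch on a small hand-made array; `X_demo` is a hypothetical stand-in for `X_amp`, and it assumes missing entries are encoded as NaN.

```python
import numpy as np

# Hypothetical stand-in for an amputed array such as X_amp
# (assumption: missing entries are encoded as NaN)
X_demo = np.array([
    [5.1, 3.5, np.nan, 0.2],
    [4.9, np.nan, 1.4, 0.2],
    [np.nan, 3.2, 1.3, np.nan],
    [4.6, 3.1, 1.5, 0.2],
])

mask = np.isnan(X_demo)                      # True where a value is missing
missing_per_column = mask.sum(axis=0)        # number of missing values in each column
missing_fraction = mask.mean()               # share of all entries that are missing
complete_rows = (~mask.any(axis=1)).sum()    # rows without any missing value

print("missing per column:", missing_per_column)
print("missing fraction:", missing_fraction)
print("complete rows:", complete_rows)
```

The same three lines applied to `X_amp` give a quick picture of the missingness pattern produced by the amputation step.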

Now we can apply MICE to the amputed dataset:

from src import mice
imp = mice.mice(X_amp, n_iterations=20, m_imputations=10, seed=42)
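For intuition about what a chained-equations call like the one above does, here is a minimal sketch of the idea. It is not this repository's implementation: the function name `chained_imputation_sketch` is hypothetical, each column is modelled with ordinary least squares on the other columns, and randomness comes only from Gaussian residual noise (a full MICE procedure would also account for uncertainty in the regression coefficients).

```python
import numpy as np

def chained_imputation_sketch(X, n_iterations=10, seed=0):
    """Return one imputed copy of X, a 2D float array with NaN for missing values."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Step 1: initialise every missing entry with its column mean
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_iterations):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            # Step 2: regress column j on the other columns,
            # fitting only on rows where column j was observed
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            beta, *_ = np.linalg.lstsq(A[~miss_j], filled[~miss_j, j], rcond=None)
            sigma = (filled[~miss_j, j] - A[~miss_j] @ beta).std()
            # Step 3: overwrite the missing entries with prediction plus noise
            noise = rng.normal(0.0, sigma, miss_j.sum())
            filled[miss_j, j] = A[miss_j] @ beta + noise
    return filled

# Toy data with one strongly correlated column and about 20% missing values
rng = np.random.default_rng(1)
X_full = rng.normal(size=(50, 3))
X_full[:, 2] = X_full[:, 0] + 0.1 * rng.normal(size=50)
X_demo = X_full.copy()
X_demo[rng.random(X_demo.shape) < 0.2] = np.nan

# Several imputed datasets, one per seed
imputations = [chained_imputation_sketch(X_demo, seed=s) for s in range(5)]
```

Running the sketch with different seeds yields several completed datasets that agree on the observed values and differ only in the imputed ones, which is the property the diagnostic plots rely on.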

Example 2: distribution plot for the sample data

After imputation, you should make diagnostic plots and compare the distribution of the multiply imputed datasets with that of the complete-case data. Below is the plot for the example we provide in the /tests directory:

import seaborn as sns
import matplotlib.pyplot as plt

p = 3  # index of the column to plot

# Legend handles, in the same order as the legend labels below
custom_lines = [plt.Line2D([0], [0], color="red", lw=4),
                plt.Line2D([0], [0], color="grey", lw=4),
                plt.Line2D([0], [0], color="blue", lw=4)]

fig, ax = plt.subplots()

# One thin density curve per imputed dataset
for m in range(len(imp)):
    sns.kdeplot(imp[m][:, p], label="Imputed", color="black", lw=0.2, ax=ax)
# Density of the amputed data (X_amp) and of the complete data;
# df is assumed to hold the complete dataset as a pandas DataFrame
sns.kdeplot(X_amp[:, p], label="Missing", color="blue", ax=ax)
sns.kdeplot(df.to_numpy()[:, p], label="Complete", color="red", ax=ax)
plt.xlabel("Age (years)")
ax.legend(custom_lines, ["Complete", "Imputed", "Missing"], loc="upper left")
plt.savefig("qol_distribution_mice.png")

Figure showing the distribution lines for 10 imputed datasets, the original dataset and the amputed dataset with missing values.
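A numeric summary can complement the density plot. The sketch below compares the mean and standard deviation of one column across the complete data, the observed part of the amputed data, and each imputed dataset. The helper `summarize_column` and the toy arrays are hypothetical and serve only as an illustration; with real data you would pass the complete array, `X_amp`, and `imp`.

```python
import numpy as np

def summarize_column(complete, amputed, imputations, p):
    """Mean and standard deviation of column p in each dataset."""
    rows = {
        "complete": (np.mean(complete[:, p]), np.std(complete[:, p])),
        # nan-aware statistics use only the observed values
        "observed": (np.nanmean(amputed[:, p]), np.nanstd(amputed[:, p])),
    }
    for m, imp_m in enumerate(imputations):
        rows[f"imputed_{m}"] = (np.mean(imp_m[:, p]), np.std(imp_m[:, p]))
    return rows

# Toy data: one value removed from the second column
complete = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
amputed = complete.copy()
amputed[1, 1] = np.nan

# Hypothetical stand-ins for imputed datasets: the gap filled with two candidate values
imputations = []
for value in (18.0, 23.0):
    filled = amputed.copy()
    filled[1, 1] = value
    imputations.append(filled)

summary = summarize_column(complete, amputed, imputations, p=1)
for name, (mean, std) in summary.items():
    print(f"{name:>10}: mean={mean:.2f} std={std:.2f}")
```

Large gaps between the imputed rows and the complete or observed rows are a hint that the imputation model deserves a closer look.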

Beware

This is a low-performance implementation meant for pedagogical purposes only. It has several limitations and leaves room for improvement; for research, please use one of the available packages for multiple imputation:
