
missing data handling: visualize and impute

License: Other

missing-data data-science visualization biostatistics machine-learning missing-values dirty-data neuroscience epidemiology imputation

missingdata

missing data visualization and imputation

Goals

To provide an easy-to-use yet thorough assessment of missing values in one's dataset:

  • in addition to the blackholes plot below,
  • show the variable-to-variable and subject-to-subject co-missingness, and
  • quantify the type of missingness, etc.
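
As a rough illustration of what co-missingness means, here is a minimal sketch in plain pandas. It does not use the missingdata API. The toy dataset and all variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy dataset (hypothetical): 4 subjects x 3 variables, NaN marks a missing value
data = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, np.nan],
        "iq": [100.0, np.nan, np.nan, 110.0],
        "bmi": [22.0, 30.0, 27.0, 24.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Indicator matrix: 1 where a value is missing, 0 otherwise
missing = data.isna().astype(int)

# Variable-to-variable co-missingness: entry (i, j) counts the subjects
# for which variables i and j are both missing
var_co_missing = missing.T.dot(missing)

# Subject-to-subject co-missingness: entry (a, b) counts the variables
# that are missing in both subject a and subject b
sub_co_missing = missing.dot(missing.T)

# Fraction of missing values per variable and per subject
frac_missing_per_var = missing.mean(axis=0)
frac_missing_per_sub = missing.mean(axis=1)

print(var_co_missing)
print(sub_co_missing)
```

The diagonal of each matrix holds the total number of missing entries for that variable or subject, so dividing by the diagonal would turn the counts into conditional rates if that is more useful.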

Note

To manage data with missing values more easily, I strongly recommend moving away from CSV files and managing your data in self-contained, flexible data structures like pyradigm, as your data, as well as your needs, will only get bigger and more complicated, e.g. with mixed types, missing values and a large number of groups.

Work toward any of the goals above would make a great contribution if you have time.

Features

  • visualization
  • imputation (coming!)
  • other handling

blackholes plot

[Figure: example blackholes plot]

State

  • Software is in beta and under active development. Update regularly and often!
  • Contributions most welcome, esp. reporting bugs and improving usability.

Installation

pip install -U missingdata

We encourage you to update often, especially when you run into any issues.

Usage

Take a look at the help text before diving in, using the following code:

from missingdata import blackholes
help(blackholes)

I encourage you to read the text for each parameter carefully to understand the behaviour of this plotting mechanism.

Note

If you don't see any labels (for rows or columns) when you try the blackholes plot for the first time, it may be because the total effective number of rows/columns being displayed, after applying filter_spec_*, exceeded a preset number (60/80), and the labels were removed to keep them from being occluded or becoming illegible. You can use the parameter freq_thresh_show_labels to bring the effective number of rows/columns displayed down to a smaller number, or pass show_all_labels=True to force the display of labels. If the number of subjects or variables is large, you may want to increase figsize (width or height) to minimize occlusion and improve label readability.

Also, the defaults chosen may not work for you, so I strongly encourage you to control as many parameters as needed to customize the plot to your liking. If a feature you need is not currently supported, send a PR with improvements, or open an issue. Thanks.

Let's say you have all the data in a pandas DataFrame, where subject IDs are in a 'sub_ids' column and variable names are in a 'var_names' column, and they belong to groups identified by sub_class and var_group. You can then use the following code to produce the blackholes plot:

blackholes(data_frame,
           label_rows_with='sub_ids', label_cols_with='var_names',
           group_rows_by=sub_class, group_cols_by=var_group)

If you are interested in seeing only the subjects/variables with the least amount of missing data, you can control the window of missing-data percentage with filter_spec_samples and/or filter_spec_variables by passing a tuple of two floats, e.g. (0, 0.1), which will filter away those with more than 10% missing data.

blackholes(data_frame,
           label_rows_with='sub_ids', label_cols_with='var_names',
           filter_spec_samples=(0, 0.1))
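
To make the idea of a missing-data window concrete, the sketch below applies the same kind of filter in plain pandas. The helper `filter_by_missing_fraction` is hypothetical and not part of missingdata. Treating both window bounds as inclusive is an assumption, and the library's actual behaviour may differ.

```python
import numpy as np
import pandas as pd


def filter_by_missing_fraction(frame, window, filter_rows=True):
    """Keep rows (or columns) whose fraction of missing values lies in window.

    Hypothetical helper for illustration only; not part of missingdata.
    Both bounds of the (low, high) window are treated as inclusive (assumption).
    """
    low, high = window
    if filter_rows:
        # Fraction of missing values in each row (i.e. per sample)
        frac = frame.isna().mean(axis=1)
        return frame.loc[(frac >= low) & (frac <= high)]
    # Fraction of missing values in each column (i.e. per variable)
    frac = frame.isna().mean(axis=0)
    return frame.loc[:, (frac >= low) & (frac <= high)]


# Toy dataset (hypothetical): 3 subjects x 20 variables
data = pd.DataFrame(np.ones((3, 20)), index=["s1", "s2", "s3"])
data.iloc[1, 0] = np.nan   # s2: 1 of 20 values missing (5%)
data.iloc[2, :5] = np.nan  # s3: 5 of 20 values missing (25%)

# Keep only subjects with at most 10% missing data
kept_samples = filter_by_missing_fraction(data, (0, 0.1))
print(list(kept_samples.index))

# Same window applied to variables instead of subjects
kept_variables = filter_by_missing_fraction(data, (0, 0.1), filter_rows=False)
print(kept_variables.shape)
```

A window with a non-zero lower bound, such as (0.2, 1.0), would instead select the subjects or variables with the most missing data.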

The other parameters for the function are self-explanatory.

Please open an issue if you find something confusing, or have feedback to improve, or identify a bug. Thanks.

Citation

If you find this package useful, I'd greatly appreciate it if you cite it via:

Pradeep Reddy Raamana, (2019), "missingdata python library for visualization and handling of missing values" (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.3352336 DOI: 10.5281/zenodo.3352336
