Giter Site home page Giter Site logo

zipy's Introduction

Implementing a Zero‑Inflated Poisson Regression Package in Python

Background

Some count data, at times, may prove difficult to run standard statistical analyses on, because of a prevalence zeros that may skew the dataset. Zero‐inflated Poisson regression solves this problem by allowing for the presence of greater variability within a dataset than would usually be expected based on a given statistical model. This model assumes that a sample is a mixture of two individual sorts — one of whose counts are generated through standard Poisson regression. The other group can be termed as absolute zero, where there exists zero probability of a count greater than 0. Observed values of 0 could come from either group. This model is most suitable for applications when a conventional negative binomial model might not be a good fit. Its application ranges from software fault prediction to the evaluation of prognostic factors of disease.

This model is easily implemented in R via the pscl package. However, there is currently not a Python implementation of a similar package. Our project seeks to translate the zero‐inflated model functionality of the pscl package into Python.

Objectives

For this project, we aim to accomplish the following goals. Goals I and II will constitute our minimum viable product, while we aim to complete Goal III if time permits.

Goal I. Translate the R pscl package function zeroinf, namely the sub‐functions listed below. Said sub‐functions will use the results of GLM.fit for their starting values. We will limit ourselves to implementing the logit link for simplicity.

  • likelihood and log‐likelihood function
  • gradient function
  • maximum likelihood estimation function

Goal II. Compare the functionality of our package for both veracity and speed against the R pscl package.

  • The veracity of the results will be tested on the datasets supplied with the pscl package, including but not limited to the absentee ballot dataset.
  • The speed will be clocked using the linux command‐line tool time tool to time both the R and Python packages real runtimes.

Goal III. Implement at least one of the following:

  • zero‐inflated negative binomial regression
  • zero‐inflated geometric regression

Sources

  • Jackman, S., Tahk, A., Zeileis, A., Maimone, C., Fearon, J., Meers, Z., ... & Imports, M. A. S. S. (2017). Package ‘pscl’. See http://github.com/atahk/pscl.
  • Zeileis, A., Kleiber, C., & Jackman, S. (2008). Regression models for count data in R. Journal of statistical software, 27(8), 1‐25.

zipy's People

Contributors

alexisjcarr avatar malik180 avatar luna22 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.