
andrewtavis / causeinfer


Machine learning based causal inference/uplift in Python

License: BSD 3-Clause "New" or "Revised" License

Python 78.44% Stata 21.56%
causal-inference machine-learning treatment-effects uplift-modeling causality data-science statistics econometrics python uplift

causeinfer's People

Contributors

andrewtavis, dependabot[bot], imgbotapp

causeinfer's Issues

Add Criteo uplift dataset

This is an extensive issue that involves the creation of the following files:

The dataset can be found here.

causeinfer.data.criteo.py would then need to be added to the documentation and be tested, but these should be easy steps.

Add reflective/pessimistic uplift

Further baseline causal inference models that could be added are reflective and pessimistic uplift from Shaar et al. (2016). It should be possible to implement these using the current base modeling classes from base_models.py.

The paper in question is:

Shaar, A., Abdessalem, T. and Segard, O. (2016). “Pessimistic Uplift Modeling”. ACM SIGKDD, August 2016, San Francisco, California, USA.

These were shown to perform poorly in the review by Devriendt, F. et al. (2018), but their ease of implementation and the no free lunch principle of causal inference make them worth adding.

Files to create:

  • causeinfer.standard_algorithms.reflective.py
  • causeinfer.standard_algorithms.pessimistic.py

These models would then be applied in the various examples, tests would need to be written for them, and they would need to be added to the documentation.

Add Pintilie Tamoxifen dataset

This is an extensive issue that involves the creation of the following files:

The dataset can be found in the datasets directory.

causeinfer.data.tamoxifen.py would then need to be added to the documentation and be tested, but these should be easy steps.

Add Lalonde Job Training dataset

This is an extensive issue that involves the creation of the following files:

The dataset can be found here.

causeinfer.data.lalonde.py would then need to be added to the documentation and be tested, but these should be easy steps.

Create concise requirement and env files

This issue is for creating concise versions of requirements.txt and environment.yml for causeinfer. It would be great if these files were created by hand with specific version numbers or generated in a way so that sub-dependencies don't always need to be updated.

As of now both files are being created with the following commands in the package's conda virtual environment:

pip list --format=freeze > requirements.txt  
conda env export --no-builds | grep -v "^prefix: " > environment.yml

causeinfer and other obviously unneeded packages are then removed from these files before being uploaded.

Any insights or help would be much appreciated!
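
One possible approach, sketched here under assumptions, is to pin only the direct dependencies and let pip resolve the sub-dependencies. The dependency names in the example are hypothetical placeholders, not the package's actual requirements, and pin_requirements is an illustrative helper that does not exist in causeinfer.

```python
from importlib import metadata


def pin_requirements(names, get_version=metadata.version):
    """Return 'name==version' lines for the given direct dependencies only."""
    lines = []
    for name in sorted(names, key=str.lower):
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in the current environment: leave it unpinned.
            lines.append(name)
    return lines


if __name__ == "__main__":
    # Hypothetical list of direct dependencies; replace with the real ones.
    direct = ["numpy", "pandas", "scikit-learn"]
    print("\n".join(pin_requirements(direct)))
```

Run inside the package's virtual environment, the printed lines could be redirected into requirements.txt. Sub-dependencies are left out on purpose, so they would not need updating by hand.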

Your wheel is a mess

The v0.1.1.2 wheel of this project released on PyPI yesterday contains numerous copies of temporary files that shouldn't be in a wheel; you can see the wheel's file listing here or by running zipinfo or a similar program on the .whl file. I believe that this happened due to the use of find_namespace_packages(): because your setup.py calls this function without an exclude or include argument, it picks up the directories containing .py files in the temporary build/ directory. Each build of the wheel therefore includes build/ while also adding to build/, leading to the problem at hand.

My recommendation for fixing this is:

  1. Delete the build/ directory from your local repository, perhaps by running git clean -dXf.
  2. As your Python packages all have __init__.py files, there is no reason to use find_namespace_packages() over find_packages(), so I would recommend changing which function you use in setup.py. If you are set on using find_namespace_packages(), you should add include=["causeinfer", "causeinfer.*"] to the function's arguments in order to not capture anything outside your code directory. (You should also set include or exclude even if you do use find_packages() in order to exclude your tests/ directory from the wheel; see here for more information.)
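
After rebuilding, the wheel's file listing can also be checked from Python rather than with zipinfo. The following is a minimal sketch using only the standard library; unexpected_wheel_entries is an illustrative helper, and it assumes that everything in the wheel should live under causeinfer/ or the wheel's .dist-info directory.

```python
import zipfile


def unexpected_wheel_entries(wheel_path, package="causeinfer"):
    """List archive members outside the package and its .dist-info directory."""
    unexpected = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            top_level = name.split("/", 1)[0]
            if top_level == package:
                continue  # part of the package itself
            if top_level.startswith(package + "-") and top_level.endswith(".dist-info"):
                continue  # wheel metadata
            unexpected.append(name)
    return unexpected
```

An empty list suggests that nothing from build/ or tests/ ended up in the archive. The function takes the path to the built .whl file, which is typically written to the dist/ directory.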

Add data simulator

A data simulator would be a useful addition to causeinfer: because the true treatment effects in simulated data are known, users could check model accuracy directly and compare models on equal terms.

Files to create:

The simulator would then need to be documented, which should be simple with autodoc :)
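
As a rough illustration of the idea, and not a proposal for the final design, a simulator could draw a covariate, assign treatment at random, and record the true effect next to each outcome. All names and the data-generating process below are assumptions made for this sketch.

```python
import random


def simulate_uplift_data(n=1000, seed=0):
    """Simulate one covariate, a randomized treatment, and an outcome.

    The true individual effect (1.0 + 0.5 * x) is stored in each row so
    that estimates can be compared against it.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        treatment = 1 if rng.random() < 0.5 else 0
        true_effect = 1.0 + 0.5 * x
        outcome = x + treatment * true_effect + rng.gauss(0.0, 1.0)
        rows.append(
            {"x": x, "treatment": treatment, "outcome": outcome, "true_effect": true_effect}
        )
    return rows


def difference_in_means(rows):
    """Estimate the average treatment effect as treated mean minus control mean."""
    treated = [r["outcome"] for r in rows if r["treatment"] == 1]
    control = [r["outcome"] for r in rows if r["treatment"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)
```

Because treatment is randomized here, the difference in means should land close to the average of the stored true effects, which gives a simple baseline against which the package's models could be compared.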

New causal inference datasets

Use this issue to post links to causal inference datasets that could be added to causeinfer.

Datasets that are found can be converted into issues where scripts for formatting and loading the data would be written. These would be good first issues for people who want to contribute. Issues could also be directly made when datasets are found (see this issue for suggestions).

With the dataset link it would be helpful to get a short description that includes whether it's related to business, medical, or socioeconomic fields.

Thanks for your time and input!
