andrewtavis / causeinfer

Machine learning based causal inference/uplift in Python

License: BSD 3-Clause "New" or "Revised" License

Python 78.44% Stata 21.56%

causal-inference machine-learning treatment-effects uplift-modeling causality data-science statistics econometrics python uplift

causeinfer's Introduction

Organizations initiated

activist is an open-source, non-profit political action network. The current goal is the creation of activist.org, a platform to find and discover political events and organizations.
- activist (web) • activist-Android • activist-iOS
Scribe creates keyboard apps for language learners that include translation, verb conjugation and word annotation for confident communication without leaving the keyboard.
- Scribe-iOS • Scribe-Data • Scribe-Android • Scribe-Desktop

causeinfer's Issues

Add Criteo uplift dataset

This is an extensive issue that involves the creation of the following files:

causeinfer.data.criteo.py where the data will be loaded and formatted
examples/business_criteo.ipynb where all valid models will be run over the Criteo dataset
Adding the dataset to examples/an_iterated_model_dataset_comparison.ipynb

The dataset can be found here.

causeinfer.data.criteo.py would then need to be added to the documentation and be tested, but these should be easy steps.
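A loader of the kind described above could take roughly the following shape. This is a minimal sketch only: the helper name load_uplift_csv, the returned dictionary layout, and the column names "treatment" and "conversion" are assumptions made for illustration and should be checked against the actual dataset file and the existing causeinfer.data modules.

```python
import csv
import io

# Assumed column names, for illustration only; verify against the real file.
TREATMENT_COL = "treatment"
RESPONSE_COL = "conversion"


def load_uplift_csv(file_obj, treatment_col=TREATMENT_COL, response_col=RESPONSE_COL):
    """Split a CSV into features, treatment flags and responses (hypothetical helper)."""
    reader = csv.DictReader(file_obj)
    # Every column that is not the treatment or the response is treated as a feature.
    feature_names = [
        name for name in reader.fieldnames if name not in (treatment_col, response_col)
    ]
    X, w, y = [], [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        w.append(int(row[treatment_col]))
        y.append(int(row[response_col]))
    return {"feature_names": feature_names, "X": X, "w": w, "y": y}


# Tiny in-memory sample standing in for the real dataset file.
sample = io.StringIO("f0,f1,treatment,conversion\n1.5,0.2,1,0\n0.3,0.9,0,1\n")
data = load_uplift_csv(sample)
print(data["feature_names"], data["w"], data["y"])
```

Keeping the treatment and response columns as parameters would let the same pattern be reused for the other datasets proposed in these issues.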

Add reflective/pessimistic uplift

Further baseline causal inference models that could be added are reflective and pessimistic uplift from Shaar et al. (2016). It should be possible to implement these using the current base modeling classes from base_models.py.

The paper in question is:

Shaar, A., Abdessalem, T. and Segard, O. (2016). “Pessimistic Uplift Modeling”. ACM SIGKDD, August 2016, San Francisco, California, USA.

These were shown to perform poorly in the review by Devriendt, F. et al. (2018), but their ease of implementation and the no free lunch principle of causal inference make them worth adding.

Files to create:

causeinfer.standard_algorithms.reflective.py
causeinfer.standard_algorithms.pessimistic.py

These models would then be applied in the various examples, tests would need to be written for them, and they would need to be added to the documentation.

Add Pintilie Tamoxifen dataset

This is an extensive issue that involves the creation of the following files:

causeinfer.data.tamoxifen.py where the data will be loaded and formatted
examples/medical_tamoxifen.ipynb where all valid models will be run over the Pintilie Tamoxifen dataset
Adding the dataset to examples/an_iterated_model_dataset_comparison.ipynb

The dataset can be found in the datasets directory.

causeinfer.data.tamoxifen.py would then need to be added to the documentation and be tested, but these should be easy steps.

Add Lalonde Job Training dataset

This is an extensive issue that involves the creation of the following files:

causeinfer.data.lalonde.py where the data will be loaded and formatted
examples/socioeconomic_lalonde.ipynb where all valid models will be run over the Lalonde dataset
Adding the dataset to examples/an_iterated_model_dataset_comparison.ipynb

The dataset can be found here.

causeinfer.data.lalonde.py would then need to be added to the documentation and be tested, but these should be easy steps.

Create concise requirement and env files

This issue is for creating concise versions of requirements.txt and environment.yml for causeinfer. It would be great if these files were created by hand with specific version numbers or generated in a way so that sub-dependencies don't always need to be updated.

As of now both files are being created with the following commands in the package's conda virtual environment:

pip list --format=freeze > requirements.txt  
conda env export --no-builds | grep -v "^prefix: " > environment.yml

causeinfer and other obviously unneeded packages are then removed from these files before being uploaded.

Any insights or help would be much appreciated!
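One possible approach, sketched here under assumptions, is to keep a short hand-maintained list of direct dependencies and filter the frozen output down to it, so that sub-dependencies drop out of requirements.txt. The function name top_level_requirements, the list of direct dependencies, and the version numbers below are all hypothetical and only illustrate the idea.

```python
import re


def normalize(name):
    """Lower-case a package name and unify underscores and hyphens."""
    return name.lower().replace("_", "-")


def top_level_requirements(frozen_lines, direct_deps):
    """Keep only the frozen lines whose package is a direct dependency (hypothetical helper)."""
    keep = {normalize(name) for name in direct_deps}
    kept = []
    for line in frozen_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The package name is everything before the first version or URL marker.
        name = re.split(r"[=<>!~ @\[]", line, maxsplit=1)[0]
        if normalize(name) in keep:
            kept.append(line)
    return kept


# Illustrative output of `pip list --format=freeze`; the versions are made up.
frozen = [
    "numpy==1.19.2",
    "pandas==1.1.3",
    "python-dateutil==2.8.1",
    "pytz==2020.1",
    "scikit-learn==0.23.2",
    "six==1.15.0",
]
direct = ["numpy", "pandas", "scikit-learn"]  # assumed direct dependencies
concise = top_level_requirements(frozen, direct)
print("\n".join(concise))
```

The pinned versions still come from the working environment, while the list of direct dependencies only changes when the package's imports change.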

Your wheel is a mess

The v0.1.1.2 wheel of this project released on PyPI yesterday contains numerous copies of temporary files that shouldn't be in a wheel; you can see the wheel's file listing here or by running zipinfo or a similar program on the .whl file. I believe that this happened due to the use of find_namespace_packages(): because your setup.py calls this function without an exclude or include argument, it picks up the directories containing .py files in the temporary build/ directory, and so each build of the wheel ends up including build/ while also adding to build/, leading to the problem at hand.

My recommendation for fixing this is:

Delete the build/ directory from your local repository, perhaps by running git clean -dXf.
As your Python packages all have __init__.py files, there is no reason to use find_namespace_packages() over find_packages(), so I would recommend changing which function you use in setup.py. If you are set on using find_namespace_packages(), you should add include=["causeinfer", "causeinfer.*"] to the function's arguments in order to not capture anything outside your code directory. (You should also set include or exclude even if you do use find_packages() in order to exclude your tests/ directory from the wheel; see here for more information.)
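The difference between the two discovery calls can be checked on a throwaway directory. The sketch below builds a toy layout (the directory names, including the causeinfer/data subpackage, are made up for illustration and are not the real repository contents) and compares unfiltered find_namespace_packages() with find_packages() restricted by include.

```python
import tempfile
from pathlib import Path

from setuptools import find_namespace_packages, find_packages


def make_layout(root):
    """Create a toy repository layout with a stale build/ directory (illustrative only)."""
    for pkg_dir in ("causeinfer", "causeinfer/data", "tests", "build/lib/causeinfer"):
        path = root / pkg_dir
        path.mkdir(parents=True)
        (path / "__init__.py").write_text("")


def discover(root):
    """Return the packages found by each discovery strategy."""
    return {
        # No include/exclude: every directory is treated as a namespace package.
        "namespace_unfiltered": sorted(find_namespace_packages(where=str(root))),
        # Restricted to the code directory: build/ and tests/ are left out.
        "filtered": sorted(
            find_packages(where=str(root), include=["causeinfer", "causeinfer.*"])
        ),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_layout(root)
    found = discover(root)

print(found["namespace_unfiltered"])
print(found["filtered"])
```

In a real setup.py the filtered call would be passed as the packages argument of setup().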

Add data simulator

Adding a data simulator would be a positive addition to causeinfer, as it would let users check model accuracy more reliably and compare models.

Files to create:

causeinfer.data.simulation.py
examples/simulation.ipynb
Adding the dataset to examples/an_iterated_model_dataset_comparison.ipynb

The simulator would then need to be documented, which should be simple with autodoc :)
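As a rough idea of what such a simulator might provide, the sketch below generates binary outcomes with a known, constant treatment effect and recovers it with a simple difference in means. The function names, parameters, and the single placeholder covariate are assumptions for illustration, not a proposed causeinfer API.

```python
import random


def simulate_uplift_data(n, true_effect=0.1, base_rate=0.2, seed=0):
    """Simulate (covariate, treatment, outcome) rows with a known effect (hypothetical helper)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()       # placeholder covariate; it does not affect the outcome here
        w = rng.randint(0, 1)  # randomized treatment assignment
        p = base_rate + true_effect * w  # treatment raises the response probability
        y = 1 if rng.random() < p else 0
        rows.append((x, w, y))
    return rows


def difference_in_means(rows):
    """Estimate the average treatment effect as treated mean minus control mean."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate_uplift_data(20000, true_effect=0.1)
estimate = difference_in_means(rows)
print(round(estimate, 3))
```

Because the true effect is set by the caller, any model's estimate can be compared directly against it, which is not possible with the observational and experimental datasets.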

New causal inference datasets

Use this issue to post links to causal inference datasets that could be added to causeinfer.

Datasets that are found can be converted into issues where scripts for formatting and loading the data would be written. These would be good first issues for people who want to contribute. Issues could also be directly made when datasets are found (see this issue for suggestions).

With the dataset link it would be helpful to get a short description that includes whether it's related to business, medical, or socioeconomic fields.

Thanks for your time and input!
