trusted-ai / aif360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

Home Page: https://aif360.res.ibm.com/

License: Apache License 2.0

Languages: Python 95.31%, R 3.66%, Java 1.01%, Dockerfile 0.02%
Topics: ai, fairness-ai, fairness, fairness-testing, fairness-awareness-model, bias-detection, bias, bias-correction, bias-reduction, bias-finder

aif360's Introduction

AI Fairness 360 (AIF360)


The AI Fairness 360 toolkit is an extensible open-source library containing techniques developed by the research community to help detect and mitigate bias in machine learning models throughout the AI application lifecycle. The AI Fairness 360 package is available in both Python and R.

The AI Fairness 360 package includes

  1. a comprehensive set of metrics for datasets and models to test for biases,
  2. explanations for these metrics, and
  3. algorithms to mitigate bias in datasets and models.

It is designed to translate algorithmic research from the lab into the actual practice of domains as wide-ranging as finance, human capital management, healthcare, and education. We invite you to use it and improve it.

The AI Fairness 360 interactive experience provides a gentle introduction to the concepts and capabilities. The tutorials and other notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

Because the toolkit offers such a comprehensive set of capabilities, it can be difficult to figure out which metrics and algorithms are most appropriate for a given use case. To help, we have created some guidance material that can be consulted.

We have developed the package with extensibility in mind. This library is still in development. We encourage the contribution of your metrics, explainers, and debiasing algorithms.

Get in touch with us on Slack (invitation here)!

Supported bias mitigation algorithms

Supported fairness metrics

  • Comprehensive set of group fairness metrics derived from selection rates and error rates including rich subgroup fairness
  • Comprehensive set of sample distortion metrics
  • Generalized Entropy Index (Speicher et al., 2018) (see the sketch after this list)
  • Differential Fairness and Bias Amplification (Foulds et al., 2018)
  • Bias Scan with Multi-Dimensional Subset Scan (Zhang, Neill, 2017)
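
As an illustration of the Generalized Entropy Index listed above, here is a minimal sketch of the individual-fairness formulation from Speicher et al. (2018); the benefit definition b_i = y_pred_i - y_true_i + 1 is an assumption taken from that paper, and AIF360 exposes an equivalent metric on ClassificationMetric.

# Minimal sketch of the Generalized Entropy Index (Speicher et al., 2018),
# valid for alpha not in {0, 1}; benefits are b_i = y_pred_i - y_true_i + 1.
import numpy as np

def generalized_entropy_index(y_true, y_pred, alpha=2):
    b = np.asarray(y_pred) - np.asarray(y_true) + 1  # per-instance benefit
    mu = b.mean()
    return np.mean((b / mu) ** alpha - 1) / (alpha * (alpha - 1))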

Setup

R

install.packages("aif360")

For more details regarding the R setup, please refer to instructions here.

Python

Supported Python Configurations:

OS Python version
macOS 3.8 – 3.11
Ubuntu 3.8 – 3.11
Windows 3.8 – 3.11

(Optional) Create a virtual environment

AIF360 requires specific versions of many Python packages which may conflict with other projects on your system. A virtual environment manager is strongly recommended to ensure dependencies may be installed safely. If you have trouble installing AIF360, try this first.

Conda

Conda is recommended for all configurations though Virtualenv is generally interchangeable for our purposes. Miniconda is sufficient (see the difference between Anaconda and Miniconda if you are curious) if you do not already have conda installed.

Then, to create a new Python 3.11 environment, run:

conda create --name aif360 python=3.11
conda activate aif360

The shell should now look like (aif360) $. To deactivate the environment, run:

(aif360)$ conda deactivate

The prompt will return to $.

Install with pip

To install the latest stable version from PyPI, run:

pip install aif360

Note: Some algorithms require additional dependencies (although the metrics will all work out-of-the-box). To install with certain algorithm dependencies included, run, e.g.:

pip install 'aif360[LFR,OptimPreproc]'

or, for complete functionality, run:

pip install 'aif360[all]'

The options for available extras are: OptimPreproc, LFR, AdversarialDebiasing, DisparateImpactRemover, LIME, ART, Reductions, FairAdapt, inFairness, LawSchoolGPA, notebooks, tests, docs, all

If you encounter any errors, try the Troubleshooting steps.

Manual installation

Clone the latest version of this repository:

git clone https://github.com/Trusted-AI/AIF360

If you'd like to run the examples, download the datasets now and place them in their respective folders as described in aif360/data/README.md.

Then, navigate to the root directory of the project and run:

pip install --editable '.[all]'

Run the Examples

To run the example notebooks, complete the manual installation steps above. Then, if you did not use the [all] option, install the additional requirements as follows:

pip install -e '.[notebooks]'

Finally, if you did not already, download the datasets as described in aif360/data/README.md.

Troubleshooting

If you encounter any errors during the installation process, look for your issue here and try the solutions.

TensorFlow

See the Install TensorFlow with pip page for detailed instructions.

Note: we require 'tensorflow >= 1.13.1'.

Once tensorflow is installed, try re-running:

pip install 'aif360[AdversarialDebiasing]'

TensorFlow is only required for use with the aif360.algorithms.inprocessing.AdversarialDebiasing class.

CVXPY

On macOS, you may first have to install the Xcode Command Line Tools if you have never done so previously:

xcode-select --install

On Windows, you may need to download the Microsoft C++ Build Tools for Visual Studio 2019. See the CVXPY Install page for up-to-date instructions.

Then, try reinstalling via:

pip install 'aif360[OptimPreproc]'

CVXPY is only required for use with the aif360.algorithms.preprocessing.OptimPreproc class.

Using AIF360

The examples directory contains a diverse collection of Jupyter notebooks that use AI Fairness 360 in various ways. Both tutorials and demos illustrate working code using AIF360. Tutorials provide additional discussion that walks the user through the various steps of the notebook. See the details about tutorials and demos here.
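
For orientation, here is a minimal sketch of the typical workflow: load a dataset, compute a group fairness metric, and apply a pre-processing bias mitigation algorithm. It assumes the German Credit data files have already been downloaded as described in aif360/data/README.md; the age threshold and dropped columns follow the Credit Scoring tutorial.

from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Treat age as the protected attribute and drop the other sensitive columns.
dataset = GermanDataset(protected_attribute_names=['age'],
                        privileged_classes=[lambda x: x >= 25],
                        features_to_drop=['personal_status', 'sex'])
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged_groups,
                                  privileged_groups=privileged_groups)
print("Mean difference (original):", metric.mean_difference())

# Reweighing assigns instance weights that remove the dependence between the
# protected attribute and the favorable label in the training data.
rw = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf = rw.fit_transform(dataset)
metric_transf = BinaryLabelDatasetMetric(dataset_transf,
                                         unprivileged_groups=unprivileged_groups,
                                         privileged_groups=privileged_groups)
print("Mean difference (after reweighing):", metric_transf.mean_difference())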

Citing AIF360

A technical description of AI Fairness 360 is available in this paper. Below is the bibtex entry for this paper.

@misc{aif360-oct-2018,
    title = "{AI Fairness} 360:  An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias",
    author = {Rachel K. E. Bellamy and Kuntal Dey and Michael Hind and
	Samuel C. Hoffman and Stephanie Houde and Kalapriya Kannan and
	Pranay Lohia and Jacquelyn Martino and Sameep Mehta and
	Aleksandra Mojsilovic and Seema Nagar and Karthikeyan Natesan Ramamurthy and
	John Richards and Diptikalyan Saha and Prasanna Sattigeri and
	Moninder Singh and Kush R. Varshney and Yunfeng Zhang},
    month = oct,
    year = {2018},
    url = {https://arxiv.org/abs/1810.01943}
}

AIF360 Videos

  • Introductory video to AI Fairness 360 by Kush Varshney, September 20, 2018 (32 mins)

Contributing

The development fork for Rich Subgroup Fairness (inprocessing/gerryfair_classifier.py) is here. Contributions are welcome and a list of potential contributions from the authors can be found here.

aif360's People

Contributors

adebayo-oshingbesan, adrinjalali, animeshsingh, autoih, baraldian, barvek, ckadner, dependabot[bot], gdequeiroz, hirzel, hoffmansc, imgbot[bot], ivesulca, jimbudarz, josue-rodriguez, krvarshney, leenamurgai, mariaborbones, mfeffer, michaelhind, milevavantuyl, mnagired, monindersingh, nrkarthikeyan, pronics2004, romeokienzler, sohiniu, sreeja-g, ssaishruthi, zywind


aif360's Issues

features_to_drop in Class MEPSDataset19 not working

I tried to run the MEPSDataset19 example with features_to_drop. It ran with some features but failed with others, e.g. 'PHQ242'.

Error it gave: KeyError: "['PHQ242'] not in index"

Not sure if I did something wrong, but I found a workaround: by not including it in features_to_keep, the question of dropping it does not arise.

make estimators and scorers sklearn compatible

The inprocessing algorithms are basically like an Estimator. Ideally, it should be possible to replace a classifier in a scikit-learn pipeline, with one from aif360. An example pipeline (taken from this example) looks like:

model = make_pipeline(StandardScaler(),
                      LogisticRegression(solver='liblinear'))
X_train = meps_orig_train.features
y_train = meps_orig_train.labels.ravel()

model_lr = model.fit(X_train, y_train,
                     **{"logisticregression__sample_weight":meps_orig_train.instance_weights})

But when we move to use a model such as PrejudiceRemover, we have to break the pipeline and have the model separate, since it doesn't follow the API requirements to be fit for a pipeline.

model = PrejudiceRemover(sensitive_attr=sens_attr, eta = 25.0)
scale = StandardScaler().fit(tr_dataset.features)

tr_dataset.features = scale.transform(tr_dataset.features)
model.fit(tr_dataset)

Once it can be fit in a pipeline, then we can use all the other mechanisms already available in sklearn, such as GridSearchCV to find best hyperparameters for the problem at hand.

That also brings us to the scorers. AIF360's scorers don't fit into sklearn's scoring mechanism either. Once they do, we could use functions such as make_scorer to create a scoring function and feed it into sklearn's GridSearchCV, for instance.

This may not be a trivial task, and some useful things, such as having multiple scoring functions recorded and reported by the grid search, are still in discussion and not yet available in sklearn. Until then, we can provide an easy way for users to combine anti-bias scoring functions with performance scoring ones and use them to choose their best pipeline.

Another point regarding the API conventions is that if the preprocessing modules also fit into the sklearn Pipeline as transformers, then their selection can be included in a hyperparameter search as well, which is much easier than manually running them one by one and comparing results to find the best solution.

Right now, transformers which would change the number of samples or change the output are not supported in sklearn (AFAIK), but that's also in discussion and this use case may be a good push for it.
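
For concreteness, below is a rough sketch of the scorer half of this request using only standard scikit-learn machinery; the estimator is a plain LogisticRegression stand-in rather than an AIF360 algorithm, and appending the protected attribute as the last feature column is just a convenience so the scorer can recover it from X. A make_scorer-wrapped (y_true, y_pred) metric would not see the protected attribute at all, which is exactly the gap this issue describes, hence the (estimator, X, y) scorer signature here.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def parity_scorer(estimator, X, y, sensitive_col=-1):
    """Higher is better: negative absolute statistical parity difference."""
    y_pred = estimator.predict(X)
    sensitive = X[:, sensitive_col]
    diff = y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean()
    return -abs(diff)

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
sensitive = np.random.RandomState(0).randint(0, 2, size=len(y))
X = np.hstack([X, sensitive.reshape(-1, 1)])  # last column = protected attribute

search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={'C': [0.1, 1.0, 10.0]},
                      scoring=parity_scorer)
search.fit(X, y)
print(search.best_params_, search.best_score_)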

credit score tutorial

I would like to run this tutorial in Watson Studio. How can I access the data files?
How can I change the path to the data files?

Pre-processing for adult, german and compas dataset

I am trying to use the AIF360 tool in one of my projects. I am facing a problem understanding the purpose of pre-processing, say, the German credit dataset as described in the file AIF360/aif360/algorithms/preprocessing/optim_preproc_helpers/data_preproc_functions.py.
Why is the custom preprocessing described here needed for several algorithms, such as optimized pre-processing, the meta classifier, and reject option classification, while several others, e.g. adversarial debiasing, disparate impact remover, and reweighing, do not require it? Can you please help me understand the purpose?

Thank you.

fix deprecation warning in standard dataset

WARNING:root:Missing Data: 3620 rows removed from AdultDataset.
.../python3.7/site-packages/aif360/datasets/standard_dataset.py:121: FutureWarning: outer method for ufunc <ufunc 'equal'> is not implemented on pandas objects. Returning an ndarray, but in the future this will raise a 'NotImplementedError'. Consider explicitly converting the Series to an array with '.array' first.
priv = np.logical_or.reduce(np.equal.outer(vals, df[attr]))

The classification metric of accuracy, precision, and recall differs from scikit-learn

import pandas as pd
import sys
import numpy as np
np.random.seed(0)
from aif360.datasets import StructuredDataset as SD
from aif360.datasets import BinaryLabelDataset as BLD
from aif360.metrics import ClassificationMetric as CM
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.ensemble import RandomForestClassifier as RF
from sklearn.datasets import make_classification as mc 
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

data, label = mc(n_samples=10000,n_features=30)
bias_feature = label.copy()
np.random.shuffle(bias_feature)
agg_data = np.hstack([data,  bias_feature.reshape(-1,1), label.reshape(-1,1),])
pd_data = pd.DataFrame(agg_data, columns=list(range(1,31)) + ["gender", "labels"])
dataset = BLD(favorable_label=0, unfavorable_label=1,df=pd_data,
              label_names=["labels"], protected_attribute_names=["gender"], 
              privileged_protected_attributes=[2])
dataset_orig_train, dataset_orig_test = dataset.split([0.7], shuffle=True)
dataset_orig_test_pred = dataset_orig_test.copy(deepcopy=True)
privileged_groups = [{'gender': 0}]
unprivileged_groups = [{'gender': 1}]
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
clf = RF()
clf.fit(dataset_orig_train.features,dataset_orig_train.labels)

predictions = clf.predict(dataset_orig_test.features)
proba_predictions = clf.predict_proba(dataset_orig_test.features)

dataset_orig_test_pred.scores = proba_predictions[:,0].reshape(-1,1)
dataset_orig_test_pred.labels = predictions.reshape(-1, 1)

cm_pred_valid = CM(dataset_orig_test, dataset_orig_test_pred, unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)

cm = ["precision","recall", "accuracy"]


metrics = {}
for c in cm:
    metric = eval("cm_pred_valid." + c + "()")
    metrics[c] =  metric


metrics["recall"], metrics["accuracy"], metrics["precision"]


print("Scikit-learn metrics")
for key,value in {"recall": recall_score,"accuracy": accuracy_score, "precision": precision_score}.items():
    metric = value(dataset_orig_test.labels,predictions)
    print("{} score is: {}".format(key,metric))

print("AIF360 metrics")
for key in ["recall","accuracy", "precision"]:
    print("{} score is: {}".format(key,metrics[key]))

produces the following:

i.e. for scikit-learn

recall score is: 0.8780649436713055
accuracy score is: 0.8856666666666667
precision score is: 0.8928571428571429

and for AIF360

recall score is: 0.8933601609657947
accuracy score is: 0.8856666666666667
precision score is: 0.8786279683377308
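
A possible explanation (an assumption, not confirmed here): the dataset was constructed with favorable_label=0, and AIF360's ClassificationMetric computes precision and recall with respect to the favorable label, whereas scikit-learn's scorers default to pos_label=1. If that is the cause, telling scikit-learn to use the same positive class should reproduce the AIF360 numbers (accuracy is unaffected by the choice, which matches the output above):

# Hypothesis check: align scikit-learn's positive class with the favorable label (0).
print(recall_score(dataset_orig_test.labels, predictions, pos_label=0))
print(precision_score(dataset_orig_test.labels, predictions, pos_label=0))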

Missing dependencies: ModuleNotFoundError: No module named 'numba'

from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing


ModuleNotFoundError Traceback (most recent call last)
in
1 from aif360.datasets import GermanDataset
2 from aif360.metrics import BinaryLabelDatasetMetric
----> 3 from aif360.algorithms.preprocessing import Reweighing

~/.local/lib/python3.6/site-packages/aif360/algorithms/preprocessing/init.py in
1 from aif360.algorithms.preprocessing.disparate_impact_remover import DisparateImpactRemover
----> 2 from aif360.algorithms.preprocessing.lfr import LFR
3 from aif360.algorithms.preprocessing.optim_preproc import OptimPreproc
4 from aif360.algorithms.preprocessing.reweighing import Reweighing

~/.local/lib/python3.6/site-packages/aif360/algorithms/preprocessing/lfr.py in
3
4 from aif360.algorithms import Transformer
----> 5 from aif360.algorithms.preprocessing.lfr_helpers import helpers as lfr_helpers
6
7

~/.local/lib/python3.6/site-packages/aif360/algorithms/preprocessing/lfr_helpers/helpers.py in
1 # Based on code from https://github.com/zjelveh/learning-fair-representations
----> 2 from numba.decorators import jit
3 import numpy as np
4
5 @jit

ModuleNotFoundError: No module named 'numba'
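
A likely fix (an assumption based on the packaging described in the README above): numba is an optional dependency pulled in by the LFR-related extras, so installing it directly, or reinstalling AIF360 with the appropriate extra, should resolve the import error. Note that recent numba releases removed the numba.decorators module, so an older numba may be needed with this AIF360 version.

pip install numba
# or
pip install 'aif360[LFR]'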

Memory issues while opening StandardDataset

import pandas as pd
import sys
import numpy as np
np.random.seed(0)
from aif360.datasets import StructuredDataset as SD
from aif360.datasets import BinaryLabelDataset as BLD
from aif360.metrics import ClassificationMetric as CM
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.ensemble import RandomForestClassifier as RF
from sklearn.datasets import make_classification as mc 
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

data, label = mc(n_samples=10000,n_features=30)
bias_feature = label.copy()
np.random.shuffle(bias_feature)
agg_data = np.hstack([data,  bias_feature.reshape(-1,1), label.reshape(-1,1),])
pd_data = pd.DataFrame(agg_data, columns=list(range(1,31)) + ["gender", "labels"])
dataset = BLD(favorable_label=0, unfavorable_label=1,df=pd_data,
              label_names=["labels"], protected_attribute_names=["gender"], 
              privileged_protected_attributes=[2])

Running BLD(favorable_label=0, unfavorable_label=1, df=pd_data, label_names=["labels"], protected_attribute_names=["gender"], privileged_protected_attributes=[2])
in a Python Jupyter notebook 3 times results in a MemoryError.

Use sphinx-gallery and `.py` examples instead of ipynb

Using sphinx and sphinx-gallery we can generate docs and ipynb notebooks automatically from python example files.

This would greatly improve traceability of changes in the examples, and make it possible to use them as part of the tests.

Subset of Dataset

I want to train in-processing algorithms like Prejudice Remover, the ART classifier, etc. using a specific subset of a dataset like Adult Income. Is there a way to create a subset based on row numbers, like df.loc[[2,3,4,5]]?
If so, I could then train and predict using the existing API, whose fit and predict functions take a 'dataset' type as input.
Not sure if this functionality is supported out of the box or if there is some workaround.
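
One possible workaround, sketched below under the assumption that round-tripping through pandas is acceptable (this is not an official subset API, and the exact keys returned by convert_to_dataframe should be checked against your version):

# Sketch: convert to a DataFrame, slice by row position, rebuild the dataset.
from aif360.datasets import BinaryLabelDataset

df, attrs = dataset.convert_to_dataframe()   # `dataset` is an existing BinaryLabelDataset
subset_df = df.iloc[[2, 3, 4, 5]]
subset = BinaryLabelDataset(df=subset_df,
                            label_names=attrs['label_names'],
                            protected_attribute_names=attrs['protected_attribute_names'],
                            favorable_label=dataset.favorable_label,
                            unfavorable_label=dataset.unfavorable_label)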

Include additional classification metrics

Include the following metrics:

  1. Equalized odds difference: max(|FPR_unpriv - FPR_priv|, |TPR_unpriv - TPR_priv|) (see the sketch after this list)
  2. Generalized equalized odds difference: max(|GFPR_unpriv - GFPR_priv|, |GTPR_unpriv - GTPR_priv|)
  3. Generalized selection rate: mean score possibly conditioned by the group E[\hat{S}]
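
A minimal sketch of metric 1 in terms of the existing ClassificationMetric rate methods (the generalized variant would use the corresponding generalized rates); cm is assumed to be a ClassificationMetric constructed with privileged and unprivileged groups:

def equalized_odds_difference(cm):
    # max of the absolute FPR and TPR gaps between unprivileged and privileged groups
    fpr_diff = abs(cm.false_positive_rate(privileged=False)
                   - cm.false_positive_rate(privileged=True))
    tpr_diff = abs(cm.true_positive_rate(privileged=False)
                   - cm.true_positive_rate(privileged=True))
    return max(fpr_diff, tpr_diff)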

Upgrade tensorflow>=1.12.1

Upgrade tensorflow to version 1.12.1 or later for security fixes
Details
CVE-2019-9635 More information
moderate severity
Vulnerable versions: >= 1.0.0, < 1.12.1
Patched version: 1.12.1
NULL pointer dereference in Google TensorFlow before 1.12.2 could cause a denial of service via an invalid GIF file.

CVE-2018-7575 More information
critical severity
Vulnerable versions: >= 1.0.0, < 1.7.1
Patched version: 1.7.1
Google TensorFlow 1.7.x and earlier is affected by a Buffer Overflow vulnerability. The type of exploitation is context-dependent.

CVE-2018-7577 More information
high severity
Vulnerable versions: >= 1.1.0, < 1.7.1
Patched version: 1.7.1
Memcpy parameter overlap in Google Snappy library 1.1.4, as used in Google TensorFlow before 1.7.1, could result in a crash or read from other parts of process memory.

CVE-2018-10055 More information
high severity
Vulnerable versions: >= 1.1.0, < 1.7.1
Patched version: 1.7.1
Invalid memory access and/or a heap buffer overflow in the TensorFlow XLA compiler in Google TensorFlow before 1.7.1 could cause a crash or read from other parts of process memory via a crafted configuration file.

CVE-2018-7576 More information
moderate severity
Vulnerable versions: >= 1.0.0, < 1.6.0
Patched version: 1.6.0
Google TensorFlow 1.6.x and earlier is affected by: Null Pointer Dereference. The type of exploitation is: context-dependent.

Typo in the German Credit Scoring tutorial

In step 5 of the Credit Scoring notebook, there is a typo in the section below. The decimal point is off by one place in the text above the cell: "the difference in mean outcomes is now 0.18250". The value should instead be 0.018250, as shown below the cell.


Undefined name: CaldersVerwerTwoNaiveBayes in test_cv2nb.py

This failure only happens on Python 3 and is probably caused by the scoping difference in Python 2 and Python 3.

flake8 testing of https://github.com/IBM/AIF360 on Python 3.6

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./aif360/algorithms/inprocessing/kamfadm-2012ecmlpkdd/fadm/nb/tests/test_cv2nb.py:18:26: F821 undefined name 'CaldersVerwerTwoNaiveBayes'
        self.assertEqual(CaldersVerwerTwoNaiveBayes.N_CLASSES, 2)
                         ^
./aif360/algorithms/inprocessing/kamfadm-2012ecmlpkdd/fadm/nb/tests/test_cv2nb.py:19:26: F821 undefined name 'CaldersVerwerTwoNaiveBayes'
        self.assertEqual(CaldersVerwerTwoNaiveBayes.N_S_VALUES, 2)
                         ^
./aif360/algorithms/inprocessing/kamfadm-2012ecmlpkdd/fadm/nb/tests/test_cv2nb.py:20:13: F821 undefined name 'CaldersVerwerTwoNaiveBayes'
        m = CaldersVerwerTwoNaiveBayes(5, [2, 2, 2, 2, 3], 1.0, 0.8)
            ^
3    F821 undefined name 'CaldersVerwerTwoNaiveBayes'
3

GerryFairClassifier Pre and Postprocessing Equivalent

Are there pre- and post-processing techniques available with respect to rich subgroup fairness? And are there any studies comparing model-agnostic methods with model-specific methods? I just can't imagine none existing; I have looked everywhere. Thanks.

Remove gender recognition example

Automated Gender Recognition has been thoroughly shown to be fundamentally discriminatory against trans people. Given that, it's somewhat bizarre to see a tutorial on how to make it "fairer" in a project aiming to reduce discriminatory outcomes from ML. I would really suggest (and appreciate it!) that the example be removed - it has no place here.

Reconsider use of "bias" in README

There are some issues with the way "bias" is being used in the README that are both inconsistent with what is actually possible with the tool & the current state of social science.

"The AI Fairness 360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models."

While it's clear that we intend bias in this context to mean "prejudice" or a disproportionate skew towards something, we must consider the larger impacts of this perspective. First off, it is impossible not to be prejudiced. In the social sciences, we call this "subjectivity". While some disciplines treat "objectivity" as an ideal, this concept is not transferable to human behavior, which is what we are modeling in our software applications. All humans have a distinct perspective; therefore all humans are prejudiced in some way.

The data & software applications that AIF360 seeks to "de-bias" were made by people based on their assumptions about how the world works & what they think matters. A software application is one group of people's collective opinion about what their stakeholders need. The structure of the software itself is based on these people's experience & is therefore prejudiced. For example, efforts like Gendermag & A11y are helping people working in open source projects address assumptions about how their stakeholders are using their software tools at a very fundamental level.

When training data is both curated & labeled, assumptions are made by the curators & it's clear from the description of the project that this is what AIF360 can help to address. However we must also acknowledge that AIF360 is also prejudiced by the assumptions of its creators & maintainers, & therefore can neither remove bias nor de-bias.

What is "fair" & "correct" is highly situational. What is "fair" in one situation may not be "fair" in another. In some social situations, such as in Law Enforcement, the lack of fairness is indicative of larger social issues & "de-biasing" could potentially further harm disadvantaged stakeholders who have been excluded from having their own voice. We should assume such deviations on "fairness" to be the norm, not an aberration. This assumption of a universal ideal norm (or bias if you will) exists within AIF360 itself.

I suggest we reconsider that bias & prejudice are tied to something inherent in the human condition & are therefore unavoidable. Instead of a definition that dictates our own biased "Truth" through the removal of any non-conforming perspectives, I propose we re-define bias as simply "limited perspective" & re-evaluate our language & explanations from that starting point.

This issue is quite significant because our framing of "bias" presents potential harms to IBM's own credibility & intention. IBM has historically had moments where we were very narrow & short-sighted in our contribution to software projects. The global scale of our Thought-Leadership & influence meant our mistakes significantly impacted people's lives. That said, for all the harms we've done, we achieve great things too! IBM has the advantage of having a much longer-term history than other tech companies to draw upon. Given the scale of our potential impact, we have a corporate social responsibility to thoughtfully consider who is at stake when we bring new ideas into the world.

"To visualize the future of IBM, you must know something of the past" -Thomas J. Watson, Sr.

“Now, thanks to confidential corporate documents and interviews with many of the technologists involved in developing the software, The Intercept and the Investigative Fund have learned that IBM began developing this object identification technology using secret access to NYPD camera footage. With access to images of thousands of unknowing New Yorkers offered up by NYPD officials, as early as 2012, IBM was creating new search features that allow other police departments to search camera footage for images of people by hair color, facial hair, and skin tone.”

Documents - https://www.documentcloud.org/documents/4452844-IBM-SVS-Analytics-4-0-Plan-Update-for-NYPD-6.html

  • That time when IBM helped Indiana kick people off Welfare

“In November 2006, Indiana Governor Mitch Daniels announced a 10-year, $1.16 billion contract with a consortium of tech companies, led by IBM and Affiliated Computer Services (ACS), to modernize and privatize eligibility procedures for the state’s Medicaid, food-stamp, and cash-assistance programs.”

“The design of electronic-governance systems affects our material well-being, the health of our democracy, and equity in our communities. But somehow, when we talk about data-driven government, we conveniently omit the often terrible impacts that these systems have on the poor and working-class people”

Further reading on the "bias" in software itself:

  • Forsythe, Diana E., and David J. Hess. Studying Those Who Study Us: An Anthropologist in the World of Artificial Intelligence. Stanford University Press, 2002
  • Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. Picador, 2019.

(enhancement) pre/post/in processing classes within aif360.algorithms could take in key word arguments during instantiation

For example, refer to the module aif360.algorithms.preprocessing.optim_preproc.

In order to create an OptimPreproc object, a user may have to pass in a bunch of arguments that are a mix of keyword and positional arguments. In the constructor below, optim_options is a dictionary but the rest are just positional.

class OptimPreproc(Transformer):
    def __init__(self, optimizer, optim_options, unprivileged_groups=None,
                 privileged_groups=None, verbose=False, seed=None):

It may be better to simply require keyword arguments instead.
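
As an illustration of the suggestion, here is a hypothetical keyword-only variant of the constructor (not the current API); the bare * forces callers to name every argument:

class OptimPreprocKeywordOnly:
    def __init__(self, *, optimizer, optim_options, unprivileged_groups=None,
                 privileged_groups=None, verbose=False, seed=None):
        self.optimizer = optimizer
        self.optim_options = optim_options
        self.unprivileged_groups = unprivileged_groups
        self.privileged_groups = privileged_groups
        self.verbose = verbose
        self.seed = seed

# Call sites then read unambiguously, e.g.:
# op = OptimPreprocKeywordOnly(optimizer=my_optimizer, optim_options=optim_options)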

inverse_transform() not implemented for disparate impact remover

I am running the disparate impact demo. The output is encoded data and columns, and there isn't an inverse_transform function implemented in the code, as is often the case elsewhere, or maybe I am missing something?

I need the original 15-16 columns for post-processing analysis, and I would like to avoid doing the recovery manually!

Documentation Typo for average_odds_difference()

Hi AIF360 team,

Thank you for the excellent repo & superb documentation!

I believe there is a typo for both:
https://aif360.readthedocs.io/en/latest/modules/metrics.html#aif360.metrics.ClassificationMetric.average_odds_difference

https://aif360.readthedocs.io/en/latest/modules/metrics.html#aif360.metrics.ClassificationMetric.average_abs_odds_difference

The implementation correctly returns

1/2 [(FPR_unprivileged - FPR_privileged) + (TPR_unprivileged - TPR_privileged)]

vs. the stated

1/2 [(FPR_unprivileged - FPR_privileged) + (TPR_privileged - TPR_unprivileged)]

Docstrings incorrect for generalized FN, FP & TN

In aif360.metrics.ClassificationMetric, the docstrings for generalized FN, FP, and TN all incorrectly describe the quantity as the weighted sum of predicted scores where true labels are 'favorable' and need to be updated to reflect the appropriate labels, e.g.:

        """Return the generalized number of false negatives, :math:`GFN`, the
        weighted sum of predicted scores where true labels are 'favorable',
        optionally conditioned on protected attributes."""
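
For reference, the definitions as I understand them (an assumption based on the generalized confusion matrix, so please verify against the implementation): GTP is the weighted sum of predicted scores where true labels are 'favorable'; GFN is the weighted sum of (1 - predicted score) where true labels are 'favorable'; GFP is the weighted sum of predicted scores where true labels are 'unfavorable'; GTN is the weighted sum of (1 - predicted score) where true labels are 'unfavorable'.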

The german credit dataset's privileged class should be Age > 25, not Age >= 25

https://github.com/IBM/AIF360/blob/c718f1d6cd11f1a7536a1317ea280abd961a7c79/aif360/datasets/german_dataset.py#L31

So I looked at the original paper on this dataset: F. Kamiran and T. Calders, "Classifying without discriminating." They mentioned that 190 account holders were classified as young, which corresponds to classifying people with age <= 25 as Young. If set as age < 25, it would remove about 50 instances from the Young group. This might be why the results on this dataset were unstable. Changing this should also improve the results on the website.
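
In the meantime, the threshold can be overridden at load time, since privileged_classes accepts callables (a sketch; all other GermanDataset arguments keep their defaults):

from aif360.datasets import GermanDataset

# Use age > 25 (rather than >= 25) to mark the privileged group.
dataset = GermanDataset(protected_attribute_names=['age'],
                        privileged_classes=[lambda x: x > 25])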

make flake8 on Travis PR diff aware, and change line length

Right now, .travis.yml checks for some flake8 errors, emitting only warnings, with this line:

# exit-zero treats all errors as warnings.  The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

I know GitHub's editor's max line is 127, but that's far too long for easily reviewing PRs on different systems and screens. I think it may make more sense to change that to 80 (what PEP8 recommends, with convincing arguments), and since we don't want to have to fix what is already in the repo, we can try to enforce that on new code only.

scikit-learn, for instance, uses a nice script which fails only if new flake8 warnings are introduced; it can be found here: https://github.com/scikit-learn/scikit-learn/blob/master/build_tools/travis/flake8_diff.sh

Support for MacOS

What OS was AIF360 built on? I'm attempting to run it on a MacBook Pro with the latest macOS; however, I am getting errors when attempting to install the module 'cvxpy'. During 'pip install cvxpy', I get numerous compilation errors:

  warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
  In file included from cvxpy/cvxcore/src/cvxcore.cpp:15:
  cvxpy/cvxcore/src/cvxcore.hpp:18:10: fatal error: 'vector' file not found
  #include <vector>
           ^~~~~~~~
  1 warning and 1 error generated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for cvxpy

Unable to replicate demo_lfr.ipynb notebook

While trying to replicate the demo_lfr notebook, I am getting the following error.
I also tried the changes mentioned in issue 83.

/AIF360/aif360/algorithms/preprocessing/lfr_helpers/helpers.py:62: NumbaWarning: 
Compilation is falling back to object mode WITH looplifting enabled because Function "LFR_optim_obj" failed type inference due to: Unknown attribute 'iters' of type recursive(type(CPUDispatcher(<function LFR_optim_obj at 0x7f5fea38a2f0>)))

File "AIF360/aif360/algorithms/preprocessing/lfr_helpers/helpers.py", line 66:
def LFR_optim_obj(params, data_sensitive, data_nonsensitive, y_sensitive,
    <source elided>

    LFR_optim_obj.iters += 1
    ^

[1] During: typing of get attribute at /AIF360/aif360/algorithms/preprocessing/lfr_helpers/helpers.py (66)

File "AIF360/aif360/algorithms/preprocessing/lfr_helpers/helpers.py", line 66:
def LFR_optim_obj(params, data_sensitive, data_nonsensitive, y_sensitive,
    <source elided>

    LFR_optim_obj.iters += 1
    ^

@jit
/AIF360/aif360/algorithms/preprocessing/lfr_helpers/helpers.py:62: NumbaWarning: 
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "LFR_optim_obj" failed type inference due to: cannot determine Numba type of <class 'numba.dispatcher.LiftedLoop'>

File "AIF360/aif360/algorithms/preprocessing/lfr_helpers/helpers.py", line 85:
def LFR_optim_obj(params, data_sensitive, data_nonsensitive, y_sensitive,
    <source elided>
    L_z = 0.0
    for j in range(k):
    ^

@jit

unused requirements

Why are there a bunch of packages in this repo that are not used anywhere in the codebase? Specifically:
Orange3
scs
networkx

Furthermore, there are packages only used in the notebooks and not in the library:
all the ipython stuff
lime
tqdm
matplotlib

NotImplementedError in standard dataset

Related to #109. Previously it warned "Returning an ndarray, but in the future this will raise a 'NotImplementedError'", but now it raises the following error:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-3-3996a519ec26> in <module>()
      1 from aif360.datasets import AdultDataset
----> 2 data = AdultDataset()

/Users/staceyro/anaconda/envs/aif360/lib/python3.7/site-packages/aif360/datasets/adult_dataset.py in __init__(self, label_name, favorable_classes, protected_attribute_names, privileged_classes, instance_weights_name, categorical_features, features_to_keep, features_to_drop, na_values, custom_preprocessing, metadata)
    110             features_to_keep=features_to_keep,
    111             features_to_drop=features_to_drop, na_values=na_values,
--> 112             custom_preprocessing=custom_preprocessing, metadata=metadata)

/Users/staceyro/anaconda/envs/aif360/lib/python3.7/site-packages/aif360/datasets/standard_dataset.py in __init__(self, df, label_name, favorable_classes, protected_attribute_names, privileged_classes, instance_weights_name, scores_name, categorical_features, features_to_keep, features_to_drop, na_values, custom_preprocessing, metadata)
    119             else:
    120                 # find all instances which match any of the attribute values
--> 121                 priv = np.logical_or.reduce(np.equal.outer(vals, df[attr]))
    122                 df.loc[priv, attr] = privileged_values[0]
    123                 df.loc[~priv, attr] = unprivileged_values[0]

/Users/staceyro/anaconda/envs/aif360/lib/python3.7/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    703             return None
    704         else:
--> 705             return construct_return(result)
    706 
    707     def __array__(self, dtype=None) -> np.ndarray:

/Users/staceyro/anaconda/envs/aif360/lib/python3.7/site-packages/pandas/core/series.py in construct_return(result)
    692                 if method == "outer":
    693                     # GH#27198
--> 694                     raise NotImplementedError
    695                 return result
    696             return self._constructor(result, index=index, name=name, copy=False)

NotImplementedError: 

Python: 3.7.2
Pandas: 1.0.3
AIF360: 0.2.3 (built from source)

To replicate:

import aif360
from aif360.datasets import AdultDataset
data = AdultDataset()
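
A possible library-side fix (an assumption that simply follows the advice in the earlier FutureWarning to convert the Series to an array before the outer ufunc call):

# aif360/datasets/standard_dataset.py, around the line shown in the traceback:
priv = np.logical_or.reduce(np.equal.outer(vals, df[attr].to_numpy()))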

AIF360 calculations are off

Hi all,

I am checking the values that AIF360 outputs and have noticed that they seem off. I have calculated them on my own using the lines below, where protected and unprotected are lists of zeros and ones.

sum(protected)/len(protected) = 0.7643312101910829
sum(unprotected)/len(unprotected) = 0.1386481802426343

Disparate impact by AIF360: 3.6549252892407136
Disparate impact by me: 5.512738853503185

Statistical parity difference by AIF360: 0.6256830299484484
Statistical parity difference by me: 0.6256830299484486

What especially strikes me is the difference between the disparate impact "error" and the statistical parity difference "error", as they use practically the same numbers. Has anyone encountered this issue before?
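
One configuration that reproduces these numbers exactly (an assumption, since the dataset setup is not shown): if both the favorable label and the privileged/unprivileged group designation passed to the metric are the complements of the ones used in the hand calculation, the statistical parity difference is unchanged, since (1 - 0.1386) - (1 - 0.7643) = 0.7643 - 0.1386 ≈ 0.6257, but the disparate impact ratio becomes (1 - 0.1386) / (1 - 0.7643) ≈ 0.8614 / 0.2357 ≈ 3.6549, which matches the AIF360 value above. It is therefore worth double-checking favorable_label and the privileged_groups/unprivileged_groups dictionaries used when constructing the metric.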
