

AI bias and fairness in MATLAB

Open in MATLAB Online

Created with R2022b. Compatible with R2022b and later releases

What is AI bias and fairness?

As AI is adopted across industries such as finance, education, employment, and law enforcement, it increasingly affects daily life with real consequences. As a result, AI bias has become a frequent headline in mainstream news. In AI, the term bias typically appears in the context of the bias-variance tradeoff, but here we mean the more conventional sense: a system that is biased against a protected class, such as race, gender, religion, age, disability, national origin, marital status, or genetic information.

Where does the AI bias come from?

Bias can creep into our AI models at every step of the way – from the data itself, to modeling, to human review. For example, the data we use may overrepresent one class over others, or may use features that reflect our unconscious bias. Hidden bias also affects how we choose to model the data and how we evaluate the output from AI. Because we all carry biases we are not aware of, it is important to have diverse perspectives on your team as you collect data, develop models, and evaluate outputs, so that unconscious bias can be caught.

How can we reduce AI bias?

How can we ensure that our AI models produce fairer and more equitable outcomes? This is still an active area of AI research, and there are no perfect solutions yet. However, there are workable solutions that we can apply to common binary classification problems, and they are available in Statistics and Machine Learning Toolbox starting in R2022b.

Define AI fairness

We can begin by defining AI fairness: if a model changes its output based on sensitive attributes (e.g., race, gender, age), then it is biased; otherwise, it is fair.
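One simple way to write this intuition down (a sketch of the idea, not the toolbox's formal definition): a model $f$ is fair in this sense if changing only the sensitive attribute leaves the prediction unchanged,

$$f(x, a) = f(x, a') \quad \text{for all sensitive values } a, a',$$

where $x$ denotes the remaining, non-sensitive features.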

Simply removing sensitive attributes from the dataset doesn't work, because bias can hide in other predictors (e.g., zip code may correlate with race), and bias can also creep into the model through class imbalances in the training dataset. Ideally, you want to evaluate at two levels:

  • Data-level: evaluate the bias and fairness of the dataset before you begin the rest of the process
  • Model-level: evaluate the bias and fairness of the predictions from the trained model

Statistical Parity Difference (SPD) and Disparate Impact (DI) can be used for both, while Equal Opportunity Difference (EOD) and Average Absolute Odds Difference (AAOD) are meant for evaluating model predictions.
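For reference, these metrics have standard definitions in the fairness literature (written here as background, not copied from the toolbox documentation). With $\hat{Y}$ the predicted label, $1$ the positive class, and $A$ the sensitive attribute with group $a$ and reference group $r$:

$$
\begin{aligned}
\mathrm{SPD} &= P(\hat{Y}=1 \mid A=a) - P(\hat{Y}=1 \mid A=r)\\
\mathrm{DI} &= \frac{P(\hat{Y}=1 \mid A=a)}{P(\hat{Y}=1 \mid A=r)}\\
\mathrm{EOD} &= P(\hat{Y}=1 \mid Y=1, A=a) - P(\hat{Y}=1 \mid Y=1, A=r)\\
\mathrm{AAOD} &= \tfrac{1}{2}\left(\lvert \mathrm{FPR}_a - \mathrm{FPR}_r \rvert + \lvert \mathrm{TPR}_a - \mathrm{TPR}_r \rvert\right)
\end{aligned}
$$

For data-level evaluation, the true label $Y$ takes the place of the prediction $\hat{Y}$, which is why SPD and DI can be applied to a raw dataset.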

Let's try SPD on the built-in dataset patients.

load patients % built-in sample dataset
Gender = categorical(Gender);
Smoker = categorical(Smoker,logical([1 0]),["Smoker","Nonsmoker"]); % relabel logical values
tbl = table(Diastolic,Gender,Smoker,Systolic);

We need to split the data into a training set and a test set, and use only the training set for now.

rng('default') % For reproducibility
cv = cvpartition(height(tbl),'HoldOut',0.3); % 70/30 holdout split
xTrain = tbl(training(cv),:);
xTest = tbl(test(cv),1:4); % held-out test set (all four variables)

Then use the training set to calculate the metrics. In this case, the positive class is Nonsmoker, and SPD needs to be close to 0 for the dataset to be considered fair.

$$\mathrm{SPD} = P(Y = \text{Nonsmoker} \mid \text{Gender} = \text{Male}) - P(Y = \text{Nonsmoker} \mid \text{Gender} = \text{Female}) \approx 0$$

metrics = fairnessMetrics(xTrain,"Smoker",SensitiveAttributeNames="Gender");
metrics.PositiveClass

[Output: the positive class is Nonsmoker]

report(metrics,BiasMetrics="StatisticalParityDifference")

[Table: Statistical Parity Difference report by gender group]

This data-level evaluation shows that the dataset is biased in favor of female nonsmokers over male nonsmokers.
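The same workflow supports the model-level evaluation mentioned above: fairnessMetrics accepts predicted labels via its Predictions name-value argument. A minimal sketch, assuming a classification tree as an illustrative model (the model choice and variable names mdl, yPred, and modelMetrics are not part of the original example):

mdl = fitctree(xTrain,"Smoker"); % illustrative model; any binary classifier works
yPred = predict(mdl,xTest); % predicted labels on the held-out test set
modelMetrics = fairnessMetrics(xTest,"Smoker", ...
    SensitiveAttributeNames="Gender",Predictions=yPred); % model-level metrics
report(modelMetrics,BiasMetrics="EqualOpportunityDifference")

Because Predictions is supplied along with the true response, prediction-based metrics such as EOD and AAOD become available in addition to SPD and DI.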

Mitigate bias

Once we have ways to evaluate our dataset or model for bias and fairness, we can use those metrics to mitigate the problems we find. Class imbalance has long been an issue in machine learning, and many classifiers accept weights (or misclassification costs) to address it, which makes reweighting an easy approach to understand.
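The idea behind fairness weights follows the standard reweighing scheme from the fairness literature (Kamiran and Calders); this is background for intuition, since the exact formula used by fairnessWeights is not spelled out in this example. Each observation in subgroup $(A=a, Y=y)$ receives the weight

$$w(a, y) = \frac{P(A=a)\,P(Y=y)}{P(A=a, Y=y)},$$

so overrepresented attribute-label combinations are down-weighted and underrepresented ones are up-weighted.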

Going back to the earlier example, let's calculate fairness weights and check the summary statistics.

fairWeights = fairnessWeights(xTrain,"Gender","Smoker"); % one weight per observation
xTrain.Weights = fairWeights; % store the weights as a new table variable
groupsummary(xTrain,["Gender","Smoker"],"mean","Weights") % mean weight per subgroup

[Table: mean fairness weights by Gender and Smoker subgroup]

In this dataset, female nonsmokers and male smokers are overrepresented, so the fairness weights discount these subgroups while boosting the underrepresented ones. When we apply the weights to the SPD calculation, the results are much closer to 0.

weightedMetrics = fairnessMetrics(xTrain,"Smoker", ...
    SensitiveAttributeNames="Gender",Weights="Weights"); % weighted data-level metrics
figure
tiledlayout(2,1);
nexttile
plot(metrics,"StatisticalParityDifference")
title("Before reweighting")
xlabel("Statistical Parity Difference")
xl = xlim; % capture axis limits to share across both plots
nexttile
plot(weightedMetrics,"StatisticalParityDifference")
title("After reweighting")
xlabel("Statistical Parity Difference")
xlim(xl);

[Figure: SPD by gender, before vs. after reweighting]

The plots show that, after reweighting, the SPD is much closer to 0 than it was for the original training dataset.
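To carry the mitigation through to a trained model, the fairness weights can be passed as observation weights to a fitting function. A minimal sketch, again assuming a classification tree (illustrative, not from the original example):

% Naming a table variable in Weights makes fitctree use it as
% observation weights and exclude it from the predictors
fairMdl = fitctree(xTrain,"Smoker",Weights="Weights");
yPredFair = predict(fairMdl,xTest);
fairModelMetrics = fairnessMetrics(xTest,"Smoker", ...
    SensitiveAttributeNames="Gender",Predictions=yPredFair);
report(fairModelMetrics,BiasMetrics="StatisticalParityDifference")

The model-level SPD of this reweighted model can then be compared against the unweighted model from the earlier sketch.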

Closing

This was a quick introduction to the new AI bias and fairness features introduced in Statistics and Machine Learning Toolbox in R2022b.

Copyright 2023 The MathWorks, Inc.
