montaserfath / fair-classifier-and-fair-representation

1) Build a fair classifier from which no one can predict the sensitive attribute. 2) Build a fair representation of the data.

License: MIT License


Fair Classifier and Fair Representation

The main objectives are:

  • Build a fair classifier from which no one can predict the sensitive attribute.
  • Build a fair representation of the data.

Data-set:

  • A set of reasonably clean records was extracted.
  • The prediction task is to determine whether a person makes over 50K a year.
  • Sensitive attribute: gender (male or female).
  • Features: 113 features, such as age, gender, race, education level, work class, hours per week, and country.
  • Number of training samples: 32,561. Number of testing samples: 16,281.

Data Visualization:

  • The data-set is unbalanced: the number of males is much larger than the number of females, and the number of people who earn less than 50K per year is much larger than the number who earn more than 50K per year.
[Figures: number of males vs. females; number of people earning less than 50K vs. more than 50K per year]

Accuracy Functions:

  • Accuracy (A):

  • Reweighted accuracy (R) of a classifier, defined as the mean accuracy normalized by the size of the two groups:

  • Fairness metric DP, which measures demographic parity:
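
The formulas for these three metrics did not survive in the text above, so the following is only a minimal sketch under assumed definitions: accuracy is the fraction of correct predictions, reweighted accuracy is the unweighted mean of the two per-group accuracies, and DP is the absolute difference between the positive-prediction rates of the two groups. The function names are illustrative and are not taken from the notebook.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def reweighted_accuracy(y_true, y_pred, a):
    """Assumed definition: mean of the accuracies on group a == 0 and group a == 1."""
    y_true, y_pred, a = map(np.asarray, (y_true, y_pred, a))
    per_group = [np.mean(y_true[a == g] == y_pred[a == g]) for g in (0, 1)]
    return float(np.mean(per_group))

def demographic_parity_gap(y_pred, a):
    """Assumed definition: |P(y_pred = 1 | a = 0) - P(y_pred = 1 | a = 1)|."""
    y_pred, a = np.asarray(y_pred), np.asarray(a)
    return float(abs(np.mean(y_pred[a == 0]) - np.mean(y_pred[a == 1])))

# Toy example: a constant classifier that always predicts 1,
# evaluated on a task where the target is the sensitive attribute itself.
a = np.array([1, 1, 0, 1, 0, 1])   # sensitive attribute
y_true = a.copy()
y_pred = np.ones_like(a)
```

Under these assumed definitions the constant classifier reaches an accuracy equal to the majority-group share, a reweighted accuracy of 50%, and a DP of 0, which is consistent with the 66.7% / 50% / 0.00 rows reported for Gender prediction in the results.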

Part One: Classification

Most correlated Features to Income (Y) and Gender (A):

  • I used two methods to calculate the correlation between the features and Income (Y) and between the features and Gender (A):
  • 1- Pearson: the Pearson correlation coefficient measures the linear relationship between two datasets (x, y). It varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
[Figures: top correlated features to Income (Y) using Pearson; top correlated features to Gender (A) using Pearson]
  • 2- Trees: use tree-based models to return the feature importance (the higher the value, the more important the feature).
[Figures: top correlated features to Income (Y) using Trees; top correlated features to Gender (A) using Trees]
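
The two ranking methods above can be sketched as follows. This is an illustration on synthetic data, not the notebook's code: the source does not say which tree model was used, so `ExtraTreesClassifier` from scikit-learn is an assumption, and the helper names `pearson_scores` and `tree_importances` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def pearson_scores(X, target):
    """Pearson correlation between each column of X and the target vector."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(target, dtype=float)
    Xc = X - X.mean(axis=0)
    tc = target - target.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (tc ** 2).sum())
    return (Xc * tc[:, None]).sum(axis=0) / denom   # undefined for constant columns

def tree_importances(X, target, seed=0):
    """Impurity-based feature importances from an ensemble of trees (assumed model)."""
    model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    model.fit(X, target)
    return model.feature_importances_

# Synthetic stand-in data: column 0 drives the target, column 1 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

r = pearson_scores(X, y)
imp = tree_importances(X, y)
top_by_pearson = np.argsort(-np.abs(r))   # most correlated feature first
top_by_trees = np.argsort(-imp)           # most important feature first
```

Ranking by the absolute value of the Pearson coefficient treats strong negative and strong positive correlations alike. The tree importances are non-negative and sum to one, so the two scores are not on the same scale.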

Binary Classifier:

I used the following methods to train a Binary Classifier to predict Income (Y) and Gender (A):

  • Binary Logistic Regression.
  • Random Forest.
  • Linear Support Vector Machine (SVM).
  • Kernel SVC.
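
A minimal sketch of fitting these four model families with scikit-learn is shown below. The hyperparameters are not given in the text, so library defaults (with an RBF kernel for the kernel SVC) are assumed, and synthetic data stands in for the real features and labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the real features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Assumed settings; the notebook's actual hyperparameters are not stated.
models = {
    "Binary Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Linear SVM": LinearSVC(),
    "Kernel SVC": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # plain accuracy on held-out data
```

The same loop can be run twice, once with Income (Y) as the label and once with Gender (A), to produce the two comparisons.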

Results:

  • Comparison of the methods for predicting Income (Y):

| Method | Accuracy (A) | Reweighted accuracy (R) | DP |
|---|---|---|---|
| Binary Logistic Regression | 79.5% | 81.92% | 0.04 |
| Random Forest | 85.79% | 87.6% | 0.16 |
| Linear SVM | 85.65% | 87.48% | 0.17 |
| SVC | 86.59% | 88.29% | 0.16 |

  • Comparison of the methods for predicting Gender (A):

| Method | Accuracy (A) | Reweighted accuracy (R) | DP |
|---|---|---|---|
| Binary Logistic Regression | 66.7% | 50% | 0.00 |
| Random Forest | 84.47% | 81.91% | 0.64 |
| Linear SVM | 84.18% | 83.46% | 0.67 |
| SVC | 82.19% | 81.71% | 0.63 |

Part Two: Representation Learning

Pre-processing

  • In this section, we study the effect of pre-processing on the representation. The data-set is split into two parts: male data (A=1) and female data (A=0). We then calculate the mean and standard deviation of each part and normalize each part with its own statistics.

  • Train the models to predict Gender (A) after normalizing each group (males and females) separately:

| Method | Accuracy (A) | Reweighted accuracy (R) | DP |
|---|---|---|---|
| Binary Logistic Regression | 63.41% | 48.57% | 0.03 |
| Random Forest | 66.7% | 50% | 0.00 |
| Linear SVM | 37.74% | 51.61% | 0.03 |
| SVC | 66.7% | 50% | 0.00 |
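
The per-group normalization described in this section can be sketched as below. The helper name `normalize_per_group` and the small `eps` term are assumptions added for illustration. The text only states that each group is normalized with its own mean and standard deviation.

```python
import numpy as np

def normalize_per_group(X, a, eps=1e-8):
    """Standardize the rows of each group (a == 0, a == 1) with that group's own statistics."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a)
    X_norm = np.empty_like(X)
    for g in (0, 1):
        mask = a == g
        mean = X[mask].mean(axis=0)
        std = X[mask].std(axis=0)
        X_norm[mask] = (X[mask] - mean) / (std + eps)   # eps guards constant columns
    return X_norm

# Synthetic example: one feature whose mean and spread differ by group.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
X = rng.normal(loc=3.0 * a[:, None], scale=1.0 + a[:, None], size=(1000, 1))
X_norm = normalize_per_group(X, a)
```

After this step both groups have zero mean and unit variance in every feature, so a classifier can no longer separate them through first- or second-order statistics of individual features. That may explain why the Gender accuracies in the table fall to roughly the majority-class rate.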

Neural Network

  • I used a neural network as the classifier, with an MMD (maximum mean discrepancy) loss added to the binary cross-entropy as the loss function:

Total Loss = Binary cross entropy loss + alpha * MMD Loss

  • where alpha is a hyperparameter that decides how much the MMD loss is weighted relative to the binary cross-entropy loss.
| Alpha | Accuracy (A) | Reweighted accuracy (R) | DP |
|---|---|---|---|
| alpha=0.0 | 85.6% | 86.84% | 0.19 |
| alpha=0.1 | 85.14% | 86.97% | 0.18 |
| alpha=0.2 | 85.39% | 87.15% | 0.18 |
| alpha=0.5 | 85.6% | 87.38% | 0.16 |
| alpha=0.7 | 85.36% | 87.14% | 0.18 |
| alpha=1.0 | 85.25% | 87.03% | 0.19 |
| alpha=10.0 | 83.66% | 84.66% | 0.06 |
| alpha=100.0 | 79.06% | 81.7% | 0.04 |
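
The combined loss can be illustrated with the NumPy sketch below. Several details are assumptions because the text does not specify them: the MMD is taken between the hidden representations of the two gender groups, a Gaussian (RBF) kernel with bandwidth `sigma` is used, and the biased estimator of the squared MMD is computed. In actual training the same expression would be written in an automatic-differentiation framework such as PyTorch so that gradients can flow through it.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(z0, z1, sigma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return float(rbf_kernel(z0, z0, sigma).mean()
                 + rbf_kernel(z1, z1, sigma).mean()
                 - 2.0 * rbf_kernel(z0, z1, sigma).mean())

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Mean binary cross-entropy for predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def total_loss(y_true, p, z, a, alpha):
    """Binary cross-entropy plus alpha times the MMD between the groups' representations."""
    return binary_cross_entropy(y_true, p) + alpha * mmd2(z[a == 0], z[a == 1])

# Synthetic hidden representations z whose distribution depends on the group a.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=200)
z = rng.normal(size=(200, 4)) + 2.0 * a[:, None]
y = rng.integers(0, 2, size=200)
p = np.full(200, 0.5)   # uninformative predictions

loss_plain = total_loss(y, p, z, a, alpha=0.0)
loss_fair = total_loss(y, p, z, a, alpha=10.0)
```

With alpha = 0 the loss reduces to plain binary cross-entropy. Larger values of alpha penalize representations whose distributions differ between the two groups, which matches the trade-off in the table: DP drops at alpha = 10 and alpha = 100, at the cost of some accuracy.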

Installing

Install torchvision

pip install torchvision

Install pandas library

pip install pandas

Install scikit-learn (sklearn) library

pip install scikit-learn

Install scipy library

pip install scipy
