tolga-b / debiaswe

Remove problematic gender bias from word embeddings.

Home Page: https://arxiv.org/abs/1607.06520

License: MIT License

Python 18.28% Jupyter Notebook 81.72%
word2vec word-embeddings gender-equality social-justice nips-2016 debias

debiaswe's Introduction

Debiaswe: try to make word embeddings less sexist

🔴 FAT* 2018 tutorial slides

Here we have the code and data for the following paper: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings by Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Proceedings of NIPS 2016.

Just looking to download a debiased embedding?

You can download binary or text versions of the hard-debiased Google Word2Vec embedding trained on Google News (origin: GoogleNews-vectors-negative300.bin.gz, found here).

Python scripts:

  • learn_gender_specific.py: given a word embedding and a seed set of gender-specific words (like king, she, etc.), it learns a much larger list of gender-specific words
  • debias.py: given a word embedding, sets of gender pairs, gender-specific words, and pairs to equalize, it outputs a new word embedding. This version reads and writes the word2vec binary file format.
python learn_gender_specific.py ../embeddings/GoogleNews-vectors-negative300.bin 50000 ../data/gender_specific_seed.json gender_specific_full.json
python debias.py ../embeddings/GoogleNews-vectors-negative300.bin ../data/definitional_pairs.json ../data/gender_specific_full.json ../data/equalize_pairs.json ../embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
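
For readers who want to see what hard debiasing does to individual vectors, here is a minimal NumPy sketch of the neutralize and equalize steps as the paper describes them. It is not the code in debias.py. The 3-dimensional vectors, the fixed direction g, and the helper names (unit, drop, neutralize, equalize) are assumptions made for illustration; a real run would take the direction from the definitional pairs and the vectors from the embedding.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def drop(v, direction):
    """Remove the component of v that lies along a unit-length direction."""
    return v - v.dot(direction) * direction

def neutralize(v, direction):
    """Project the direction out of v, then rescale to unit length."""
    return unit(drop(v, direction))

def equalize(a, b, direction):
    """Move a unit-length pair so the two words differ only along the direction."""
    y = drop((a + b) / 2, direction)                     # shared, direction-free part
    z = np.sqrt(max(0.0, 1.0 - np.linalg.norm(y) ** 2))  # keeps the results unit length
    if (a - b).dot(direction) < 0:                       # keep each word on its original side
        z = -z
    return y + z * direction, y - z * direction

# Toy vectors (hypothetical, not taken from any real embedding).
g = np.array([1.0, 0.0, 0.0])                  # stand-in for the gender direction
programmer = unit(np.array([0.3, 0.8, 0.5]))   # a word to neutralize
he = unit(np.array([0.9, 0.1, 0.4]))           # a pair to equalize
she = unit(np.array([-0.8, 0.2, 0.5]))

programmer_db = neutralize(programmer, g)
he_db, she_db = equalize(he, she, g)

print("projection of neutralized word on g:", programmer_db.dot(g))
print("similarity to he / she:", he_db.dot(programmer_db), she_db.dot(programmer_db))
```

After both steps the neutralized word has no component along g, and it is equally similar to both members of the equalized pair.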

We also have seed data used to debias and crowd data used to evaluate the embeddings.

Data files:

  • gender_specific_seed.json: A list of 218 gender-specific words
  • gender_specific_full.json: A list of 1441 gender-specific words
  • definitional_pairs.json: The ten pairs of words we use to define the gender direction
  • equalize_pairs.json: Crowdsourced female-male word pairs that represent the gender direction

(All external files that I refer to within this repo can be found in this folder.)
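
To make the role of the definitional pairs concrete, here is a rough sketch of how a gender direction can be estimated from such pairs: center each pair on its midpoint, stack the centered vectors, and take the top principal component. The function name gender_direction, the toy 5-dimensional vectors, and the planted direction are all assumptions for illustration; the repo's own code may differ in its details.

```python
import numpy as np

def gender_direction(pairs, vec):
    """Top principal component of pair-centered vectors (illustrative helper)."""
    rows = []
    for a, b in pairs:
        center = (vec[a] + vec[b]) / 2
        rows.append(vec[a] - center)
        rows.append(vec[b] - center)
    # The rows come in +d / -d pairs, so their mean is zero and the first
    # right singular vector is the first principal component.
    _, _, vt = np.linalg.svd(np.array(rows), full_matrices=False)
    return vt[0]

# Toy embedding: each pair shares a random "meaning" vector and differs
# mostly along a planted direction (hypothetical data).
rng = np.random.default_rng(0)
planted = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
pairs = [("she", "he"), ("woman", "man"), ("her", "his")]
vec = {}
for f, m in pairs:
    common = rng.normal(size=5)
    vec[f] = common + planted + 0.05 * rng.normal(size=5)
    vec[m] = common - planted + 0.05 * rng.normal(size=5)

direction = gender_direction(pairs, vec)
print("alignment with planted direction:", abs(direction.dot(planted)))
```

The recovered direction is only defined up to sign, so the alignment is reported as an absolute value.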

debiaswe's People

Contributors

akalai, ciniesta, tolga-b

debiaswe's Issues

Cosine similarity: Top pca component vs she-he

Hello Debiaswe Research Team, Thank you for making the code related to your paper available. This is very helpful! I am writing to seek clarification on analyzing gender bias in word vectors associated with professions.

In your paper, you suggest using cosine similarity between a given profession vector and the top PCA component. I am trying to replicate the same in the wiki context. Unfortunately, I am getting the opposite of the results I expected.

For example, when I compute the cosine similarity between the waitress vector (or nurse vector) and the top gender principal component, I get a negative score. However, when I compute the cosine similarity between the same profession vector and the she - he vector (as you show in the example here), I get a positive score.

I am confused about why the sign flips between the PCA component and the straightforward gender vector. I would appreciate your help.

Thank you!
sbs
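
One likely explanation, offered as an assumption since the thread has no reply from the authors: the sign of a principal component is arbitrary. If v is a top component, -v explains exactly the same variance, so a PCA routine may return either one. A minimal sketch of orienting the component with a reference vector such as she - he before comparing scores follows. The helper orient and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))

def orient(component, reference):
    """Flip the component if it points away from the reference vector."""
    return component if component.dot(reference) >= 0 else -component

# Toy vectors (hypothetical): the component points roughly opposite to she - he.
she_minus_he = np.array([1.0, 0.2, 0.0])
pc = np.array([-0.98, -0.2, 0.0])
pc = pc / np.linalg.norm(pc)
waitress = np.array([0.6, 0.5, 0.6])

before = cosine(waitress, pc)
after = cosine(waitress, orient(pc, she_minus_he))
print("raw component:", before)
print("oriented component:", after)
```

The magnitude of the score is unchanged; only the sign follows the chosen orientation.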

Soft Debiasing

Hi, I was wondering if you could release the code related to soft debiasing? It would be nice to see the comparison between soft and hard debiasing.

Indirect Bias

I was looking through the code and wondered whether it handles indirect bias. Is it in there and I am just missing it?

Another thing: the Python tutorial notebook ranks the professions by their projection score, as
sp = sorted([(E.v(w).dot(v_gender), w) for w in profession_words])
so I wonder whether this is the same as computing the direct bias.
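
As a point of comparison, here is a small sketch of the DirectBias measure as I read it in the paper: the mean of |cos(w, g)|^c over a set of gender-neutral words. The projection in the notebook line gives a signed score per word, and it equals the cosine only when both vectors are unit length; DirectBias then aggregates the absolute values into one number. The function names and the 2-dimensional toy vectors are assumptions for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))

def direct_bias(vectors, g, c=1.0):
    """Mean of |cos(w, g)|**c over the given word vectors."""
    return float(np.mean([abs(cosine(w, g)) ** c for w in vectors]))

# Toy data (hypothetical): one word aligned with g, one opposed, one orthogonal.
g = np.array([1.0, 0.0])
words = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 1.0])]

projections = [w.dot(g) for w in words]  # signed, per word
bias = direct_bias(words, g)             # unsigned, one number for the set
print("projections:", projections)
print("direct bias:", bias)
```

Sorting by the projection ranks words from one end of the direction to the other, while the aggregate treats both ends as equally biased.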

Debiasing racial bias

Hi, I was wondering if you're planning to release a set of debiased embeddings that also address racial bias? You mention in the related paper that it exists, but you don't address it here. Will there be follow-up work on this? It would be good to have pretrained word embeddings that address this as well.
