
manankshastri / wordvectorrepresentation


Operations on word vectors: measure similarity using cosine similarity, solve word analogy problems, and reduce gender bias in word embeddings.

Jupyter Notebook 81.96% Python 18.04%
python3 cosine-similarity gender-bias word-analogy glove-vectors

wordvectorrepresentation's Introduction

Operations on word vectors

- Load pre-trained word vectors and measure similarity using cosine similarity.

- Use word embeddings to solve word analogy problems such as "Man is to Woman as King is to ______".

- Modify word embeddings to reduce their gender bias.


We will use 50-dimensional GloVe vectors to represent words.
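GloVe vectors are distributed as plain-text files in which each line holds a word followed by its vector components, separated by spaces. The sketch below parses that format into a Python dictionary. The function names `parse_glove_lines` and `load_glove` are hypothetical, chosen for illustration; this is not necessarily how this repository loads its vectors.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text format: one word per line, followed by its vector components."""
    word_to_vec_map: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        word_to_vec_map[word] = [float(x) for x in values]
    return word_to_vec_map


def load_glove(path: str) -> Dict[str, List[float]]:
    """Hypothetical helper: read a GloVe .txt file into a word -> vector dictionary."""
    with open(path, encoding="utf-8") as f:
        return parse_glove_lines(f)
```

For example, assuming the file `glove.6B.50d.txt` from the GloVe 6B release has been downloaded separately, `load_glove("glove.6B.50d.txt")` would map each word to a list of 50 floats.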


1 - Cosine similarity

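To measure how similar two words are, we need a way to measure the degree of similarity between their embedding vectors. Given two word vectors $u$ and $v$, cosine similarity is defined as:

$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\|_2 \, \|v\|_2} = \cos(\theta)$$

where $u \cdot v$ is the dot product of $u$ and $v$, $\|u\|_2$ is the L2 norm (length) of $u$, and $\theta$ is the angle between $u$ and $v$. Because it depends only on this angle and not on the vectors' lengths, cosine similarity ranges from -1 to 1: if $u$ and $v$ are very similar, their cosine similarity is close to 1; if they are dissimilar, it takes a smaller value.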

Figure 1 : The cosine of the angle between two vectors is a measure of how similar they are.
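Cosine similarity is the dot product of the two vectors divided by the product of their L2 norms. A minimal pure-Python sketch (standard library only, no NumPy); the name `cosine_similarity` is illustrative, and returning 0.0 when either vector is all zeros is an assumption, since the angle is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return u.v / (||u||_2 * ||v||_2), the cosine of the angle between u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Angle is undefined for a zero vector; 0.0 is returned by convention here.
        return 0.0
    return dot / (norm_u * norm_v)
```

Parallel vectors score 1.0, orthogonal vectors 0.0, and vectors pointing in opposite directions -1.0, regardless of their lengths.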

2 - Word analogy task

In the word analogy task, we complete the sentence "a is to b as c is to ____". An example is "man is to woman as king is to queen". In detail, we are trying to find a word d such that the associated word vectors $e_a, e_b, e_c, e_d$ are related as $e_b - e_a \approx e_d - e_c$. We measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity, and pick the word d for which it is highest.
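A brute-force sketch of this search, assuming `word_to_vec_map` is a dictionary from lowercase words to equal-length lists of floats. It scores every vocabulary word d by the cosine similarity between e_b - e_a and e_d - e_c and returns the best one. The function name `complete_analogy` is illustrative, and skipping the three input words is a design choice made here: without it, d = c would produce a zero difference vector.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def complete_analogy(word_a: str, word_b: str, word_c: str,
                     word_to_vec_map: Dict[str, List[float]]) -> Optional[str]:
    """Return the word d that best completes "a is to b as c is to d"."""
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    e_a, e_b, e_c = (word_to_vec_map[w] for w in (word_a, word_b, word_c))
    target = [b - a for a, b in zip(e_a, e_b)]  # e_b - e_a

    best_word: Optional[str] = None
    best_sim = -math.inf
    for word, e_d in word_to_vec_map.items():
        if word in (word_a, word_b, word_c):
            continue  # skip the input words themselves
        candidate = [d - c for c, d in zip(e_c, e_d)]  # e_d - e_c
        sim = cosine_similarity(target, candidate)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word  # None if the vocabulary holds only the input words
```

With real GloVe vectors the loop visits every word in the vocabulary, so each query is linear in vocabulary size.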

3 - Debiasing word vectors

We will examine gender biases that can be reflected in a word embedding, and explore algorithms for reducing the bias.

3.1 - Neutralize bias for non-gender-specific words

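The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50-dimensional space can be split into two parts: the bias direction $g$, and the remaining 49 dimensions, which we'll call $g_{\perp}$. The 49-dimensional $g_{\perp}$ is perpendicular (orthogonal) to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out its component in the direction of $g$, giving $e_{receptionist}^{debiased}$.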

Figure 2: The word vector for "receptionist" represented before and after applying the neutralize operation.
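Neutralizing amounts to subtracting the projection of e onto g: e_bias = (e · g / ||g||_2^2) g, then e_debiased = e - e_bias. A minimal pure-Python sketch; the function name `neutralize` is illustrative. How g is obtained is not specified here; one simple choice, assumed only for illustration, is g = e_woman - e_man, whereas Bolukbasi et al. combine several gender-definitional word pairs.

```python
from typing import List, Sequence


def neutralize(e: Sequence[float], g: Sequence[float]) -> List[float]:
    """Remove the component of word vector e that lies along bias direction g."""
    dot_eg = sum(x * y for x, y in zip(e, g))
    g_sq = sum(y * y for y in g)                # ||g||_2^2
    e_bias = [dot_eg / g_sq * y for y in g]     # projection of e onto g
    return [x - b for x, b in zip(e, e_bias)]   # remaining part lies in g_perp
```

Applied to e_receptionist, the result has (up to floating-point error) zero dot product with g, so its cosine similarity with g is zero.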

3.2 - Equalization algorithm for gender-specific words

Next, let's see how debiasing can also be applied to word pairs such as "actress" and "actor". Equalization is applied to pairs of words that you might want to differ only in the gender property. As a concrete example, suppose that "actress" is closer to "babysit" than "actor" is. Neutralizing "babysit" reduces the gender stereotype associated with babysitting, but it still does not guarantee that "actor" and "actress" are equidistant from "babysit". The equalization algorithm takes care of this.

The key idea behind equalization is to make sure that a particular pair of words is equidistant from the 49-dimensional $g_{\perp}$. The equalization step also ensures that the two equalized words are now the same distance from $e_{receptionist}^{debiased}$, or from any other word that has been neutralized.
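A sketch of the equalization step, following the update described by Bolukbasi et al. for unit-length vectors: with μ the mean of the two word vectors, μ_B its projection onto g, and ν = μ - μ_B, each word w becomes ν + sqrt(1 - ||ν||^2) (w_B - μ_B) / ||w_B - μ_B||, where w_B is the projection of w onto g. The inputs are normalized first because the update assumes unit length. Function and helper names are illustrative, and this is not necessarily the exact code in this repository's notebook.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: Sequence[float]) -> float:
    return math.sqrt(dot(u, u))


def scale(c: float, u: Sequence[float]) -> Vector:
    return [c * a for a in u]


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    return [a + b for a, b in zip(u, v)]


def sub(u: Sequence[float], v: Sequence[float]) -> Vector:
    return [a - b for a, b in zip(u, v)]


def project(u: Sequence[float], g: Sequence[float]) -> Vector:
    """Projection of u onto the bias direction g."""
    return scale(dot(u, g) / dot(g, g), g)


def equalize(e_w1: Sequence[float], e_w2: Sequence[float],
             g: Sequence[float]) -> Tuple[Vector, Vector]:
    """Equalize a gender-specific word pair (e.g. actor/actress) along bias direction g."""
    w1 = scale(1.0 / norm(e_w1), e_w1)   # the update assumes unit-length vectors
    w2 = scale(1.0 / norm(e_w2), e_w2)
    mu = scale(0.5, add(w1, w2))         # mean of the pair
    mu_b = project(mu, g)                # bias component of the mean
    nu = sub(mu, mu_b)                   # shared part, orthogonal to g
    d1 = sub(project(w1, g), mu_b)       # w1_B - mu_B
    d2 = sub(project(w2, g), mu_b)       # w2_B - mu_B (mathematically equal to -d1)
    if norm(d1) == 0.0:
        raise ValueError("the pair has identical components along g")
    c = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))  # keeps both outputs unit-length
    e1 = add(nu, scale(c / norm(d1), d1))
    e2 = add(nu, scale(c / norm(d2), d2))
    return e1, e2
```

The two outputs share the same component ν in $g_{\perp}$ and have opposite components along g, so their dot product with any neutralized vector (one orthogonal to g) is identical.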


References:

The debiasing algorithm is from Bolukbasi et al., 2016, "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings."

The GloVe word embeddings were due to Jeffrey Pennington, Richard Socher, and Christopher D. Manning.
