This project forked from artursstaf/mitigating-gender-bias-wmt-2020

A project that explores using word grammatical genders as an additional source of information in neural machine translation.

Mitigating Gender Bias in Machine Translation: Target Language Grammatical Gender Projections Onto Source Language

This repository contains code and partial data for the experiments described in Mitigating Gender Bias in Machine Translation: Target Language Grammatical Gender Projections Onto Source Language.

This research has been supported by the European Regional Development Fund within the joint project of SIA TILDE and University of Latvia “Multilingual Artificial Intelligence Based Human Computer Interaction” No. 1.1.1.1/18/A/148.

Requirements

Conda is the recommended way to run the experiments: conda create -n gender-bias python=3.7.
Also make sure the system-wide dependencies are installed: sudo apt install build-essential swig python-dev libgoogle-perftools-dev libsparsehash-dev.
Then activate the conda environment and install the necessary tools via scripts/install_tools.sh.

Running experiments

Experiments are organized per language pair (training corpora).
Running the bash scripts from scripts/{language}/*.sh in order prepares the data, trains the models, and evaluates BLEU and WinoMT scores. Experiments for latvian_imba are not reproducible, because they rely on large proprietary Tilde corpora.
Each language pair trains two NMT systems: a baseline (base) with no target gender annotations (TGA), and a gendered system (genders2) with TGA in the training data.
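
The "run the scripts in order" step can be sketched in Python as below. This is a minimal sketch, not part of the repository: it assumes the scripts are named so that sorting by file name gives the intended order (for example with a numeric prefix), and the helper names ordered_scripts and run_all are hypothetical.

```python
from pathlib import Path
import subprocess


def ordered_scripts(script_dir):
    """Return the *.sh files in script_dir sorted by file name.

    Assumption: file names sort into the intended execution order.
    Note that plain string sorting puts "10_x.sh" before "2_x.sh".
    """
    return sorted(Path(script_dir).glob("*.sh"), key=lambda p: p.name)


def run_all(script_dir, dry_run=True):
    """Run each script with bash, stopping at the first failure."""
    for script in ordered_scripts(script_dir):
        print("running", script)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(["bash", str(script)], check=True)
```

Calling run_all("scripts/{language}") with the placeholder replaced by a real directory name only prints the planned order; pass dry_run=False to actually execute the scripts.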

Evaluation results

Evaluation metrics are aggregated in evaluation_logs/{language}/{experiment}/.
WinoMT test set translations are stored in data/wino_mt/{language}/{experiment}.
Newstest translations can be found in data/dev_translations/{language}/{experiment}.
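
As a small illustration of this layout, the helper below builds the three output locations for one language and experiment. The function name result_paths and the dictionary keys are hypothetical. Only the directory names come from the description above.

```python
from pathlib import Path


def result_paths(language, experiment, root="."):
    """Map each kind of result to its directory for one experiment."""
    root = Path(root)
    return {
        "metrics": root / "evaluation_logs" / language / experiment,
        "wino_mt": root / "data" / "wino_mt" / language / experiment,
        "newstest": root / "data" / "dev_translations" / language / experiment,
    }


# Example: locations for the baseline system of the latvian_imba experiments.
for kind, path in result_paths("latvian_imba", "base").items():
    print(kind, path.as_posix())
```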

Scripts

Paper-specific data preparation scripts can be found in scripts/python. Example usage can be found in scripts/common/ where these scripts are invoked.

  • generate_genders.py extracts gender annotations (M/F/N/U) using the Stanza tagger
  • align_genders.py projects target gender annotations onto source-side tokens
  • genders_bpe.py copies word-level gender annotations to their respective sub-word parts
  • randomly_include_genders.py applies dropout to TGA
  • wino_mt_genders.py extracts gold gender annotations from the WinoMT dataset
  • wino_mt_genders_allen.py generates gender annotations using the AllenNLP coreference resolution tool
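
The projection, sub-word copying, and dropout steps listed above can be sketched roughly as follows. This is not the repository's code. It is a minimal illustration under several assumptions: alignments are given as (source index, target index) pairs, the first informative tag wins when several target tokens align to one source token, sub-words follow the "@@" continuation convention, and dropout removes all tags of a sentence at once. All function names are hypothetical.

```python
import random

UNKNOWN = "U"  # tag used when no gender information is available


def project_genders(source_tokens, target_genders, alignment):
    """Give each source token the gender tag of a target token aligned to it.

    alignment is a list of (source_index, target_index) pairs.
    """
    projected = [UNKNOWN] * len(source_tokens)
    for src_i, tgt_i in alignment:
        gender = target_genders[tgt_i]
        # Assumption: keep the first informative tag if several align here.
        if projected[src_i] == UNKNOWN and gender != UNKNOWN:
            projected[src_i] = gender
    return projected


def copy_to_subwords(subword_tokens, word_genders, marker="@@"):
    """Repeat each word's tag for every sub-word piece of that word.

    Assumption: a piece ending in marker continues into the next piece.
    """
    tags = []
    word_i = 0
    for piece in subword_tokens:
        tags.append(word_genders[word_i])
        if not piece.endswith(marker):
            word_i += 1  # this piece closes the current word
    return tags


def drop_genders(genders, keep_probability, rng):
    """Keep the tags with keep_probability, otherwise replace all with U."""
    if rng.random() < keep_probability:
        return list(genders)
    return [UNKNOWN] * len(genders)


source = ["the", "doctor", "arrived"]
target_genders = ["F", "U"]       # tags of two hypothetical target tokens
alignment = [(1, 0), (2, 1)]      # "doctor" and "arrived" are aligned
word_tags = project_genders(source, target_genders, alignment)
subword_tags = copy_to_subwords(["the", "doc@@", "tor", "arrived"], word_tags)
print(word_tags)      # ['U', 'F', 'U']
print(subword_tags)   # ['U', 'F', 'F', 'U']
print(drop_genders(subword_tags, 0.5, random.Random(0)))
```

Dropping annotations for some sentences means the model also sees source text with no gender tags during training, which is the situation it faces when no annotations are available at translation time.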
