Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings

Paper

If you use this code in your research, please cite the following publication:

Haoran Zhang, Amy X. Lu, Mohamed Abdalla, Matthew McDermott, and Marzyeh Ghassemi. 2020.
Hurtful words: quantifying biases in clinical contextual word embeddings.
In Proceedings of the ACM Conference on Health, Inference, and Learning (CHIL ’20).
Association for Computing Machinery, New York, NY, USA, 110–120.

A publicly available version of this paper is also on arXiv.

Pretrained Models

The pretrained BERT models used in our experiments are available to download here:

Step 0: Environment and Prerequisites

  • Before starting, go to the MIMIC-benchmarks repository, and follow all of the steps in the Building a benchmark section.
  • Run the following commands to clone this repo and create the Conda environment:
git clone https://github.com/MLforHealth/HurtfulWords.git
cd HurtfulWords/
conda create -y -n hurtfulwords python=3.7
conda activate hurtfulwords
pip install -r requirements.txt

Step 1: Data processing

Reads in the tables from MIMIC and pregenerates data for clinical BERT pretraining. Reads in the cohorts defined by MIMIC-benchmarks and creates tasks for finetuning on downstream targets.

  • In bash_scripts/data_processing_pipeline.sh, update BASE_DIR, OUTPUT_DIR, SCIBERT_DIR and MIMIC_BENCHMARK_DIR.
  • In scripts/get_data.py, update the database connection credentials on line 13. If your MIMIC-III is not loaded into a database, you will have to update this script accordingly.
  • Run bash_scripts/data_processing_pipeline.sh. This script will require at least 50 GB of RAM, 100 GB of disk space in OUTPUT_DIR, and will take several days to complete.
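Since the pipeline runs for several days, it may be worth checking the stated resource requirements before launching it. The following is a minimal sketch, not part of this repository: the function name check_resources is hypothetical, it assumes OUTPUT_DIR is exported as an environment variable, and the RAM check relies on os.sysconf, which is only available on Unix-like systems.

```python
import os
import shutil

GIB = 1024 ** 3


def check_resources(output_dir, min_ram_gib=50, min_disk_gib=100):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Free disk space on the filesystem that holds output_dir.
    free_disk_gib = shutil.disk_usage(output_dir).free / GIB
    if free_disk_gib < min_disk_gib:
        problems.append(
            f"only {free_disk_gib:.0f} GiB free in {output_dir}, "
            f"need {min_disk_gib} GiB"
        )

    # Total physical RAM; skipped where os.sysconf is unavailable.
    try:
        total_ram_gib = (
            os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
        )
    except (AttributeError, ValueError, OSError):
        total_ram_gib = None
    if total_ram_gib is not None and total_ram_gib < min_ram_gib:
        problems.append(
            f"only {total_ram_gib:.0f} GiB RAM, need {min_ram_gib} GiB"
        )

    return problems


if __name__ == "__main__":
    # Assumption: OUTPUT_DIR is set in the environment; falls back to ".".
    for problem in check_resources(os.environ.get("OUTPUT_DIR", ".")):
        print("WARNING:", problem)
```

The thresholds default to the 50 GB of RAM and 100 GB of disk space quoted above; the check reports total RAM, not currently free RAM.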

Step 2: Training Baseline Clinical BERT

Pretrains baseline clinical BERT (initialized from SciBERT) for 1 epoch on sequences of length 128, then 1 epoch on sequences of length 512.

  • In bash_scripts/train_baseline_clinical_BERT.sh, update BASE_DIR, OUTPUT_DIR, and SCIBERT_DIR. These variables should have the same values as in step 1.
  • Run bash_scripts/train_baseline_clinical_BERT.sh on a GPU cluster. The resultant model will be saved in ${OUTPUT_DIR}/models/baseline_clinical_BERT_1_epoch_512/.

Step 3: Training Adversarial Clinical BERT

Pretrains clinical BERT (initialized from SciBERT) with adversarial debiasing using gender as the protected attribute, for 1 epoch on sequences of length 128, then 1 epoch on sequences of length 512.

  • In bash_scripts/train_adv_clinical_bert.sh, update BASE_DIR, OUTPUT_DIR, and SCIBERT_DIR. These variables should have the same values as in step 1.
  • Run bash_scripts/train_adv_clinical_bert.sh gender on a GPU cluster. The resultant model will be saved in ${OUTPUT_DIR}/models/adv_clinical_BERT_gender_1_epoch_512/.

Step 4: Finetuning on Downstream Tasks

Generates static BERT representations for the downstream tasks created in Step 1. Trains various neural networks (grid searching over hyperparameters) on these tasks.

  • In bash_scripts/pregen_embs.sh, update BASE_DIR and OUTPUT_DIR. Run this script on a GPU cluster.
  • In bash_scripts/finetune_on_target.sh, update BASE_DIR and OUTPUT_DIR. This script outputs a trained model for one (target, model) combination in the ${OUTPUT_DIR}/models/finetuned/ folder. The Python script bash_scripts/run_clinical_targets.py queues all 114 (target, model) experiments we conducted as Slurm jobs. It will have to be modified for systems that do not use Slurm.
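When adapting the queueing logic to another system, the core idea is to enumerate every (target, model) pair and build one job per pair. The sketch below only illustrates that enumeration and is not the contents of run_clinical_targets.py: the target names are hypothetical placeholders, and passing the target and model as positional arguments to finetune_on_target.sh is an assumption. It prints the commands instead of submitting them.

```python
from itertools import product


def build_jobs(targets, models, script="bash_scripts/finetune_on_target.sh"):
    """Build one sbatch command (as an argument list) per (target, model) pair."""
    # Assumption: the script takes the target and model as positional arguments.
    return [
        ["sbatch", script, target, model]
        for target, model in product(targets, models)
    ]


# Hypothetical placeholder targets; the real list lives in the repository.
targets = ["target_a", "target_b", "target_c"]
# Model names taken from the output folders of steps 2 and 3.
models = [
    "baseline_clinical_BERT_1_epoch_512",
    "adv_clinical_BERT_gender_1_epoch_512",
]

jobs = build_jobs(targets, models)
for job in jobs:
    # Print instead of submitting, so the list can be reviewed first.
    print(" ".join(job))
```

To target a different scheduler, replace "sbatch" with that scheduler's submit command, or run each command sequentially on a single machine.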

Step 5: Analyze Downstream Task Results

Evaluates test-set predictions of the trained models by generating various fairness metrics.

  • In bash_scripts/analyze_results.sh, update BASE_DIR and OUTPUT_DIR. Run this script, which will output a .xlsx file containing fairness metrics to each of the finetuned model folders.
  • The Jupyter Notebook notebooks/MergeResults.ipynb reads in each of the generated metrics files, which can then be viewed in the notebook.
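As an illustration of what a group fairness metric looks like, the sketch below computes the gap in recall between two groups from binary labels and predictions. It is a toy example under stated assumptions: the function names and data are hypothetical, and it is not necessarily one of the metrics, or the implementation, that analyze_results.sh produces.

```python
def recall(y_true, y_pred):
    """Fraction of true positives among all actual positives."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else float("nan")


def recall_gap(y_true, y_pred, groups, group_a, group_b):
    """Recall of group_a minus recall of group_b."""

    def subset(group):
        idx = [i for i, g in enumerate(groups) if g == group]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    return recall(*subset(group_a)) - recall(*subset(group_b))


# Toy data: 4 patients per group, binary labels and predictions.
y_true = [1, 1, 0, 1, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 1, 1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

gap = recall_gap(y_true, y_pred, groups, "M", "F")
print(f"recall gap (M - F): {gap:.3f}")
```

A gap of zero means the model finds positives equally well in both groups; a positive value here means positives in group "M" are detected more often than positives in group "F".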

Step 6: Log Probability Bias Scores

Following the procedure in Kurita et al., we calculate the 'log probability bias score' to evaluate biases in the BERT model. Template sentences should follow the example format in fill_in_blanks_examples/templates.txt. A CSV file listing the context keywords and their context categories should also be supplied (see fill_in_blanks_examples/attributes.csv).

This step can be done independently of steps 4 and 5.

  • In bash_scripts/log_probability.sh, update BASE_DIR, OUTPUT_DIR, and MODEL_NAME. Run this script.
  • The statistical significance results can be found in ${OUTPUT_DIR}/${MODEL_NAME}_log_scores.tsv.
  • The notebook notebooks/GetBasePrevs.ipynb computes the base prevalences for categories in the notes.
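For readers unfamiliar with the score, the sketch below shows the arithmetic as described by Kurita et al.: for each target word, the masked-language-model probability of the target in the template is normalized by its prior probability (the probability with the attribute words masked as well), and the bias score is the difference of these log ratios between two targets. The probabilities here are made-up numbers and the function names are hypothetical; in the actual pipeline they come from the BERT model.

```python
import math


def increased_log_prob(p_target, p_prior):
    """log(p_target / p_prior): how much the context raises the target's probability."""
    return math.log(p_target / p_prior)


def log_prob_bias_score(p_tgt_a, p_prior_a, p_tgt_b, p_prior_b):
    """Difference in increased log probability between target a and target b."""
    return (
        increased_log_prob(p_tgt_a, p_prior_a)
        - increased_log_prob(p_tgt_b, p_prior_b)
    )


# Made-up probabilities for two target words in the same template.
score = log_prob_bias_score(
    p_tgt_a=0.20, p_prior_a=0.10,  # context doubles the probability of target a
    p_tgt_b=0.05, p_prior_b=0.10,  # context halves the probability of target b
)
print(f"log probability bias score: {score:.3f}")
```

A score of zero means the context shifts both targets equally; a positive score means the context favors target a over target b relative to their priors.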

Step 7: Sentence Completion

scripts/predict_missing.py takes template sentences which contain _ for tokens to be predicted. Template sentences can be specified directly in the script.

This step can be done independently of steps 1-6.

  • In scripts/predict_missing.py, update SCIBERT_DIR. Run this script in the Conda environment. The results will be printed to the screen.
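To make the template format concrete, the sketch below converts a template with _ placeholders into a string that uses BERT's [MASK] token. This is only an assumption about how such templates map onto masked-language-model input, not the code in scripts/predict_missing.py; the function name and the example template are hypothetical.

```python
def to_masked(template, blank="_", mask_token="[MASK]"):
    """Replace each standalone blank token with the model's mask token."""
    return " ".join(
        mask_token if token == blank else token
        for token in template.split()
    )


# Hypothetical template with two tokens to be predicted.
template = "_ has a history of _ ."
print(to_masked(template))
```

Only whitespace-separated _ tokens are replaced, so underscores inside other tokens are left untouched.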
