Importance of Depth and Fine-tuning in BERT-models

This project started in CS224N: Natural Language Processing with Deep Learning at Stanford University. The goal of the project is to better understand how transformer-based pretrained natural language representations hierarchically represent information, using softmax regression probes.

Our paper is FORTHCOMING. This repository walks through all steps necessary to reproduce the results.

There are three major components to this repository: using transformers models, training and evaluating probes, and reproducing results.

Table of Contents

  1. Setting up
  2. Using transformers models
  3. Probes
  4. Reproducing results

Setting up

(OPTIONAL) General conda preparation:

conda update conda
conda update --all
conda info # verify platform is 64 bit
curl https://sh.rustup.rs -sSf | sh # only on mac os

Create a conda environment with the necessary packages; the PyTorch install command may vary by system (see pytorch.org for the right one).

conda create -n transformers python=3.7
conda activate transformers
pip3 install --upgrade pip tensorflow
conda install pytorch torchvision pandas -c pytorch

Then install the revision of the transformers package included with this repository.

cd transformers-master
pip3 install .
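
(OPTIONAL) To confirm the environment is working, a quick sanity check in Python (a minimal sketch; the versions printed will depend on what conda and pip resolved):

# Verify that the key packages import and that PyTorch sees the GPU (if any).
import torch
import tensorflow as tf
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("tensorflow:", tf.__version__)
print("transformers:", transformers.__version__)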

(OPTIONAL) Some useful tmux commands:

tmux ls
tmux new -s session_name
tmux a -t session_name
tmux detach

Using transformers models

Community models

First, be sure you have downloaded train-v2.0.json and dev-v2.0.json to squad2, as specified in the README.
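
If the files are not already in place, the sketch below fetches them with plain Python; the URLs are the public SQuAD explorer download links and are an assumption about where the dataset is currently hosted. Run it from the repository root.

# Download the SQuAD 2.0 train and dev files into squad2/.
import os
import urllib.request

SQUAD_FILES = {
    "train-v2.0.json": "https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json",
    "dev-v2.0.json": "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json",
}

os.makedirs("squad2", exist_ok=True)
for name, url in SQUAD_FILES.items():
    urllib.request.urlretrieve(url, os.path.join("squad2", name))
    print("downloaded", name)

Then move into the examples directory of transformers-master: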

cd transformers-master/examples

Start by evaluating a community-trained ALBERT xxlarge_v1 model that has been fine-tuned on SQuAD 2.0:

export SQUAD_DIR=../../squad2/
python3 run_squad.py \
    --model_type albert \
    --model_name_or_path ahotrod/albert_xxlargev1_squad2_512 \
    --do_eval \
    --do_lower_case \
    --version_2_with_negative \
    --predict_file $SQUAD_DIR/dev-v2.0.json \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./tmp/albert_xxlarge_fine/

| Model | Exact | F1 | Exact Has Ans | F1 Has Ans | Exact No Ans | F1 No Ans |
|---|---|---|---|---|---|---|
| ALBERT v1 XXLarge | 85.32 | 88.84 | 82.61 | 89.95 | 87.82 | 87.82 |

Our models

At various times, we will want to reference models by their prefix in the transformers library, so a table is provided. The pretrained models were created and shared by the Hugging Face team (creators of the transformers library), while the fine-tuned models were trained and shared by us. The exact Python commands used to train each model, along with more detailed model performance, are included on each of the linked model cards.

| Model | Model Prefix |
|---|---|
| ALBERT Pretrained | albert-base-v2 |
| ALBERT Fine-tuned | twmkn9/albert-base-v2-squad2 |
| BERT Pretrained | bert-base-uncased |
| BERT Fine-tuned | twmkn9/bert-base-uncased-squad2 |
| DistilBERT Pretrained | distilbert-base-uncased |
| DistilBERT Fine-tuned | twmkn9/distilbert-base-uncased-squad2 |
| DistilRoberta Pretrained | distilroberta-base |
| DistilRoberta Fine-tuned | twmkn9/distilroberta-base-squad2 |

| Model | Exact | F1 | Exact Has Ans | F1 Has Ans | Exact No Ans | F1 No Ans |
|---|---|---|---|---|---|---|
| BERT Fine-tuned | 72.36 | 75.75 | 74.30 | 81.38 | 70.58 | 70.58 |
| ALBERT Fine-tuned | 78.71 | 81.89 | 75.40 | 82.04 | 81.76 | 81.76 |
| DistilBERT Fine-tuned | 64.89 | 68.18 | 69.76 | 76.63 | 60.42 | 60.42 |
| DistilRoberta Fine-tuned | 70.93 | 74.60 | 67.63 | 75.30 | 73.96 | 73.96 |
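
The prefixes above can be passed directly to the transformers Auto classes. Here is a minimal sketch, assuming AutoTokenizer and AutoModelForQuestionAnswering are available in the installed transformers revision:

# Load a fine-tuned checkpoint by its model prefix.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_prefix = "twmkn9/bert-base-uncased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_prefix)
model = AutoModelForQuestionAnswering.from_pretrained(model_prefix)
print(model.config.num_hidden_layers, "transformer layers")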

Probes

Probe training

python3 train.py [model_prefix] [cpu/gpu] [epochs]

To train probes for each layer of ALBERT Pretrained on the cpu for 1 epoch (e.g. for debugging locally):

python3 train.py albert-base-v2 cpu 1

To train probes for each layer of ALBERT Fine-tuned on the gpu for 3 epochs (e.g. on a VM):

python3 train.py twmkn9/albert-base-v2-squad2 gpu 3

By default, probes will be saved for each epoch. If you are only interested in probes from a certain epoch, simply delete the unwanted intermediate epoch directories.
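
For intuition, each probe is a softmax regression (linear) classifier trained on frozen hidden states from a single layer; train.py handles this for all layers. The sketch below is illustrative only and is not the project's training code: the layer index, the example inputs, and the use of output_hidden_states to expose per-layer activations are assumptions.

# Illustrative sketch of a per-layer softmax regression probe for span prediction.
import torch
from transformers import AutoModel, AutoTokenizer

model_prefix = "albert-base-v2"
layer = 6  # hypothetical choice of hidden layer to probe

tokenizer = AutoTokenizer.from_pretrained(model_prefix)
encoder = AutoModel.from_pretrained(model_prefix, output_hidden_states=True)
encoder.eval()

# One linear map from hidden size to 2 logits (start, end) per token;
# only this layer would be trained, with the encoder kept frozen.
probe = torch.nn.Linear(encoder.config.hidden_size, 2)

inputs = tokenizer.encode_plus(
    "Who wrote Hamlet?", "Hamlet was written by Shakespeare.", return_tensors="pt"
)
with torch.no_grad():
    hidden_states = encoder(**inputs)[-1]  # embeddings plus one tensor per layer

start_logits, end_logits = probe(hidden_states[layer]).split(1, dim=-1)
print(start_logits.squeeze(-1).softmax(dim=-1).shape)  # (1, sequence_length)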

Probe prediction

python3 predict.py [model_prefix] [cpu/gpu]

To make predictions for probes at each layer and each epoch for BERT Pretrained on the cpu:

python3 predict.py bert-base-uncased cpu

Probe evaluation

python3 evaluate.py [model_prefix]

To evaluate predictions for probes at each layer and each epoch for BERT Fine-tuned:

python3 evaluate.py twmkn9/bert-base-uncased-squad2
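
The reported metrics are the standard SQuAD 2.0 exact match and F1, overall and split into has-answer / no-answer subsets. For reference, here is a simplified sketch of the token-overlap F1 used by the official SQuAD evaluation (the real script also lowercases and strips punctuation and articles before comparing):

# Simplified token-overlap F1 between a predicted and a gold answer string.
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    if not pred_tokens or not gold_tokens:
        # For no-answer questions, credit is given only if both are empty.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Norman conquest", "Norman conquest of England"))  # ~0.57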

Reproducing results

FORTHCOMING
