ad-kd's Introduction

AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression (ACL 2023)

Installation

To install the environment, run:

pip install -r requirements.txt

Download GLUE Data

Download the GLUE data using this repository or from the GLUE benchmark website, unpack it into the directory datas/glue, and rename the folder CoLA to COLA.
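The rename step can be scripted. The following is a minimal sketch, assuming the GLUE archive has already been unpacked into datas/glue. The helper name prepare_glue_dir is hypothetical and is not part of this repository.

```python
from pathlib import Path


def prepare_glue_dir(glue_root: str = "datas/glue") -> Path:
    """Rename the CoLA folder to COLA inside an unpacked GLUE directory.

    Hypothetical helper for illustration; not part of the AD-KD repository.
    """
    root = Path(glue_root)
    # Compare exact directory names so the check is also correct on
    # case-insensitive filesystems, where Path.exists() ignores case.
    names = {p.name for p in root.iterdir()}
    dst = root / "COLA"
    if "COLA" in names:
        return dst  # already renamed, nothing to do
    if "CoLA" not in names:
        raise FileNotFoundError(f"no CoLA folder found in {root}")
    # Rename through a temporary name so the change of case is always applied.
    tmp = root / "CoLA.tmp"
    (root / "CoLA").rename(tmp)
    tmp.rename(dst)
    return dst
```

Calling prepare_glue_dir() a second time returns the existing COLA path without changing anything.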

Download Pre-trained BERT

Download bert_uncased_L-12_H-768_A-12 (BERT-base) and bert_uncased_L-6_H-768_A-12 from this repository for the teacher model and the student model, respectively, then use the Hugging Face API to convert them to PyTorch checkpoints.
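The README does not say which Hugging Face API to use for the conversion. One possible route is sketched below, assuming a transformers 4.x installation that still ships load_tf_weights_in_bert, an installed TensorFlow, and an unpacked release folder containing bert_config.json and bert_model.ckpt.* files. The function names tf_checkpoint_files and convert_tf_bert_to_pytorch are hypothetical.

```python
from pathlib import Path


def tf_checkpoint_files(model_dir: str) -> dict:
    """Locate the config and checkpoint prefix in an unpacked BERT folder.

    Assumes the usual layout of the original TensorFlow BERT releases.
    """
    d = Path(model_dir)
    return {
        "config": d / "bert_config.json",
        # Prefix shared by the .index and .data-* checkpoint files.
        "checkpoint": d / "bert_model.ckpt",
    }


def convert_tf_bert_to_pytorch(model_dir: str, output_dir: str) -> None:
    """Load TensorFlow BERT weights and save them in Hugging Face format."""
    # Imported lazily: these need torch, transformers and tensorflow installed.
    from transformers import BertConfig, BertForPreTraining
    from transformers.models.bert.modeling_bert import load_tf_weights_in_bert

    files = tf_checkpoint_files(model_dir)
    config = BertConfig.from_json_file(str(files["config"]))
    model = BertForPreTraining(config)
    load_tf_weights_in_bert(model, config, str(files["checkpoint"]))
    # Writes config.json and the weight file into output_dir.
    model.save_pretrained(output_dir)
```

For example, convert_tf_bert_to_pytorch("bert_uncased_L-6_H-768_A-12", "student_pt") would write a student checkpoint into the hypothetical folder student_pt. The vocab.txt file is not copied by this sketch and has to be placed next to the converted weights if a tokenizer is loaded from the same folder.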

Task-specific Teacher Model Training

We provide a training script for each task in script/teacher/, where $TEACHER_PATH is the path of the teacher model.

Task-specific Student Model Distillation

AD-KD can be run on a single GPU or on multiple GPUs. When using multiple GPUs, make sure to use DistributedDataParallel instead of DataParallel in PyTorch. The scripts provided in script/student/ are for a single GPU; $TEACHER_PATH and $STUDENT_PATH are the paths of the teacher model and the student model, respectively.
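For the multi-GPU case, wrapping a model in DistributedDataParallel might look like the sketch below. It is not taken from the AD-KD scripts: the helper wrap_with_ddp is hypothetical, and it assumes the process was started by a launcher such as torchrun, which sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT.

```python
import os

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_with_ddp(model: nn.Module) -> nn.Module:
    """Wrap a model in DistributedDataParallel (one process per GPU).

    Hypothetical helper; reads the rendezvous settings from environment
    variables, as set by torchrun.
    """
    use_cuda = torch.cuda.is_available()
    if not dist.is_initialized():
        # NCCL is the usual backend for GPUs; gloo works on CPU.
        dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    if use_cuda:
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        torch.cuda.set_device(local_rank)
        model = model.to(local_rank)
        return DDP(model, device_ids=[local_rank])
    return DDP(model)
```

A training script that calls wrap_with_ddp would typically be started with torchrun --nproc_per_node=N, where N is the number of GPUs. Unlike DataParallel, each process then holds its own model replica and gradients are synchronized across processes during the backward pass.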

Student Checkpoints

The distilled student model for each task reported in the paper can be downloaded as follows:

from transformers import BertForSequenceClassification
task_name = 'cola'  # task name in lowercase
model = BertForSequenceClassification.from_pretrained("Brucewsy/AD-KD_bert_uncased_L-6_H-768_A-12_" + task_name)

ad-kd's Issues

Student Checkpoints

Hello, could you please advise me on how to obtain the reported scores for the qnli and qqp tasks? I used the pre-trained models available at https://huggingface.co/Brucewsy/AD-KD_bert_uncased_L-6_H-768_A-12_qnli and https://huggingface.co/Brucewsy/AD-KD_bert_uncased_L-6_H-768_A-12_qqp, but the resulting accuracy on the development set was only 83.8 and 81.5, respectively. Could you kindly confirm whether these Hugging Face models are the state-of-the-art models for these tasks? Additionally, are there any specific details that should be taken into consideration when testing these models?

Teacher Checkpoints

Thanks for your contribution.

Can you provide teacher checkpoints, like you did for student checkpoints?
The teacher models trained with your script do not match the performance reported in the manuscript; the results are quite different.
