This project forked from circle-hit/sapt

Code for ACL 2024 accepted paper titled "SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models"

SAPT

The official implementation for the ACL 2024 paper SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models.

Requirements

  • Python 3.10.12
  • PyTorch 2.1.0
  • Transformers 4.30.2
  • CUDA 12.2
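The pinned versions above can be checked before training with a short script. This is only a sketch, not part of the repository: the helper names `installed_version` and `check_versions` are hypothetical, and it assumes PyTorch and Transformers are installed under their usual distribution names `torch` and `transformers`. CUDA is not checked here.

```python
import sys
from importlib import metadata

# Pins copied from the Requirements list; distribution names are an assumption.
REQUIRED = {"torch": "2.1.0", "transformers": "4.30.2"}


def installed_version(dist):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required=REQUIRED, lookup=installed_version):
    """Return one message per package that is missing or differs from its pin."""
    problems = []
    for dist, wanted in required.items():
        found = lookup(dist)
        # Local build tags such as "+cu121" are ignored in the comparison.
        if found is None or found.split("+")[0] != wanted:
            problems.append(f"{dist}: wanted {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {current} (README lists 3.10.12)")
    for line in check_versions() or ["pinned packages match"]:
        print(line)
```

An empty list from `check_versions()` means both packages match the pins; other versions may still work, but they are not the ones listed here.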

Preparation

The train/dev/test data for the SuperNI and Long Sequence benchmarks are located in /CL_Benchmark.

The generated pseudo data points are located in /generated_data.

Training

First, run gen_script_{benchmark}_{model}.py to generate the training script.

For example, to train the T5 model on the SuperNI benchmark:

python gen_script_superni_t5.py

Then run the generated script to start training.

Evaluation

To compute the Average Performance (AP), Forgetting Rate (F.Ra), Forward Transfer (FWT), and Backward Transfer (BWT) metrics:

python score.py your_result_path single_result_path 
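For orientation, the four metrics can be sketched from a matrix of per-task scores. This is a minimal illustration using definitions common in the continual learning literature; it is not taken from score.py, which may compute them differently. The function name `continual_metrics` and the input layout are assumptions: `acc[i][j]` is the score on task j after training on tasks 0..i, and `single[j]` is the score on task j when it is trained alone (presumably the kind of result single_result_path refers to).

```python
def continual_metrics(acc, single):
    """Compute AP, F.Ra, FWT and BWT from a lower-triangular score matrix.

    acc[i][j]  : score on task j after sequentially training on tasks 0..i
                 (only entries with j <= i are read).
    single[j]  : score on task j when the model is trained on it alone.
    Both layouts are assumptions made for this sketch.
    """
    T = len(acc)
    if T < 2:
        raise ValueError("need at least two tasks")
    final = acc[T - 1]

    # Average Performance: mean score over all tasks after the last task.
    ap = sum(final[j] for j in range(T)) / T

    # Forgetting Rate: best score seen on an earlier task before the final
    # stage, minus its final score, averaged over the first T-1 tasks.
    fra = sum(
        max(acc[i][j] for i in range(j, T - 1)) - final[j]
        for j in range(T - 1)
    ) / (T - 1)

    # Backward Transfer: final score minus the score right after learning
    # the task, averaged over the first T-1 tasks.
    bwt = sum(final[j] - acc[j][j] for j in range(T - 1)) / (T - 1)

    # Forward Transfer: score right after learning the task in sequence,
    # minus the score from training on that task alone.
    fwt = sum(acc[j][j] - single[j] for j in range(T)) / T

    return {"AP": ap, "F.Ra": fra, "FWT": fwt, "BWT": bwt}


if __name__ == "__main__":
    # Toy numbers, not results from the paper.
    acc = [[80, 0, 0], [70, 60, 0], [65, 55, 90]]
    single = [78, 62, 88]
    print(continual_metrics(acc, single))
```

With the toy numbers above, AP is 70.0, F.Ra is 10.0 and BWT is -10.0. A positive F.Ra together with a negative BWT indicates that earlier tasks lost performance as later tasks were learned.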

Citation

If you find our work useful for your research, please cite our paper:

@article{zhao2024sapt,
  title={{SAPT}: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models},
  author={Zhao, Weixiang and Wang, Shilong and Hu, Yulin and Zhao, Yanyan and Qin, Bing and Zhang, Xuanyu and Yang, Qing and Xu, Dongliang and Che, Wanxiang},
  journal={arXiv preprint arXiv:2401.08295},
  year={2024}
}

Credits

The code in this repository partly relies on O-LoRA, and I would like to express my sincere gratitude to its authors.
