
LLM-Adapters

LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

LLM-Adapters is an easy-to-use framework that integrates various adapters into LLMs and can execute adapter-based PEFT methods of LLMs for different tasks. LLM-Adapters is an extension of HuggingFace's PEFT library; many thanks for their amazing work! Please find our paper at this link: https://arxiv.org/abs/2304.01933.

The framework includes state-of-the-art open-access LLMs: LLaMA, OPT, BLOOM, and GPT-J, as well as widely used adapters such as bottleneck adapters, parallel adapters, and LoRA.

Supported Adapters:

  1. LoRA: LoRA: Low-Rank Adaptation of Large Language Models
  2. AdapterH: Parameter-Efficient Transfer Learning for NLP
  3. AdapterP: MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
  4. Parallel: Towards a Unified View of Parameter-Efficient Transfer Learning
  5. Prefix Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation, P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
  6. P-Tuning: GPT Understands, Too
  7. Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
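
Since LLM-Adapters extends HuggingFace's PEFT, adapters are attached with the familiar config-plus-get_peft_model pattern. The sketch below uses only the upstream PEFT LoRA API; the hyperparameters and target modules are illustrative assumptions, not repo defaults:

# Minimal sketch: attaching a LoRA adapter via the PEFT API that LLM-Adapters extends.
# Hyperparameters and target_modules below are illustrative choices, not repo defaults.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")
config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()        # only the small LoRA matrices are trainable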

Latest News 🔥🔥

Setup

  1. Install dependencies
pip install -r requirements.txt
cd peft/
pip install -e .
  2. Set environment variables, or modify the files referencing BASE_MODEL:
# Files referencing `BASE_MODEL`
# export_hf_checkpoint.py
# export_state_dict_checkpoint.py

export BASE_MODEL=yahma/llama-7b-hf

Both finetune.py and generate.py use the --base_model flag, as shown further below.

  3. If bitsandbytes doesn't work, install it from source. Windows users can follow these instructions.
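
If you want to verify that the 8-bit training stack is usable before launching a run, a quick sanity check such as the following can help (a minimal sketch, assuming a CUDA-capable GPU is available):

# Sanity check for the 8-bit training stack (sketch, not part of the repo).
import torch
import bitsandbytes as bnb  # fails here if bitsandbytes is not installed correctly

print("CUDA available:", torch.cuda.is_available())
# Creating an 8-bit optimizer exercises the compiled CUDA kernels.
param = torch.nn.Parameter(torch.randn(2, 2, device="cuda"))
optimizer = bnb.optim.Adam8bit([param])
print("bitsandbytes 8-bit optimizer ready:", type(optimizer).__name__)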

Training (finetune.py)

This file contains the code for prompt construction and tokenization. By specifying different adapters and different sets of data in this file, different models can be trained.
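
As a rough illustration of the prompt construction and tokenization this script performs, the sketch below builds an Alpaca-style prompt from an instruction record and tokenizes it to the cutoff length; the field names and prompt template are assumptions for illustration, not copied verbatim from finetune.py:

# Sketch of Alpaca-style prompt construction and tokenization (illustrative only).
from transformers import AutoTokenizer

CUTOFF_LEN = 256  # matches --cutoff_len in the commands below

def generate_prompt(example):
    # Turn one instruction record into a single training string.
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")

def tokenize(example):
    result = tokenizer(generate_prompt(example), truncation=True, max_length=CUTOFF_LEN)
    result["labels"] = result["input_ids"].copy()  # causal LM: labels mirror the inputs
    return result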

Example usage for multiple GPUs:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=3192 finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'math_data.json' \
  --output_dir './trained_models/llama-lora' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --adapter_name lora

The math_data.json file contains preprocessed instruction data from the AddSub, SingleEq, MultiArith, AQuA, SVAMP, and GSM8K datasets. yahma/llama-7b-hf is the base model (LLaMA-7B), to which the LoRA adapter is added.
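
Each record in math_data.json is an instruction-following example; the exact schema is best checked in the file itself, but it is roughly of the following shape (the field names and values here are illustrative assumptions):

# Illustrative record shape (field names and values are assumptions, not copied from the dataset).
example = {
    "instruction": "Tom has 3 apples and buys 4 more. How many apples does he have now?",
    "input": "",
    "output": "Tom has 3 + 4 = 7 apples. The answer is 7.",
}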

Example usage for a single GPU:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'math_data.json' \
  --output_dir './trained_models/llama-lora' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --adapter_name lora

Moreover, you can use --use_gradient_checkpointing to save more GPU memory, but it will increase the training time.

To use AdapterH, just add the following argument:

--adapter_name bottleneck # use the bottleneck adapter, refers to AdapterH in the result table

To use AdapterP, just add the following arguments:

--adapter_name bottleneck 
--use_adapterp  # use the AdapterP, refers to AdapterP in the result table

To use the parallel adapter, just add the following arguments:

--adapter_name bottleneck
--use_parallel_adapter

Note that, in order to facilitate INT8 training of large models with parallel adapters, we adopt a scheme in which the parallel adapter layers are inserted into the multi-head attention layers and MLP layers, in parallel with their Linear layers. This differs from Hu et al. (2021).
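
For intuition, a parallel bottleneck adapter around a Linear layer can be sketched as follows (a minimal sketch of the idea, not the repo's exact implementation):

# Sketch of a parallel bottleneck adapter wrapping a Linear layer (illustrative, not the repo's code).
import torch
import torch.nn as nn

class ParallelAdapterLinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, bottleneck_size: int = 256, scaling: float = 1.0):
        super().__init__()
        self.base = base_linear  # frozen pretrained Linear layer
        self.down = nn.Linear(base_linear.in_features, bottleneck_size, bias=False)
        self.up = nn.Linear(bottleneck_size, base_linear.out_features, bias=False)
        self.act = nn.ReLU()
        self.scaling = scaling
        nn.init.zeros_(self.up.weight)  # start as an identity-preserving branch

    def forward(self, x):
        # The adapter branch runs in parallel with the original Linear and is added to its output.
        return self.base(x) + self.scaling * self.up(self.act(self.down(x)))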

Inference (generate.py)

This file reads the foundation model from the Hugging Face model hub and the LoRA weights from './trained_models/llama-lora', and runs a Gradio interface for inference on a specified input. Users should treat this as example code for the use of the model and modify it as needed. Example usage:

CUDA_VISIBLE_DEVICES=0 torchrun generate.py \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights './trained_models/llama-lora'
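
generate.py wraps this in a Gradio UI; the core loading and generation logic is roughly the following (a sketch using the standard transformers/peft APIs, not the exact script):

# Sketch of loading the base model plus LoRA weights and generating (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "yahma/llama-7b-hf"
lora_weights = "./trained_models/llama-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, lora_weights)  # attach the trained adapter
model.eval()

prompt = "### Instruction:\nWhat is 15 * 4?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))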

Evaluation (evaluate.py)

To evaluate the performance of the finetuned model on the Arithmetic Reasoning tasks, you can use the following command:

CUDA_VISIBLE_DEVICES=0 python evaluate.py \
    --model LLaMA-7B \
    --adapter LoRA \
    --dataset SVAMP \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights './trained_models/llama-lora'

Here --model specifies the base model, --adapter specifies the adapter name (one of "LoRA", "AdapterH", "AdapterP", "Parallel", "Scaled_Parallel"), and --dataset specifies the test dataset.
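
Accuracy on these tasks is exact match on the final numeric answer; a rough sketch of how such a check can be done is shown below (not the repo's exact extraction logic):

# Sketch of numeric answer extraction and exact-match accuracy (illustrative only).
import re

def extract_number(text):
    # Return the last number appearing in a generated answer, if any.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def accuracy(predictions, references):
    correct = 0
    for pred, ref in zip(predictions, references):
        value = extract_number(pred)
        if value is not None and abs(value - float(ref)) < 1e-4:
            correct += 1
    return correct / len(references)

print(accuracy(["Tom has 7 apples, so the answer is 7."], [7]))  # 1.0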

Resource Consumption

The table below lists the resources needed for different adapters: trainable parameters, GPU RAM usage, and fine-tuning time on the arithmetic reasoning dataset math_data.json.

Hyper-parameter setting: num_epochs=3, lora_r=8, lora_alpha=16, bottleneck_size=256

Models: LLaMA-13B, LLaMA-7B, BLOOM-6.7B, GPT-j-6B
Dataset: 3.2K math word problems

Hardware: 2 × RTX 3090 GPUs

Model              Trainable Parameters  GPU RAM Usage  Fine-tuning Time
LLaMA-7B-LoRA      4.2M                  18GB           1h
LLaMA-7B-AdapterH  200M                  22GB           1h
LLaMA-7B-AdapterP  200M                  22GB           1h
LLaMA-7B-Parallel  200M                  22GB           1h

Fine-tuning Results

The table below shows fine-tuning results for different models on six datasets: MultiArith, GSM8K, AddSub, AQuA, SingleEq, and SVAMP.

Model               Params  MultiArith  GSM8K  AddSub  AQuA  SingleEq  SVAMP  Average
GPT-3.5             -       83.8        56.4   85.3    38.9  88.1      69.9   70.4
LLaMA-13B-LoRA      6.5M    93.3        43.3   80.0    20.5  84.6      52.9   62.4
LLaMA-13B-AdapterH  314M    94.0        36.1   82.3    19.7  84.8      52.9   61.6
LLaMA-13B-AdapterP  104M    94.8        41.0   81.3    19.3  87.0      51.1   62.4
LLaMA-13B-Parallel  314M    95.0        43.8   84.6    20.9  88.0      53.5   64.3
LLaMA-7B-LoRA       4.2M    88.3        30.9   78.5    14.2  74.8      47.2   55.7
LLaMA-7B-AdapterH   200M    93.8        29.8   70.6    16.1  71.1      37.7   53.2
LLaMA-7B-AdapterP   66M     91.0        30.2   75.7    14.9  75.4      43.3   55.1
LLaMA-7B-Parallel   200M    93.7        33.3   80.5    16.5  81.7      46.5   58.7
BLOOM-7B-LoRA       4M      73.0        9.9    41.8    16.9  40.7      25.1   34.6
BLOOM-7B-AdapterH   125M    81.8        16.5   76.5    18.9  71.3      37.8   50.5
BLOOM-7B-AdapterP   62M     87.7        18.0   69.6    20.9  68.3      32.1   49.4
BLOOM-7B-Parallel   125M    78.2        15.7   65.4    20.5  64.2      35.1   46.5
GPT-j-6B-LoRA       3.7M    80.5        17.4   74.9    18.1  72.2      43.8   51.2
GPT-j-6B-AdapterH   117M    82.5        17.9   83.8    21.3  76.8      40.0   53.7
GPT-j-6B-AdapterP   58M     90.3        19.1   80.7    18.5  81.3      41.3   55.2
GPT-j-6B-Parallel   176M    77.8        17.5   77.2    20.5  74.8      39.8   51.3

Adapter support matrix

This matrix shows which adapters (LoRA, AdapterH, AdapterP, Parallel, Prefix Tuning, P-Tuning, and Prompt Tuning) each model can use.

Model         LoRA  AdapterH       AdapterP       Parallel       Prefix Tuning  P-Tuning  Prompt Tuning
LLaMA         ✅    ✅             ✅             ✅             ✅             ✅        ✅
BLOOM         ✅    ✅             ✅             ✅             ✅             ✅        ✅
GPT-J         ✅    ✅             ✅             ✅             ✅             ✅        ✅
OPT           ✅    ✅             ✅             ✅             ✅             ✅        ✅
GPT-2         ✅    🔧 Developing  🔧 Developing  🔧 Developing  ✅             ✅        ✅
GPT-Neo       ✅    ✅             ✅             ✅             ✅             ✅        ✅
GPT-NeoX-20B  ✅    🔧 Developing  🔧 Developing  🔧 Developing  ✅             ✅        ✅
ChatGLM       ✅    ✅             ✅             ✅             ✅             ✅        ✅

TODO List

  • Add AdapterH
  • Add AdapterP
  • Add Parallel Adapter
  • Support More LLMs
  • Support Multiple Adapter
  • Support Adapter Composition
  • Support Adapter Fusion

⭐ Star History

Star History Chart

Citing LLM-Adapter

If you use LLM-Adapters in your publication, please cite it by using the following BibTeX entry.

@article{hu2023llm,
  title={LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models},
  author={Hu, Zhiqiang and Lan, Yihuai and Wang, Lei and Xu, Wanyu and Lim, Ee-Peng and Lee, Roy Ka-Wei and Bing, Lidong and Poria, Soujanya},
  journal={arXiv preprint arXiv:2304.01933},
  year={2023}
}

Acknowledgement

This repo benefits from PEFT, Adapter-Transformers, and Alpaca-LoRA. Thanks for their wonderful work. Additionally, we thank DONG Shan and dream.ai for the exceptional logo design, which has added immense value to our project.
