
[ICML'24] Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Considering the planned integration of FedKSeed into FederatedScope-LLM, the official implementation has been moved to FederatedScope-FedKSeed. We strongly recommend following FederatedScope-FedKSeed to avoid missing important updates, since the latest code will be released there first.


This repository contains the official implementation of “Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes”. See our paper for more details.

Pre-trained large language models (LLMs) require fine-tuning to improve their responsiveness to natural language instructions. Federated learning (FL) offers a way to perform fine-tuning using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance heights possible with full-parameter tuning. However, the communication overhead associated with full-parameter tuning is prohibitively high for both servers and clients. This work introduces FedKSeed, a novel approach that employs zeroth-order optimization (ZOO) with a set of random seeds. It enables federated full-parameter tuning of billion-sized LLMs directly on devices. Our method significantly reduces transmission requirements between the server and clients to just a few scalar gradients and random seeds, amounting to only a few thousand bytes. Building on this, we develop a strategy to assess the significance of ZOO perturbations for FL, allowing for probability-differentiated seed sampling. This prioritizes perturbations that have a greater impact on model accuracy. Experiments across six scenarios with different LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in terms of both communication efficiency and new task generalization.
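To illustrate the core communication trick described above, here is a minimal NumPy sketch (not the repository's actual code) of seed-based zeroth-order optimization: a client estimates a scalar directional derivative with two forward passes along a perturbation drawn from a shared random seed, and anyone holding only the (seed, scalar) pair can regenerate the full-dimensional perturbation and apply the update. The function names and the quadratic toy loss are illustrative assumptions.

```python
import numpy as np

def zo_grad_scalar(loss_fn, params, seed, eps=1e-3):
    """Client side: estimate the directional derivative of loss_fn at params
    along a random direction z regenerated from `seed` (two forward passes)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    return (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)

def zo_update(params, seed, grad_scalar, lr):
    """Receiver side: regenerate z from `seed` alone and apply
    params <- params - lr * grad_scalar * z. Only (seed, grad_scalar)
    ever needs to be transmitted, not the full-dimensional z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    return params - lr * grad_scalar * z

# Toy check with a quadratic loss: the server never sees z, only two numbers.
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
seed = 42
g = zo_grad_scalar(loss, w, seed)       # one scalar per perturbation
w_new = zo_update(w, seed, g, lr=0.05)  # full-parameter update from (seed, g)
```

Because the perturbation is a deterministic function of the seed, replaying the same (seed, scalar) pair always reproduces the same update, which is what keeps the per-round payload to a handful of bytes.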

Project Structure

.
├── optimizers
│   ├── mezo_bias_optimizer.py  // implementation of FedKSeed-Pro
│   └── mezo_optimizer.py  // implementation of FedKSeed
├── utils_data
│   ├── default_tokens.py  // definitions of some special tokens
│   ├── llm_dataset.py  // utilities to load Dolly-15K
│   ├── load_data.py  // entry point for obtaining dataloaders
│   ├── natural_instruction_loader.py  // utilities to load Natural Instructions
│   └── partition_data.py  // utilities to partition datasets with a Dirichlet distribution
├── client.py
├── evaluations.py
├── main.py
└── server.py

Requirements

Please see requirements.txt.

Data Preparation

  1. Natural Instructions: to run experiments on Natural Instructions, unzip the downloaded dataset into the ./data directory.

  2. Dolly-15K: to run experiments on Dolly-15K, download the dataset into the ./data directory and name it databricks-dolly-15k.jsonl.
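As a sketch, the layout both steps expect might be prepared as follows (the archive name and source path are placeholders, not part of this repository):

```shell
# Create the data directory expected by the loaders.
mkdir -p ./data
# 1. Natural Instructions: unzip the downloaded archive into ./data, e.g.:
#    unzip natural-instructions.zip -d ./data
# 2. Dolly-15K: place the downloaded file in ./data under the exact name:
#    cp /path/to/databricks-dolly-15k.jsonl ./data/databricks-dolly-15k.jsonl
ls ./data
```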

Running Examples

We provide example scripts for running the experiments. The arguments can be adjusted according to the help text in their definitions.

  1. FedKSeed on Natural Instructions
# On Natural Instructions, the number of clients `num_clients` does not require manual setting.
# It will be automatically adjusted to the number of tasks in `splits/default/train_tasks.txt`.
python main.py --rounds 40 --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset instruct --lr 0.0000003 -K 1024 -m 0.05 --log
  2. FedKSeed on Dolly-15K with $\alpha=0.5$
python main.py --rounds 60 --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset dolly --iid dir0.5 --num_clients 200 --lr 0.0000003 -K 1024 -m 0.05 --log
  3. FedKSeed-Pro on Natural Instructions
# On Natural Instructions, the number of clients `num_clients` does not require manual setting.
# It will be automatically adjusted to the number of tasks in `splits/default/train_tasks.txt`.
python main.py --rounds 40 --bias_sampling --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset instruct --lr 0.0000003 -K 1024 -m 0.05 --log
  4. FedKSeed-Pro on Dolly-15K with $\alpha=0.5$
python main.py --rounds 60 --bias_sampling --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset dolly --iid dir0.5 --num_clients 200 --lr 0.0000003 -K 1024 -m 0.05 --log
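The --bias_sampling flag corresponds to FedKSeed-Pro's probability-differentiated seed sampling described above: instead of drawing the next perturbation seed uniformly from the K candidates, seeds whose perturbations appeared more significant are sampled more often. The sketch below assumes a softmax over per-seed mean absolute scalar gradients as the significance score; the exact weighting in the repository may differ.

```python
import numpy as np

def seed_probabilities(grad_history, temperature=1.0):
    """Hypothetical weighting: seeds whose perturbations produced larger
    average |scalar gradient| get higher sampling probability (softmax
    over the per-seed mean absolute gradients; unseen seeds score 0)."""
    scores = np.array([np.mean(np.abs(g)) if g else 0.0 for g in grad_history])
    logits = scores / temperature
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

K = 1024
candidate_seeds = np.arange(K)
grad_history = [[] for _ in range(K)]  # scalar grads reported for each seed
grad_history[7] = [0.9, 1.1]           # toy: seed 7 looked important so far
probs = seed_probabilities(grad_history)
rng = np.random.default_rng(0)
chosen = rng.choice(candidate_seeds, p=probs)  # biased draw of the next seed
```

Seeds with no history still retain nonzero probability under the softmax, so less-explored perturbations are never starved entirely.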

License

This project adopts the Apache-2.0 License. If the implementation and/or our paper is useful to you, please consider citing this work:

@article{qin2023federated,
  title={Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes},
  author={Zhen Qin and Daoyuan Chen and Bingchen Qian and Bolin Ding and Yaliang Li and Shuiguang Deng},
  journal={arXiv preprint arXiv:2312.06353},
  year={2023}
}
