
[ICML'24] Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

Considering the planned integration of FedKSeed into FederatedScope-LLM, the official implementation has been moved to FederatedScope-FedKSeed. We strongly recommend following FederatedScope-FedKSeed to avoid missing important updates, since the latest code will be released there first.


This repository contains the official implementation of “Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes”. See our paper for more details.

Pre-trained large language models (LLMs) require fine-tuning to improve their responsiveness to natural language instructions. Federated learning (FL) offers a way to perform fine-tuning using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance heights possible with full-parameter tuning. However, the communication overhead associated with full-parameter tuning is prohibitively high for both servers and clients. This work introduces FedKSeed, a novel approach that employs zeroth-order optimization (ZOO) with a set of random seeds. It enables federated full-parameter tuning of billion-sized LLMs directly on devices. Our method significantly reduces transmission requirements between the server and clients to just a few scalar gradients and random seeds, amounting to only a few thousand bytes. Building on this, we develop a strategy to assess the significance of ZOO perturbations for FL, allowing for probability-differentiated seed sampling. This prioritizes perturbations that have a greater impact on model accuracy. Experiments across six scenarios with different LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in terms of both communication efficiency and new task generalization.
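To illustrate the core communication trick described above, here is a minimal NumPy sketch (not the repository's actual code) of seed-based zeroth-order optimization: a client estimates a scalar directional derivative with two forward passes along a perturbation drawn from a shared random seed, and anyone holding only the (seed, scalar) pair can regenerate the full-dimensional perturbation and apply the update. The function names and the quadratic toy loss are illustrative assumptions.

```python
import numpy as np

def zo_grad_scalar(loss_fn, params, seed, eps=1e-3):
    """Client side: estimate the directional derivative of loss_fn at params
    along a random direction z regenerated from `seed` (two forward passes)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    return (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)

def zo_update(params, seed, grad_scalar, lr):
    """Receiver side: regenerate z from `seed` alone and apply
    params <- params - lr * grad_scalar * z. Only (seed, grad_scalar)
    ever needs to be transmitted, not the full-dimensional z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    return params - lr * grad_scalar * z

# Toy check with a quadratic loss: the server never sees z, only two numbers.
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
seed = 42
g = zo_grad_scalar(loss, w, seed)       # one scalar per perturbation
w_new = zo_update(w, seed, g, lr=0.05)  # full-parameter update from (seed, g)
```

Because the perturbation is a deterministic function of the seed, replaying the same (seed, scalar) pair always reproduces the same update, which is what keeps the per-round payload to a handful of bytes.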

Project Structure

.
├── optimizers
│   ├── mezo_bias_optimizer.py  // implementation of FedKSeed-Pro
│   └── mezo_optimizer.py  // implementation of FedKSeed
├── utils_data
│   ├── default_tokens.py  // definitions of some special tokens
│   ├── llm_dataset.py  // utilities to load Dolly-15K
│   ├── load_data.py  // entry point for obtaining dataloaders
│   ├── natural_instruction_loader.py  // utilities to load Natural Instructions
│   └── partition_data.py  // utilities to partition datasets with a Dirichlet distribution
├── client.py
├── evaluations.py
├── main.py
└── server.py

Requirements

Please see requirements.txt.

Data Preparation

  1. Natural Instructions: to run experiments on Natural Instructions, unzip the downloaded dataset into the ./data directory.

  2. Dolly-15K: to run experiments on Dolly-15K, download the dataset into the ./data directory and name it databricks-dolly-15k.jsonl.
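As a sketch, the layout both steps expect might be prepared as follows (the archive name and source path are placeholders, not part of this repository):

```shell
# Create the data directory expected by the loaders.
mkdir -p ./data
# 1. Natural Instructions: unzip the downloaded archive into ./data, e.g.:
#    unzip natural-instructions.zip -d ./data
# 2. Dolly-15K: place the downloaded file in ./data under the exact name:
#    cp /path/to/databricks-dolly-15k.jsonl ./data/databricks-dolly-15k.jsonl
ls ./data
```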

Running Examples

We provide example scripts for running the experiments. The arguments can be adjusted according to the help text in their definitions.

  1. FedKSeed on Natural Instructions
# On Natural Instructions, the number of clients `num_clients` does not require manual setting.
# It will be automatically adjusted to the number of tasks in `splits/default/train_tasks.txt`.
python main.py --rounds 40 --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset instruct --lr 0.0000003 -K 1024 -m 0.05 --log
  2. FedKSeed on Dolly-15K with $\alpha=0.5$
python main.py --rounds 60 --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset dolly --iid dir0.5 --num_clients 200 --lr 0.0000003 -K 1024 -m 0.05 --log
  3. FedKSeed-Pro on Natural Instructions
# On Natural Instructions, the number of clients `num_clients` does not require manual setting.
# It will be automatically adjusted to the number of tasks in `splits/default/train_tasks.txt`.
python main.py --rounds 40 --bias_sampling --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset instruct --lr 0.0000003 -K 1024 -m 0.05 --log
  4. FedKSeed-Pro on Dolly-15K with $\alpha=0.5$
python main.py --rounds 60 --bias_sampling --model datajuicer/LLaMA-1B-dj-refine-150B --use_prompts --dataset dolly --iid dir0.5 --num_clients 200 --lr 0.0000003 -K 1024 -m 0.05 --log
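The --bias_sampling flag corresponds to FedKSeed-Pro's probability-differentiated seed sampling described above: instead of drawing the next perturbation seed uniformly from the K candidates, seeds whose perturbations appeared more significant are sampled more often. The sketch below assumes a softmax over per-seed mean absolute scalar gradients as the significance score; the exact weighting in the repository may differ.

```python
import numpy as np

def seed_probabilities(grad_history, temperature=1.0):
    """Hypothetical weighting: seeds whose perturbations produced larger
    average |scalar gradient| get higher sampling probability (softmax
    over the per-seed mean absolute gradients; unseen seeds score 0)."""
    scores = np.array([np.mean(np.abs(g)) if g else 0.0 for g in grad_history])
    logits = scores / temperature
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

K = 1024
candidate_seeds = np.arange(K)
grad_history = [[] for _ in range(K)]  # scalar grads reported for each seed
grad_history[7] = [0.9, 1.1]           # toy: seed 7 looked important so far
probs = seed_probabilities(grad_history)
rng = np.random.default_rng(0)
chosen = rng.choice(candidate_seeds, p=probs)  # biased draw of the next seed
```

Seeds with no history still retain nonzero probability under the softmax, so less-explored perturbations are never starved entirely.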

License

This project adopts the Apache-2.0 License. If the implementation and/or our paper is useful to you, please consider citing this work:

@article{qin2023federated,
  title={Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes},
  author={Zhen Qin and Daoyuan Chen and Bingchen Qian and Bolin Ding and Yaliang Li and Shuiguang Deng},
  journal={arXiv preprint arXiv:2312.06353},
  year={2023}
}
