torchtune


 

Introduction

torchtune is a PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs. We're excited to announce our alpha release!

torchtune provides:

  • Native-PyTorch implementations of popular LLMs using composable and modular building blocks
  • Easy-to-use and hackable training recipes for popular fine-tuning techniques (LoRA, QLoRA) - no trainers, no frameworks, just PyTorch!
  • YAML configs for easily configuring training, evaluation, quantization or inference recipes
  • Built-in support for many popular dataset formats and prompt templates to help you quickly get started with training

torchtune focuses on integrating with popular tools and libraries from the ecosystem, such as Hugging Face Datasets, EleutherAI's Eval Harness, bitsandbytes and Weights & Biases, with more integrations under development.

 

Models

torchtune currently supports the following models.

Model   | Sizes
Llama2  | 7B, 13B [models, configs]
Mistral | 7B [model, configs]
Gemma   | 2B [model, configs]

We'll be adding a number of new models in the coming weeks, including support for 70B versions and MoEs.

 

Fine-tuning recipes

torchtune provides the following fine-tuning recipes.

Training                           | Fine-tuning Method
Distributed Training [1 to 8 GPUs] | Full [code, example], LoRA [code, example]
Single Device / Low Memory [1 GPU] | Full [code, example], LoRA + QLoRA [code, example]
Single Device [1 GPU]              | DPO [code, example]

 

Memory efficiency is important to us. All of our recipes are tested on a variety of setups including commodity GPUs with 24GB of VRAM as well as beefier options found in data centers.

Single-GPU recipes expose a number of memory optimizations that aren't available in the distributed versions. These include support for low-precision optimizers from bitsandbytes and fusing the optimizer step with the backward pass to reduce the memory footprint of the gradients (see example config). For memory-constrained setups, we recommend using the single-device configs as a starting point. For example, our default QLoRA config has a peak memory usage of ~9.3GB. Similarly, LoRA on a single device with batch_size=2 has a peak memory usage of ~17.1GB. Both figures are with dtype=bf16 and AdamW as the optimizer.
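As a rough back-of-the-envelope check on figures like these, the sketch below estimates the memory taken by the model weights alone at different precisions. The helper name weight_memory_gb and the nominal 7 billion parameter count are assumptions for illustration. Real peak memory also includes activations, gradients, optimizer state and framework overhead, so it will not match measured numbers exactly.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a nominal parameter count for a "7B" model.
NOMINAL_7B = 7e9

for label, bits in [("fp32", 32), ("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(NOMINAL_7B, bits):.1f} GB")
```

Under these assumptions, bf16 weights need about 14 GB before anything else is allocated, while 4-bit weights need about 3.5 GB, which helps explain why a QLoRA recipe can fit on much smaller GPUs.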

This table captures the minimum memory requirements for our different recipes using the associated configs.

Example HW Resources | Finetuning Method | Config                       | Model    | Peak Memory per GPU
1 x RTX 4090         | QLoRA             | qlora_finetune_single_device | Llama-7B | 9.29 GB
2 x RTX 4090         | LoRA              | lora_finetune_distributed    | Llama-7B | 20.95 GB
1 x RTX 4090         | LoRA              | lora_finetune_single_device  | Llama-7B | 17.18 GB
1 x RTX 4090         | Full finetune     | full_finetune_single_device  | Llama-7B | 14.97 GB
4 x RTX 4090         | Full finetune     | full_finetune_distributed    | Llama-7B | 22.9 GB
Note: these figures are averaged over multiple runs, but they may vary with the setup. We'll update this table regularly.

 


Installation

Step 1: Install PyTorch. torchtune is tested with the latest stable PyTorch release (2.2.2) as well as the preview nightly version.

Step 2: The latest stable version of torchtune is hosted on PyPI and can be downloaded with the following command:

pip install torchtune

To confirm that the package is installed correctly, you can run the following command:

tune --help

You should see the following output:

usage: tune [-h] {ls,cp,download,run,validate} ...

Welcome to the TorchTune CLI!

options:
  -h, --help            show this help message and exit

...
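If you prefer to check the installation from Python, a minimal sketch is shown below. It uses only the standard library. The helper name installed_version is hypothetical, and the distribution name torchtune is taken from the pip command above.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("torchtune version:", installed_version("torchtune"))
# The tune entry point should also be discoverable on PATH after installation.
print("tune on PATH:", shutil.which("tune") is not None)
```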

 


Get Started

To get started with fine-tuning your first LLM with torchtune, see our tutorial on fine-tuning Llama2 7B. Our end-to-end workflow tutorial will show you how to evaluate, quantize and run inference with this model. The rest of this section will provide a quick overview of these steps with Llama2.

 

Downloading a model

Follow the instructions on the official meta-llama repository to ensure you have access to the Llama2 model weights. Once you have confirmed access, you can run the following command to download the weights to your local machine. This will also download the tokenizer model and a responsible use guide.

tune download meta-llama/Llama-2-7b-hf \
--output-dir /tmp/Llama-2-7b-hf \
--hf-token <HF_TOKEN>

Tip: Set your environment variable HF_TOKEN or pass in --hf-token to the command in order to validate your access. You can find your token at https://huggingface.co/settings/tokens
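When scripting the download, the two ways of supplying the token can be sketched as follows. This is an illustrative helper, not part of torchtune: build_download_command and its env parameter are hypothetical names, and only the tune download arguments shown above are used.

```python
import os
import shlex
from typing import List, Mapping, Optional


def build_download_command(
    repo_id: str,
    output_dir: str,
    hf_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the argument list for `tune download`.

    If no token is passed explicitly, HF_TOKEN must be present in the environment.
    """
    env = os.environ if env is None else env
    cmd = ["tune", "download", repo_id, "--output-dir", output_dir]
    if hf_token is not None:
        cmd += ["--hf-token", hf_token]
    elif "HF_TOKEN" not in env:
        raise ValueError("Set HF_TOKEN or pass hf_token explicitly")
    return cmd


# Demo with a placeholder environment so the token never appears in the command.
cmd = build_download_command(
    "meta-llama/Llama-2-7b-hf",
    "/tmp/Llama-2-7b-hf",
    env={"HF_TOKEN": "placeholder"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Relying on the environment variable keeps the token out of shell history and process listings.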

 

Running fine-tuning recipes

Llama2 7B + LoRA on single GPU:

tune run lora_finetune_single_device --config llama2/7B_lora_single_device

For distributed training, the tune CLI integrates with torchrun. Llama2 7B full fine-tuning on two GPUs:

tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full

Tip: Make sure to place any torchrun arguments before the recipe name. Any CLI arguments after the recipe specification override the config and do not affect distributed training.
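The ordering rule in the tip above can be made explicit with a small command builder. This is a sketch under stated assumptions: build_run_command is a hypothetical helper, and it only uses the flags that appear in the commands above.

```python
import shlex
from typing import List, Optional, Sequence


def build_run_command(
    recipe: str,
    config: str,
    nproc_per_node: Optional[int] = None,
    overrides: Sequence[str] = (),
) -> List[str]:
    """Assemble `tune run` arguments in the order the CLI expects."""
    cmd = ["tune", "run"]
    # torchrun arguments go before the recipe name.
    if nproc_per_node is not None:
        cmd += ["--nproc_per_node", str(nproc_per_node)]
    # Recipe name and config come next.
    cmd += [recipe, "--config", config]
    # Anything after the recipe specification is a config override.
    cmd += list(overrides)
    return cmd


print(shlex.join(build_run_command("full_finetune_distributed", "llama2/7B_full", nproc_per_node=2)))
```

Keeping the three groups in separate parameters makes it hard to accidentally place a torchrun argument where it would be read as a config override.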

 

Modify Configs

There are two ways in which you can modify configs:

Config Overrides

You can easily override config properties from the command line:

tune run lora_finetune_single_device \
--config llama2/7B_lora_single_device \
batch_size=8 \
enable_activation_checkpointing=True \
max_steps_per_epoch=128
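Conceptually, each key=value argument replaces the matching entry loaded from the YAML config. The sketch below illustrates that idea with a plain dictionary. It is not torchtune's actual implementation: parse_override and apply_overrides are hypothetical helpers, the base values are made up, and only flat (non-nested) keys are handled.

```python
import ast
from typing import Any, Dict, Iterable, Tuple


def parse_override(arg: str) -> Tuple[str, Any]:
    """Split 'key=value' and convert the value to a Python literal when possible."""
    key, sep, raw = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"Expected key=value, got {arg!r}")
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw  # keep non-literal values such as names or paths as strings
    return key, value


def apply_overrides(config: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the config with command-line overrides applied."""
    merged = dict(config)
    for arg in overrides:
        key, value = parse_override(arg)
        merged[key] = value
    return merged


# Hypothetical base values standing in for the contents of a YAML config.
base = {"batch_size": 2, "enable_activation_checkpointing": False, "max_steps_per_epoch": None}
merged = apply_overrides(
    base,
    ["batch_size=8", "enable_activation_checkpointing=True", "max_steps_per_epoch=128"],
)
print(merged)
```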

Update a Local Copy

You can also copy the config to your local directory and modify the contents directly:

tune cp llama2/7B_full ./my_custom_config.yaml
Copied to ./my_custom_config.yaml

Then, you can run the recipe with your custom config by pointing the tune run command at your local file:

tune run full_finetune_distributed --config ./my_custom_config.yaml

 

Check out tune --help for all possible CLI commands and options. For more information on using and updating configs, take a look at our config deep-dive.

 

Design Principles

torchtune embodies PyTorch’s design philosophy [details], especially "usability over everything else".

Native PyTorch

torchtune is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (e.g. Hugging Face Datasets, EleutherAI's Eval Harness), all of the core functionality is written in PyTorch.

Simplicity and Extensibility

torchtune is designed to be easy to understand, use and extend.

  • Composition over implementation inheritance - layers of inheritance for code re-use make the code hard to read and extend
  • No training frameworks - explicitly outlining the training logic makes it easy to extend for custom use cases
  • Code duplication is preferred over unnecessary abstractions
  • Modular building blocks over monolithic components
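To illustrate what composition over inheritance can look like, here is a toy sketch in plain Python. The Scale, AddBias and Block classes are hypothetical stand-ins for real layers and are not torchtune components.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


@dataclass
class Scale:
    factor: float

    def __call__(self, x: Vector) -> Vector:
        return [self.factor * v for v in x]


@dataclass
class AddBias:
    bias: float

    def __call__(self, x: Vector) -> Vector:
        return [v + self.bias for v in x]


@dataclass
class Block:
    """Applies the layers it is given, in order, with no base class to override."""

    layers: Sequence[Layer]

    def __call__(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer(x)
        return x


block = Block(layers=[Scale(2.0), AddBias(1.0)])
print(block([1.0, 2.0]))  # [3.0, 5.0]
```

Swapping a component means passing a different callable to Block rather than subclassing it, so each piece stays small and readable on its own.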

Correctness

torchtune provides well-tested components with a high bar for correctness. The library will never be the first to provide a feature, but available features will be thoroughly tested. We provide:

  • Extensive unit-tests to ensure component-level numerical parity with reference implementations
  • Checkpoint-tests to ensure model-level numerical parity with reference implementations
  • Integration tests to ensure recipe-level performance parity with reference implementations on standard benchmarks
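A component-level parity check of the kind listed above can be sketched as follows. This is a generic illustration, not one of torchtune's tests: it compares a numerically stable softmax against a straightforward reference version within a tolerance, and every function name and tolerance value here is an assumption.

```python
import math
from typing import List


def softmax_reference(logits: List[float]) -> List[float]:
    """Straightforward reference implementation."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(logits: List[float]) -> List[float]:
    """Implementation under test: subtracts the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assert_parity(actual: List[float], expected: List[float],
                  rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> None:
    """Fail if any element differs from the reference by more than the tolerance."""
    assert len(actual) == len(expected), "length mismatch"
    for i, (a, e) in enumerate(zip(actual, expected)):
        assert math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol), (
            f"mismatch at index {i}: {a} vs {e}"
        )


logits = [0.5, -1.25, 3.0, 0.0]
assert_parity(softmax_stable(logits), softmax_reference(logits))
print("parity check passed")
```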

 

Community Contributions

We really value our community and the contributions made by our wonderful users. We'll use this section to call out some of these contributions! If you'd like to help out as well, please see the CONTRIBUTING guide.

  • @solitude-alive for adding the Gemma 2B model to torchtune, including recipe changes, numeric validations of the models and recipe correctness
  • @yechenzhi for adding DPO to torchtune, including the recipe and config along with correctness checks

 

Acknowledgements

The Llama2 code in this repository is inspired by the original Llama2 code.

We want to give a huge shout-out to EleutherAI, Hugging Face and Weights & Biases for being wonderful collaborators and for working with us on some of these integrations within torchtune.

We also want to acknowledge some awesome libraries and tools from the ecosystem:

  • gpt-fast for performant LLM inference techniques, which we've adopted out of the box
  • llama recipes for spring-boarding the Llama2 community
  • bitsandbytes for bringing several memory- and performance-based techniques to the PyTorch ecosystem
  • @winglian and axolotl for early feedback and brainstorming on torchtune's design and feature set
  • lit-gpt for pushing the LLM fine-tuning community forward

 

License

torchtune is released under the BSD 3-Clause license. However, you may have other legal obligations that govern your use of other content, such as the terms of service for third-party models.
