
⚡ Lit-LLaMA

Independent implementation of LLaMA that is fully open source under the Apache 2.0 license.

This implementation builds on nanoGPT. Weights are distributed by Meta under a research-only license.

Why?

We believe that AI should be fully open source and part of the collective knowledge.

The original LLaMA code is GPL-licensed, which means any project using it must also be released under the GPL.

This "taints" any other code and prevents integration with the rest of the ecosystem.

Lit-LLaMA solves that for good.

 

Design principles

Lit-LLaMA is:

  • Simple: Single-file implementation without boilerplate.
  • Correct: Numerically equivalent to the original model.
  • Optimized: Runs on consumer hardware or at scale.
  • Open-source: No strings attached.

Get involved!

Join our Discord to build high-performance, truly open-source models for the common benefit of the community.

 

Setup

Clone the repo

git clone https://github.com/Lightning-AI/lit-llama
cd lit-llama

Install dependencies

pip install -r requirements.txt

You are all set! 🎉

 

Use the model

To generate text predictions, download the model weights following the instructions on the official LLaMA repository.

Once downloaded, you should have a folder like this:

checkpoints/llama
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── 13B
│   ...
├── tokenizer_checklist.chk
└── tokenizer.model

Convert the weights to the Lit-LLaMA format:

python scripts/convert_checkpoint.py \
    --output_dir checkpoints/lit-llama \
    --ckpt_dir checkpoints/llama \
    --tokenizer_path checkpoints/llama/tokenizer.model \
    --model_size 7B
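
This writes the converted weights into --output_dir. Judging from the quantize command further below, which reads a state_dict.pth, the result should look roughly like this (layout inferred, not verified):

checkpoints/lit-llama
└── 7B
    └── state_dict.pth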

Run inference:

python generate.py --prompt "Hello, my name is"

This runs the 7B model and requires ~26 GB of GPU memory (A100 GPU).
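
The converted checkpoint can also be loaded directly from Python. The following is a minimal sketch, not the repository's own generation code: it assumes the package exposes LLaMA and Tokenizer classes (with a from_name constructor and encode/decode methods), assumes the converted weights live at checkpoints/lit-llama/7B/state_dict.pth, and uses plain greedy decoding instead of the sampling options that generate.py provides.

import torch
from lit_llama import LLaMA, Tokenizer  # assumed exports; adjust to the repo's module layout

# Build the 7B architecture and load the converted weights (path assumed).
model = LLaMA.from_name("7B")
model.load_state_dict(torch.load("checkpoints/lit-llama/7B/state_dict.pth"))
model.eval()

tokenizer = Tokenizer("checkpoints/llama/tokenizer.model")
idx = tokenizer.encode("Hello, my name is", bos=True).unsqueeze(0)  # (1, T)

# Greedy decoding loop, for illustration only.
with torch.no_grad():
    for _ in range(50):
        logits = model(idx)                               # (1, T, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_token], dim=1)

print(tokenizer.decode(idx[0]))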

Run Lit-LLaMA on consumer devices

For GPUs with less memory, enable quantization (--quantize llm.int8) or use bfloat16 (--dtype bfloat16). Quantization takes longer to load but requires only ~8 GB of memory; bfloat16 is closer to full precision and runs on ~10 GB of GPU memory. Either option fits on a consumer GPU.

python generate.py --quantize llm.int8 --prompt "Hello, my name is"

See python generate.py --help for more options.
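
The bfloat16 option mentioned above uses the corresponding flag:

python generate.py --dtype bfloat16 --prompt "Hello, my name is"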

You can also use GPTQ-style int4 quantization, but this requires converting the weights first:

python quantize.py \
    --checkpoint_path state_dict.pth \
    --tokenizer_path tokenizer.model \
    --output_path llama-7b-gptq.4bit.pt \
    --dtype bfloat16 \
    --quantize gptq.int4

With the generated quantized checkpoint, generation works as usual with --quantize gptq.int4, bringing GPU usage down to about 5 GB. Since only the weights of the Linear layers are quantized, it is useful to also pass --dtype bfloat16 even with quantization enabled.
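
For example (assuming generate.py, like quantize.py, accepts a --checkpoint_path argument pointing at the weights file):

python generate.py \
    --quantize gptq.int4 \
    --checkpoint_path llama-7b-gptq.4bit.pt \
    --dtype bfloat16 \
    --prompt "Hello, my name is"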

 

Finetune the model

We provide simple training scripts, finetune_lora.py and finetune_adapter.py, that instruction-tune a pretrained model on the Alpaca dataset using LoRA and LLaMA-Adapter, respectively.

  1. Download the data and generate an instruction-tuning dataset:

    python scripts/prepare_alpaca.py
  2. Run the finetuning script:

    python finetune_lora.py

    or

    python finetune_adapter.py

It is expected that you have downloaded the pretrained weights as described above. The finetuning requires at least one GPU with ~24 GB of memory (RTX 3090). Follow the instructions in the script to fit the run into your GPU memory efficiently. Note: for some GPU models you might need to install PyTorch nightly (see issue #101).
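
To make the LoRA technique concrete: instead of updating a pretrained weight matrix W directly, LoRA freezes W and learns a low-rank update scaled by alpha/r, so a linear layer computes y = Wx + (alpha/r)·B(Ax), with A and B much smaller than W. The sketch below is a generic illustration of the idea, not the code in finetune_lora.py:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wrap a frozen nn.Linear with a trainable low-rank update.
    def __init__(self, linear: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A starts as small noise, B as zeros, so the update is zero
        # at initialization and training begins from the pretrained model.
        self.lora_a = nn.Parameter(torch.randn(r, linear.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(linear.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.linear(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

Because gradients and optimizer state exist only for the small A and B matrices, memory use drops sharply compared to full finetuning, which is what lets a 7B model fit on a single ~24 GB GPU.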

Get involved!

We're on a quest toward fully open-source AI.

Join us and start contributing. Look at train.py for a starting point towards pre-training / fine-tuning using Lightning Fabric.

Don't forget to join our Discord!

Acknowledgements

This implementation builds on nanoGPT and uses Lightning Fabric for training.

License

Lit-LLaMA is released under the Apache 2.0 license.
