Giter Site home page Giter Site logo

mistral.rs's Introduction

mistral.rs

Documentation

Mistral.rs is a LLM inference platform written in pure, safe Rust.

Upcoming features

  • Python bindings

Description

  • Lightweight OpenAI API compatible HTTP server.
  • Fast performance with per-sequence and catch-up KV cache management technique.
  • Continuous batching.
  • First X-LoRA inference platform with first class support.
  • 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit quantization for faster inference and optimized memory usage.
  • Apple silicon support with the Metal framework.

Supported models:

  • Mistral 7B
    • Normal
    • GGUF
    • X-LoRA
  • Gemma
    • Normal
    • X-LoRA
  • Llama
    • Normal
    • GGUF
    • GGML
    • X-LoRA

Library API

  • Rust multithreaded API for easy integration into any application: docs. To use, add mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git" } to the Cargo.toml.

HTTP Server Mistral.rs provides an OpenAI API compatible API server. It is accessible through the command line when one builds mistral.rs.

Usage

Build

To build mistral.rs, one should ensure they have Rust installed by following this link. The Huggingface token should be provided in ~/.cache/huggingface/token.

  • Using a script

    For an easy quickstart, the script below will download an setup Rust and then build mistral.rs to run on the CPU.

    sudo apt update -y
    sudo apt install libssl-dev -y
    sudo apt install pkg-config -y
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
    mkdir ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
    cargo build --release
  • Manual build

    If Rust is installed and the Huggingface token is set, then one may build mistral.rs by executing the build command. cargo build --release. The build process will output a binary misralrs at ./target/release/mistralrs.

Building for GPU, Metal or enabling other features

Rust uses a feature flag system during build to implement compile-time build options. As such, the following is a list of features which may be specified using the --features command.

  1. cuda
  2. metal
  3. flash-attn

Preparing the X-LoRA Ordering File

The X-LoRA ordering JSON file contains 2 parts. The first is the order of the adapters and the second, the layer ordering. The layer ordering has been automatically generated and should not be manipulated as it controls the application of scalings. However the order of adapter should be replaced by an array of strings of adapter names corresponding to the order the adapters were specified during training.

Run

To start a server serving Mistral on localhost:1234,

./mistralrs --port 1234 --log output.log mistral

Mistral.rs uses subcommands to control the model type. Please run ./mistralrs --help to see the subcommands.

To start an X-LoRA server with the default weights, run the following after modifying or copying the ordering file as described here.

./mistralrs --port 1234 x-lora-mistral -o ordering.json

Benchmarks

For the prompt "Tell me about the Rust type system in depth." and a maximum length of 256.

A6000 Mistral + CUDA + Flash Attention

  • 30.44 tok/s

A6000 Mistral GGUF + CUDA

  • 39.3 tok/s

mistral.rs's People

Contributors

ericlbuehler avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.