Giter Site home page Giter Site logo

snnahs1 / llama-gpt Goto Github PK

View Code? Open in Web Editor NEW

This project forked from getumbrel/llama-gpt

0.0 0.0 0.0 2.14 MB

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device.

Home Page: https://apps.umbrel.com/app/llama-gpt

License: MIT License

Shell 4.09% JavaScript 0.97% TypeScript 93.68% CSS 0.30% Makefile 0.21% Dockerfile 0.75%

llama-gpt's Introduction

LlamaGPT

LlamaGPT

A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.
umbrel.com (we're hiring) »

Contents

  1. Demo
  2. Supported Models
  3. How to install
  4. OpenAI-compatible API
  5. Benchmarks
  6. Roadmap and contributing
  7. Acknowledgements

Demo

LlamaGPT.mp4

Supported models

Currently, LlamaGPT supports the following models. Support for running custom models is on the roadmap.

Model name Model size Model download size RAM required
Nous Hermes Llama 2 7B Chat (GGML q4_0) 7B 3.79GB 6.29GB
Nous Hermes Llama 2 13B Chat (GGML q4_0) 13B 7.32GB 9.82GB
Meta Llama 2 70B Chat (GGML q4_0) 70B 38.87GB 41.37GB

How to install

Install LlamaGPT on your umbrelOS home server

Running LlamaGPT on an umbrelOS home server is one click. Simply install it from the Umbrel App Store.

LlamaGPT on Umbrel App Store

Install LlamaGPT on M1/M2 Mac

Make sure your have Docker and Xcode installed.

Then, clone this repo and cd into it:

git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt

Run LlamaGPT with the following command:

./run-mac.sh --model 7b

To run 13B or 70B models, replace 7b with 13b or 70b respectively.

To stop LlamaGPT, do Ctrl + C in Terminal.

Install LlamaGPT anywhere else with Docker (CPU only)

You can run LlamaGPT on any x86 or arm64 system. Make sure you have Docker installed.

Then, clone this repo and cd into it:

git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt

To run the 7B model, run:

docker compose up

To run the 13B model, run:

docker compose -f docker-compose-13b.yml up

To run the 70B model, run:

docker compose -f docker-compose-70b.yml up

Note: On the first run, it may take a while for the model to be downloaded to the /models directory. You may see lots of output like for a few minutes, which is normal:

llama-gpt-llama-gpt-ui-1       | [INFO  wait] Host [llama-gpt-api-13b:8000] not yet available...

After the model has been downloaded and loaded, and the API server is running, you'll see an output like:

llama-gpt-llama-gpt-api-13b-1  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

You can then access LlamaGPT at http://localhost:3000.

To stop LlamaGPT, either do Ctrl + C or run:

docker compose down

Install LlamaGPT with Kubernetes

First, make sure you have a running Kubernetes cluster and kubectl is configured to interact with it.

Then, clone this repo and cd into it.

To deploy to Kubernetes first create a namespace:

kubectl create ns llama

Then apply the manifests under the /deploy/kubernetes directory with

kubectl apply -k deploy/kubernetes/. -n llama

Expose your service however you would normally do that.

OpenAI compatible API

Thanks to llama-cpp-python, a drop-in replacement for OpenAI API is available at http://localhost:3001. Open http://localhost:3001/docs to see the API documentation.

Benchmarks

We've tested LlamaGPT models on the following hardware with the default system prompt, and user prompt: "How does the universe expand?" at temperature 0 to guarantee deterministic results. Generation speed is averaged over the first 10 generations.

Feel free to add your own benchmarks to this table by opening a pull request.

Nous Hermes Llama 2 7B (GGML q4_0)

Device Generation speed
M1 Max MacBook Pro (64GB RAM) with ./run-mac.sh 54 tokens/sec
M1 Max MacBook Pro (64GB RAM) with Docker 8.2 tokens/sec
Umbrel Home (16GB RAM) 2.7 tokens/sec
Raspberry Pi 4 (8GB RAM) 0.9 tokens/sec

Nous Hermes Llama 2 13B (GGML q4_0)

Device Generation speed
M1 Max MacBook Pro (64GB RAM) with ./run-mac.sh 20 tokens/sec
M1 Max MacBook Pro (64GB RAM) with Docker 3.7 tokens/sec
Umbrel Home (16GB RAM) 1.5 tokens/sec

Meta Llama 2 70B Chat (GGML q4_0)

Device Generation speed
M1 Max MacBook Pro (64GB RAM) with ./run-mac.sh 4.8 tokens/sec
GCP e2-standard-16 vCPU (64 GB RAM) 1.75 tokens/sec
M2 Max MacBook Pro (96GB RAM) with Docker 0.69 tokens/sec

Roadmap and contributing

We're looking to add more features to LlamaGPT. You can see the roadmap here. The highest priorities are:

  • Moving the model out of the Docker image and into a separate volume.
  • Add Metal support for M1/M2 Macs.
  • Add CUDA support for NVIDIA GPUs (work in progress).
  • Add ability to load custom models.
  • Allow users to switch between models.
  • Making it easy to run custom models.

If you're a developer who'd like to help with any of these, please open an issue to discuss the best way to tackle the challenge. If you're looking to help but not sure where to begin, check out these issues that have specifically been marked as being friendly to new contributors.

Acknowledgements

A massive thank you to the following developers and teams for making LlamaGPT possible:


License

umbrel.com

llama-gpt's People

Contributors

mckaywrigley avatar mayankchhabra avatar thomasleveil avatar nauxliu avatar bcullman avatar syedmuzamilm avatar chanzhaoyu avatar dotneet avatar lukechilds avatar dasunnimantha avatar liby avatar alanpog avatar riande avatar nmfretz avatar matriq avatar spctechdev avatar ryanhex53 avatar oznav2 avatar itbm avatar hyena459 avatar borborborja avatar adiestel avatar anthonypuppo avatar aweshchoudhary avatar ch4r4f avatar ernestobarrera avatar jdban avatar srsholmes avatar shemarlindie avatar huuphongsan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.