moonkraken / rusty_llama
A simple ChatGPT clone in Rust on both the frontend and backend. Uses open source language models and TailwindCSS.
License: MIT License
I need help with this issue.
Everything is integrated and installed, but the model is not giving any results.
Run command:
cargo leptos watch
Logs at the terminal:
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '(null)'
ggml_metal_init: loaded kernel_add 0x7feee442ada0
ggml_metal_init: loaded kernel_add_row 0x7feee442c150
ggml_metal_init: loaded kernel_mul 0x7feee442d4f0
ggml_metal_init: loaded kernel_mul_row 0x7feee442e890
ggml_metal_init: loaded kernel_scale 0x7feee442fc30
ggml_metal_init: loaded kernel_silu 0x7feee4430fb0
ggml_metal_init: loaded kernel_relu 0x7feee4432330
ggml_metal_init: loaded kernel_gelu 0x7feee44336b0
ggml_metal_init: loaded kernel_soft_max 0x7feee4434a30
ggml_metal_init: loaded kernel_diag_mask_inf 0x7feee4435e10
ggml_metal_init: loaded kernel_get_rows_f16 0x7feee4437190
ggml_metal_init: loaded kernel_get_rows_q4_0 0x7feee4438510
ggml_metal_init: loaded kernel_get_rows_q4_1 0x7feee4439890
ggml_metal_init: loaded kernel_get_rows_q2_K 0x7feee443ac10
ggml_metal_init: loaded kernel_get_rows_q3_K 0x7feee443bfd0
ggml_metal_init: loaded kernel_get_rows_q4_K 0x7feee443d590
ggml_metal_init: loaded kernel_get_rows_q5_K 0x7feee443e910
ggml_metal_init: loaded kernel_get_rows_q6_K 0x7feee443fc90
ggml_metal_init: loaded kernel_rms_norm 0x0
ggml_metal_init: loaded kernel_norm 0x7feee4441320
ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x7feee4442460
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x0
ggml_metal_init: loaded kernel_rope 0x7feee4443f20
ggml_metal_init: loaded kernel_alibi_f32 0x7feee4444e30
ggml_metal_init: loaded kernel_cpy_f32_f16 0x7feee44461b0
ggml_metal_init: loaded kernel_cpy_f32_f32 0x7feee4447550
ggml_metal_init: loaded kernel_cpy_f16_f16 0x7feee44488f0
ggml_metal_init: recommendedMaxWorkingSetSize = 1536.00 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'scratch ' buffer, size = 1024.00 MB, (21723.30 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scratch ' buffer, size = 512.00 MB, (22235.30 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scratch ' buffer, size = 512.00 MB, (22747.30 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1024.08 MB, (23771.38 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1152.00 MB, offs = 0
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1152.00 MB, offs = 1134227456
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1152.00 MB, offs = 2268454912
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 371.02 MB, offs = 3402682368, (27598.41 / 1536.00), warning: current allocated size is greater than the recommended max working set size
Hi,
First of all, nice work. I was wondering if you could lend me a hand, since I am experiencing an issue when trying to compile using cargo leptos watch. The error is the following:
Finished dev [unoptimized + debuginfo] target(s) in 44.24s
Cargo finished cargo build --package=rusty_llama --lib --target-dir=/home/albert/Desktop/llama-rust/target/front --target=wasm32-unknown-unknown --no-default-features --features=hydrate
Front compiling WASM
Error: at /home/albert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.17/src/compile/front.rs:47:30
Caused by:
0: at /home/albert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.17/src/compile/front.rs:122:10
1:
it looks like the Rust project used to create this wasm file was linked against
version of wasm-bindgen that uses a different bindgen format than this binary:
rust wasm file schema version: 0.2.89
this binary schema version: 0.2.92
Currently the bindgen format is unstable enough that these two schema versions
must exactly match. You can accomplish this by either updating this binary or
the wasm-bindgen dependency in the Rust project.
You should be able to update the wasm-bindgen dependency with:
cargo update -p wasm-bindgen --precise 0.2.92
don't forget to recompile your wasm file! Alternatively, you can update the
binary with:
cargo install -f wasm-bindgen-cli --version 0.2.89
if this warning fails to go away though and you're not sure what to do feel free
to open an issue at https://github.com/rustwasm/wasm-bindgen/issues!
I have tried both options, using cargo update -p wasm-bindgen --precise 0.2.92 and cargo install -f wasm-bindgen-cli --version 0.2.89, but it doesn't seem to work. Any ideas?
Thank you very much
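Note that the two suggested commands are alternatives, not a pair: updating the crate to 0.2.92 and then downgrading the CLI to 0.2.89 recreates the mismatch in the other direction. Pick one side and make the other match it. A minimal sketch of the Cargo.toml pin, assuming the installed CLI reports 0.2.92 (check with wasm-bindgen --version):

```toml
# Hypothetical pin: lock the wasm-bindgen crate to the exact version that the
# installed wasm-bindgen-cli reports, so the two schema versions match.
[dependencies]
wasm-bindgen = "=0.2.92"
```

After changing the pin, run cargo update -p wasm-bindgen and rebuild so the wasm file is regenerated against the new version.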
Followed your example and it worked great, but I noticed that the assistant treated me like its assistant xD.
https://github.com/Me163/rusty_llama/blob/6bd996fe3fc9ae5954e2bd2e083260e6c77f25e0/src/api.rs#L29
I did not manage to make it work with Safari.
I have this cryptic message in the console:
[Error] Unhandled Promise Rejection: LinkError: import function wbg:__wbg_fetch_336b6f0cb426b46e must be callable
Any ideas?
Note: it works fine with both Brave and Firefox.
I've created a Dockerfile for CUDA support. A few changes also need to be made in Cargo.toml.
Here are the changes for Cargo.toml:
# Change the features to cublas
llm = { git = "https://github.com/rustformers/llm.git", branch = "main", optional = true, features = ["cublas"] }
# change the site-addr to listen to all
site-addr = "0.0.0.0:3000"
Here's the docker file:
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04
SHELL ["/bin/bash", "-ec"]
ARG DEBIAN_FRONTEND=noninteractive
WORKDIR /usr/src/app
RUN mkdir /usr/local/models
#RUN touch ~/.bashrc
RUN apt-get update && \
    apt-get install -y git build-essential curl libssl-dev pkg-config vim
RUN curl --proto '=https' --tlsv1.3 https://sh.rustup.rs -sSf | bash -s -- -y
ENV PATH="/root/.cargo/bin:$PATH"
# install NodeJS
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash
RUN apt-get install -y nodejs
# CUDA GPU enabling cuBLAS
ENV PATH="$PATH:/usr/local/cuda/bin"
ENV CUDACXX=/usr/local/cuda/bin/nvcc
COPY . .
RUN rustup toolchain install nightly
RUN rustup target add wasm32-unknown-unknown
RUN cargo install trunk cargo-leptos
RUN source ~/.bashrc && npm install
RUN npx tailwindcss -i ./input.css -o ./style/output.css
EXPOSE 3000/tcp
CMD ["cargo", "leptos", "watch"]
I'm not all that familiar with Rust, so I'm just using the cargo leptos watch command to run this. I figure you could run cargo leptos build and then start the server that way, but maybe I can just let you guys take this and run with it.
Docker build:
docker build -t rusty_llama .
Docker run:
docker run -it --rm -v /usr/local/models/GGML_Models:/usr/local/models -e MODEL_PATH="/usr/local/models/nous-hermes-llama-2-7b.ggmlv3.q8_0.bin" -p 3000:3000 --runtime=nvidia --gpus all rusty_llama bash
This will drop you into bash, so you can run it from the container with cargo leptos watch. I do see the tensor cores loading, and my GPU VRAM loads up the model; however, it's not being used for queries. I see the CPU still being used.
Output from cargo leptos watch:
Loaded hyperparameters
ggml ctx size = 0.07 MB
ggml_init_cublas: found 3 CUDA devices:
Device 0: Tesla T4
Device 1: Tesla T4
Device 2: Tesla T4
Loaded tensor 8/291
Loaded tensor 16/291
Loaded tensor 24/291
Loaded tensor 32/291
Loaded tensor 40/291
Loaded tensor 48/291
Loaded tensor 56/291
Loaded tensor 64/291
Loaded tensor 72/291
Loaded tensor 80/291
Loaded tensor 88/291
Loaded tensor 96/291
Loaded tensor 104/291
Loaded tensor 112/291
Loaded tensor 120/291
Loaded tensor 128/291
Loaded tensor 136/291
Loaded tensor 144/291
Loaded tensor 152/291
Loaded tensor 160/291
Loaded tensor 168/291
Loaded tensor 176/291
Loaded tensor 184/291
Loaded tensor 192/291
Loaded tensor 200/291
Loaded tensor 208/291
Loaded tensor 216/291
Loaded tensor 224/291
Loaded tensor 232/291
Loaded tensor 240/291
Loaded tensor 248/291
Loaded tensor 256/291
Loaded tensor 264/291
Loaded tensor 272/291
Loaded tensor 280/291
Loaded tensor 288/291
Loading of model complete
Model size = 6829.07 MB / num tensors = 291
[2023-08-08T14:33:00Z INFO actix_server::builder] starting 48 workers
[2023-08-08T14:33:00Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
nvidia-smi output after loading the model:
Tue Aug 8 14:38:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 Off | 00000000:3B:00.0 Off | Off |
| N/A 46C P0 27W / 70W | 117MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:87:00.0 Off | Off |
| N/A 46C P0 27W / 70W | 117MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:AF:00.0 Off | Off |
| N/A 46C P0 26W / 70W | 117MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3100066 C target/server/debug/leptos_start 112MiB |
| 1 N/A N/A 3100066 C target/server/debug/leptos_start 112MiB |
| 2 N/A N/A 3100066 C target/server/debug/leptos_start 112MiB |
+---------------------------------------------------------------------------------------+
I'm curious whether there's an issue with session.infer().
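Since the model loads into VRAM but inference still runs on the CPU, the load parameters may be worth checking before session.infer() itself. A minimal sketch, assuming the rustformers llm crate from the main branch; the ModelParameters field names (use_gpu, gpu_layers) and the load_dynamic signature are assumptions and may differ on your revision:

```rust
// Sketch: ask the `llm` crate to offload layers to the GPU at load time.
// Building with the `cublas` feature compiles the CUDA kernels, but if
// `use_gpu` is left false, inference still runs on the CPU.
use std::path::Path;

fn load_model(path: &Path) -> Box<dyn llm::Model> {
    let params = llm::ModelParameters {
        use_gpu: true,    // route matrix multiplies through cuBLAS
        gpu_layers: None, // None = offload every layer that fits
        ..Default::default()
    };
    llm::load_dynamic(
        Some(llm::ModelArchitecture::Llama),
        path,
        llm::TokenizerSource::Embedded,
        params,
        llm::load_progress_callback_stdout,
    )
    .expect("could not load model")
}
```

If rusty_llama's loader already sets these, the next thing to confirm is that the server binary was actually built with the cublas feature enabled on the llm dependency.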
I'm not sure whether this is a leptos issue or rusty_llama's. I even installed tailwindcss using:
$ npm install -D tailwindcss
But here is what I get:
$ cargo leptos watch
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Cargo finished cargo build --package=rusty_llama --lib --target=wasm32-unknown-unknown --no-default-features --features=hydrate
Front compiling WASM
Error: at `/home/mamadou/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.0/src/compile/style.rs:54:62`
Caused by:
0: Could not read to string "style/output.css" at `/home/mamadou/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.0/src/ext/fs.rs:63:10`
1: No such file or directory (os error 2)
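The error says cargo-leptos tried to read style/output.css, which doesn't exist yet: installing tailwindcss alone doesn't generate it. A likely fix, assuming the repo's layout with input.css at the project root, is to run the Tailwind CLI once before building:

```shell
# Generate the stylesheet that cargo-leptos expects at style/output.css,
# then retry the build. Paths assume this repo's layout.
npx tailwindcss -i ./input.css -o ./style/output.css
cargo leptos watch
```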