moonkraken / rusty_llama
A simple ChatGPT clone in Rust on both the frontend and backend. Uses open source language models and TailwindCSS.
License: MIT License
I need help with this issue.
Everything is integrated and installed, but the model is not giving any results.
Run command:
cargo leptos watch
Logs at the terminal:
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '(null)'
ggml_metal_init: loaded kernel_add 0x7feee442ada0
ggml_metal_init: loaded kernel_add_row 0x7feee442c150
ggml_metal_init: loaded kernel_mul 0x7feee442d4f0
ggml_metal_init: loaded kernel_mul_row 0x7feee442e890
ggml_metal_init: loaded kernel_scale 0x7feee442fc30
ggml_metal_init: loaded kernel_silu 0x7feee4430fb0
ggml_metal_init: loaded kernel_relu 0x7feee4432330
ggml_metal_init: loaded kernel_gelu 0x7feee44336b0
ggml_metal_init: loaded kernel_soft_max 0x7feee4434a30
ggml_metal_init: loaded kernel_diag_mask_inf 0x7feee4435e10
ggml_metal_init: loaded kernel_get_rows_f16 0x7feee4437190
ggml_metal_init: loaded kernel_get_rows_q4_0 0x7feee4438510
ggml_metal_init: loaded kernel_get_rows_q4_1 0x7feee4439890
ggml_metal_init: loaded kernel_get_rows_q2_K 0x7feee443ac10
ggml_metal_init: loaded kernel_get_rows_q3_K 0x7feee443bfd0
ggml_metal_init: loaded kernel_get_rows_q4_K 0x7feee443d590
ggml_metal_init: loaded kernel_get_rows_q5_K 0x7feee443e910
ggml_metal_init: loaded kernel_get_rows_q6_K 0x7feee443fc90
ggml_metal_init: loaded kernel_rms_norm 0x0
ggml_metal_init: loaded kernel_norm 0x7feee4441320
ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x7feee4442460
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x0
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x0
ggml_metal_init: loaded kernel_rope 0x7feee4443f20
ggml_metal_init: loaded kernel_alibi_f32 0x7feee4444e30
ggml_metal_init: loaded kernel_cpy_f32_f16 0x7feee44461b0
ggml_metal_init: loaded kernel_cpy_f32_f32 0x7feee4447550
ggml_metal_init: loaded kernel_cpy_f16_f16 0x7feee44488f0
ggml_metal_init: recommendedMaxWorkingSetSize = 1536.00 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'scratch ' buffer, size = 1024.00 MB, (21723.30 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scratch ' buffer, size = 512.00 MB, (22235.30 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'scratch ' buffer, size = 512.00 MB, (22747.30 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1024.08 MB, (23771.38 / 1536.00), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1152.00 MB, offs = 0
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1152.00 MB, offs = 1134227456
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 1152.00 MB, offs = 2268454912
ggml_metal_add_buffer: allocated 'wt ' buffer, size = 371.02 MB, offs = 3402682368, (27598.41 / 1536.00), warning: current allocated size is greater than the recommended max working set size
Hi,
First of all, nice work. I was wondering if you could lend me a hand, since I am experiencing an issue when trying to compile using cargo leptos watch. The error is the following:
Finished dev [unoptimized + debuginfo] target(s) in 44.24s
Cargo finished cargo build --package=rusty_llama --lib --target-dir=/home/albert/Desktop/llama-rust/target/front --target=wasm32-unknown-unknown --no-default-features --features=hydrate
Front compiling WASM
Error: at /home/albert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.17/src/compile/front.rs:47:30
Caused by:
0: at /home/albert/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.17/src/compile/front.rs:122:10
1:
it looks like the Rust project used to create this wasm file was linked against
version of wasm-bindgen that uses a different bindgen format than this binary:
rust wasm file schema version: 0.2.89
this binary schema version: 0.2.92
Currently the bindgen format is unstable enough that these two schema versions
must exactly match. You can accomplish this by either updating this binary or
the wasm-bindgen dependency in the Rust project.
You should be able to update the wasm-bindgen dependency with:
cargo update -p wasm-bindgen --precise 0.2.92
don't forget to recompile your wasm file! Alternatively, you can update the
binary with:
cargo install -f wasm-bindgen-cli --version 0.2.89
if this warning fails to go away though and you're not sure what to do feel free
to open an issue at https://github.com/rustwasm/wasm-bindgen/issues!
I have tried both options, using cargo update -p wasm-bindgen --precise 0.2.92 and cargo install -f wasm-bindgen-cli --version 0.2.89, but it doesn't seem to work. Any ideas?
Thank you very much
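Note that the two suggested commands are alternatives, not a pair: updating the crate to 0.2.92 and then downgrading the CLI to 0.2.89 recreates the mismatch in the other direction. Pick one side and make the other match it. A minimal sketch of the Cargo.toml pin, assuming the installed CLI reports 0.2.92 (check with wasm-bindgen --version):

```toml
# Hypothetical pin: lock the wasm-bindgen crate to the exact version that the
# installed wasm-bindgen-cli reports, so the two schema versions match.
[dependencies]
wasm-bindgen = "=0.2.92"
```

After changing the pin, run cargo update -p wasm-bindgen and rebuild so the wasm file is regenerated against the new version.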
Followed your example and it worked great, but I noticed that the assistant treated me like its assistant xD.
https://github.com/Me163/rusty_llama/blob/6bd996fe3fc9ae5954e2bd2e083260e6c77f25e0/src/api.rs#L29
I did not manage to make it work with Safari.
I have this cryptic message in the console:
[Error] Unhandled Promise Rejection: LinkError: import function wbg:__wbg_fetch_336b6f0cb426b46e must be callable
Any ideas?
Note: it works fine with both Brave and Firefox.
I've created a Dockerfile for CUDA support. A few changes also need to be made in Cargo.toml.
Here are the changes for Cargo.toml:
# Change the features to cublas
llm = { git = "https://github.com/rustformers/llm.git", branch = "main", optional = true, features = ["cublas"] }
# change the site-addr to listen to all
site-addr = "0.0.0.0:3000"
Here's the docker file:
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04
SHELL ["/bin/bash", "-ec"]
ARG DEBIAN_FRONTEND=noninteractive
WORKDIR /usr/src/app
RUN mkdir /usr/local/models
#RUN touch ~/.bashrc
RUN apt-get update && \
    apt-get install -y git build-essential curl libssl-dev pkg-config vim
RUN curl --proto '=https' --tlsv1.3 https://sh.rustup.rs -sSf | bash -s -- -y
ENV PATH="/root/.cargo/bin:$PATH"
# install NodeJS
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash
RUN apt-get install -y nodejs
# CUDA GPU enabling cuBLAS
ENV PATH="$PATH:/usr/local/cuda/bin"
ENV CUDACXX=/usr/local/cuda/bin/nvcc
COPY . .
RUN rustup toolchain install nightly
RUN rustup target add wasm32-unknown-unknown
RUN cargo install trunk cargo-leptos
RUN source ~/.bashrc && npm install
RUN npx tailwindcss -i ./input.css -o ./style/output.css
EXPOSE 3000/tcp
CMD ["cargo", "leptos", "watch"]
I'm not all that familiar with Rust, so I'm just using the cargo leptos watch command to run this. I figure you could run cargo leptos build and then start the server that way, but maybe I can just let you guys take this and run with it.
Docker build:
docker build -t rusty_llama .
Docker run:
docker run -it --rm -v /usr/local/models/GGML_Models:/usr/local/models -e MODEL_PATH="/usr/local/models/nous-hermes-llama-2-7b.ggmlv3.q8_0.bin" -p 3000:3000 --runtime=nvidia --gpus all rusty_llama bash
This will drop you into bash, so you can run it from the container with cargo leptos watch. I do see the tensor cores loading, and my GPU VRAM loads up the model; however, it's not being used for queries. I see the CPU still being used.
Output from cargo leptos watch:
Loaded hyperparameters
ggml ctx size = 0.07 MB
ggml_init_cublas: found 3 CUDA devices:
Device 0: Tesla T4
Device 1: Tesla T4
Device 2: Tesla T4
Loaded tensor 8/291
Loaded tensor 16/291
Loaded tensor 24/291
Loaded tensor 32/291
Loaded tensor 40/291
Loaded tensor 48/291
Loaded tensor 56/291
Loaded tensor 64/291
Loaded tensor 72/291
Loaded tensor 80/291
Loaded tensor 88/291
Loaded tensor 96/291
Loaded tensor 104/291
Loaded tensor 112/291
Loaded tensor 120/291
Loaded tensor 128/291
Loaded tensor 136/291
Loaded tensor 144/291
Loaded tensor 152/291
Loaded tensor 160/291
Loaded tensor 168/291
Loaded tensor 176/291
Loaded tensor 184/291
Loaded tensor 192/291
Loaded tensor 200/291
Loaded tensor 208/291
Loaded tensor 216/291
Loaded tensor 224/291
Loaded tensor 232/291
Loaded tensor 240/291
Loaded tensor 248/291
Loaded tensor 256/291
Loaded tensor 264/291
Loaded tensor 272/291
Loaded tensor 280/291
Loaded tensor 288/291
Loading of model complete
Model size = 6829.07 MB / num tensors = 291
[2023-08-08T14:33:00Z INFO actix_server::builder] starting 48 workers
[2023-08-08T14:33:00Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
nvidia-smi output after loading the model:
Tue Aug 8 14:38:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 Off | 00000000:3B:00.0 Off | Off |
| N/A 46C P0 27W / 70W | 117MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:87:00.0 Off | Off |
| N/A 46C P0 27W / 70W | 117MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:AF:00.0 Off | Off |
| N/A 46C P0 26W / 70W | 117MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3100066 C target/server/debug/leptos_start 112MiB |
| 1 N/A N/A 3100066 C target/server/debug/leptos_start 112MiB |
| 2 N/A N/A 3100066 C target/server/debug/leptos_start 112MiB |
+---------------------------------------------------------------------------------------+
I'm curious whether there's an issue with session.infer().
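Since the model loads into VRAM but inference still runs on the CPU, the load parameters may be worth checking before session.infer() itself. A minimal sketch, assuming the rustformers llm crate from the main branch; the ModelParameters field names (use_gpu, gpu_layers) and the load_dynamic signature are assumptions and may differ on your revision:

```rust
// Sketch: ask the `llm` crate to offload layers to the GPU at load time.
// Building with the `cublas` feature compiles the CUDA kernels, but if
// `use_gpu` is left false, inference still runs on the CPU.
use std::path::Path;

fn load_model(path: &Path) -> Box<dyn llm::Model> {
    let params = llm::ModelParameters {
        use_gpu: true,    // route matrix multiplies through cuBLAS
        gpu_layers: None, // None = offload every layer that fits
        ..Default::default()
    };
    llm::load_dynamic(
        Some(llm::ModelArchitecture::Llama),
        path,
        llm::TokenizerSource::Embedded,
        params,
        llm::load_progress_callback_stdout,
    )
    .expect("could not load model")
}
```

If rusty_llama's loader already sets these, the next thing to confirm is that the server binary was actually built with the cublas feature enabled on the llm dependency.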
I'm not sure whether this is a leptos issue or rusty_llama's. I even installed tailwindcss using:
$ npm install -D tailwindcss
But here is what I get:
$ cargo leptos watch
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Cargo finished cargo build --package=rusty_llama --lib --target=wasm32-unknown-unknown --no-default-features --features=hydrate
Front compiling WASM
Error: at `/home/mamadou/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.0/src/compile/style.rs:54:62`
Caused by:
0: Could not read to string "style/output.css" at `/home/mamadou/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-leptos-0.2.0/src/ext/fs.rs:63:10`
1: No such file or directory (os error 2)
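The error says cargo-leptos tried to read style/output.css, which doesn't exist yet: installing tailwindcss alone doesn't generate it. A likely fix, assuming the repo's layout with input.css at the project root, is to run the Tailwind CLI once before building:

```shell
# Generate the stylesheet that cargo-leptos expects at style/output.css,
# then retry the build. Paths assume this repo's layout.
npx tailwindcss -i ./input.css -o ./style/output.css
cargo leptos watch
```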