vicuna-tools / vicuna-installation-guide
The "vicuna-installation-guide" provides step-by-step instructions for installing and configuring Vicuna 13B and 7B.
Can you please provide a good example on how to use the embedding with vicuna? Thank you!
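A minimal sketch of one way, assuming the embedding binary that this repo's make step builds (you can see examples/embedding/embedding.cpp being compiled in the logs below) and a Vicuna model already downloaded into models/:

# Prints the prompt's embedding vector as plain floats on stdout.
./embedding -m models/ggml-vic13b-uncensored-q5_1.bin -p "Hello, Vicuna"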
Hello again ;)
How can I make it work with CUDA?
I've tried building with LLAMA_CUBLAS, but that didn't make any difference:
make -j LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX: g++ (Debian 10.2.1-6) 10.2.1 20210110
nvcc --forward-unknown-to-host-compiler -arch=native -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/main/main.cpp ggml.o llama.o common.o ggml-cuda.o -o main -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize/quantize.cpp ggml.o llama.o ggml-cuda.o -o quantize -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize-stats/quantize-stats.cpp ggml.o llama.o ggml-cuda.o -o quantize-stats -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/perplexity/perplexity.cpp ggml.o llama.o common.o ggml-cuda.o -o perplexity -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/embedding/embedding.cpp ggml.o llama.o common.o ggml-cuda.o -o embedding -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include pocs/vdot/vdot.cpp ggml.o ggml-cuda.o -o vdot -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
==== Run ./main -h for help. ====
I'm using CUDA 11.8:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Graphics... On | 00000000:01:00.0 Off | Off |
| 0% 38C P8 26W / 450W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
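The build log above shows ggml-cuda.o being compiled and linked, so cuBLAS support itself looks fine; by default, though, cuBLAS is only used to accelerate prompt/batch processing. To run the model itself on the GPU you also have to offload layers at run time. A sketch, assuming a llama.cpp revision new enough to have the -ngl/--n-gpu-layers flag:

# Offload 32 of the 13B model's 40 layers to the GPU; raise or lower to fit VRAM.
./main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36 -ngl 32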
Why does the script clone from github.com/fredi-python/llama.cpp instead of github.com/ggerganov/llama.cpp?
Ignore
I know other Vicuna installs require you to obtain the LLaMA weights in order to run Vicuna; is that also the case for this installation?
Hello,
Using the one-line install seems to be successful (except for a few warnings):
git clone https://github.com/fredi-python/llama.cpp.git && cd llama.cpp && make -j && cd models && wget -c https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-uncensored-q5_1.bin
Cloning into 'llama.cpp'...
remote: Enumerating objects: 2390, done.
remote: Counting objects: 100% (867/867), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 2390 (delta 815), reused 790 (delta 790), pack-reused 1523
Receiving objects: 100% (2390/2390), 2.16 MiB | 3.93 MiB/s, done.
Resolving deltas: 100% (1566/1566), done.
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX: g++ (Debian 10.2.1-6) 10.2.1 20210110
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o
llama.cpp: In function 'size_t llama_set_state_data(llama_context*, const uint8_t*)':
llama.cpp:2615:27: warning: cast from type 'const uint8_t*' {aka 'const unsigned char*'} to type 'void*' casts away qualifiers [-Wcast-qual]
2615 | kin3d->data = (void *) in;
| ^~~~~~~~~~~
llama.cpp:2619:27: warning: cast from type 'const uint8_t*' {aka 'const unsigned char*'} to type 'void*' casts away qualifiers [-Wcast-qual]
2619 | vin3d->data = (void *) in;
| ^~~~~~~~~~~
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native pocs/vdot/vdot.cpp ggml.o -o vdot
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/main/main.cpp ggml.o llama.o common.o -o main
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize/quantize.cpp ggml.o llama.o -o quantize
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize-stats/quantize-stats.cpp ggml.o llama.o -o quantize-stats
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding
==== Run ./main -h for help. ====
--2023-05-15 09:57:30-- https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-uncensored-q5_1.bin
Resolving huggingface.co (huggingface.co)... 108.138.51.20, 108.138.51.95, 108.138.51.49, ...
Connecting to huggingface.co (huggingface.co)|108.138.51.20|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/0a/36/0a36ee786df124a005175a3d339738ad57350a96ae625c2111bce6483acbe34a/6fc1294b722082631cd61b1bde2cfecd1533eb95b331dbbdacbebe4944ff974a?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ggml-vic13b-uncensored-q5_1.bin%3B+filename%3D%22ggml-vic13b-uncensored-q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1684397227&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzBhLzM2LzBhMzZlZTc4NmRmMTI0YTAwNTE3NWEzZDMzOTczOGFkNTczNTBhOTZhZTYyNWMyMTExYmNlNjQ4M2FjYmUzNGEvNmZjMTI5NGI3MjIwODI2MzFjZDYxYjFiZGUyY2ZlY2QxNTMzZWI5NWIzMzFkYmJkYWNiZWJlNDk0NGZmOTc0YT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODQzOTcyMjd9fX1dfQ__&Signature=tbSCc4T3qUsBlw-mQrtcKwBQL0cbfeZe8MH3aGUv4EgOfo0JZibFFetpyqKk88LsDRKNzStyM6epwjbiB11PwEE73JT6ajJnAkArMkNDOmTO4NP6poC1rHlM-XRz3WuSdi3nY0fdDYYYL1gHb%7EAPwILghy-z4-vWRSEPldUQGTuqCZqj2knjmVtIuHSk06fShBYKOWKM7nnzb0-ENQumj6garze%7Es7n0hQjX%7EBKTGAD-HI5mMy1I5rwfA5M6eQ9zYavGHKNj104LftBPBLjpvAamO6fGS1L6KQYiKG-t68AuDgBy8TVbdIfTYJbN52vnvcfaiz3E5QB8JrvMv5uETQ__&Key-Pair-Id=KVTP0A1DKRTAX [following]
--2023-05-15 09:57:31-- https://cdn-lfs.huggingface.co/repos/0a/36/0a36ee786df124a005175a3d339738ad57350a96ae625c2111bce6483acbe34a/6fc1294b722082631cd61b1bde2cfecd1533eb95b331dbbdacbebe4944ff974a?response-content-disposition=attachment%3B+filename*%3DUTF-8''ggml-vic13b-uncensored-q5_1.bin%3B+filename%3D%22ggml-vic13b-uncensored-q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1684397227&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzBhLzM2LzBhMzZlZTc4NmRmMTI0YTAwNTE3NWEzZDMzOTczOGFkNTczNTBhOTZhZTYyNWMyMTExYmNlNjQ4M2FjYmUzNGEvNmZjMTI5NGI3MjIwODI2MzFjZDYxYjFiZGUyY2ZlY2QxNTMzZWI5NWIzMzFkYmJkYWNiZWJlNDk0NGZmOTc0YT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODQzOTcyMjd9fX1dfQ__&Signature=tbSCc4T3qUsBlw-mQrtcKwBQL0cbfeZe8MH3aGUv4EgOfo0JZibFFetpyqKk88LsDRKNzStyM6epwjbiB11PwEE73JT6ajJnAkArMkNDOmTO4NP6poC1rHlM-XRz3WuSdi3nY0fdDYYYL1gHb~APwILghy-z4-vWRSEPldUQGTuqCZqj2knjmVtIuHSk06fShBYKOWKM7nnzb0-ENQumj6garze~s7n0hQjX~BKTGAD-HI5mMy1I5rwfA5M6eQ9zYavGHKNj104LftBPBLjpvAamO6fGS1L6KQYiKG-t68AuDgBy8TVbdIfTYJbN52vnvcfaiz3E5QB8JrvMv5uETQ__&Key-Pair-Id=KVTP0A1DKRTAX
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.244.102.114, 18.244.102.76, 18.244.102.9, ...
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.244.102.114|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9763701888 (9.1G) [application/octet-stream]
Saving to: 'ggml-vic13b-uncensored-q5_1.bin'
ggml-vic13b-uncensored-q5_1.bin 100%[=========================================================================================================>] 9.09G 2.99MB/s in 49m 46s
2023-05-15 10:47:17 (3.12 MB/s) - 'ggml-vic13b-uncensored-q5_1.bin' saved [9763701888/9763701888]
But when I try to run it, it throws an error:
./main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36
main: build = 523 (0737a47)
main: seed = 1684152947
llama.cpp: loading model from models/ggml-vic13b-uncensored-q5_1.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/ggml-vic13b-uncensored-q5_1.bin'
main: error: unable to load model
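The magic 67676a74 is ASCII for "ggjt", so the download itself looks intact; the loader simply doesn't recognize that (magic, version) combination, which usually means the checked-out llama.cpp revision and the model's file-format version are out of sync. A sketch of the usual remedy, assuming the fork tracks upstream closely enough to pull:

# Rebuild against newer sources that understand the ggjt v2 format.
git pull
make clean
make -j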
Can you merge this to enable GPU inference? ggerganov/llama.cpp#1642
I usually use containers and pyenv, not a conda environment.
It claims there is a mismatch. Are the runtime libraries enough, or should I remove CUDA from the base install?
How can I remedy this? Thank you.
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ conda list |grep -E 'cuda|nvidia'
cuda-cccl 11.7.58 hc415cf5_0 nvidia/label/cuda-11.7.0
cuda-command-line-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-compiler 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-cudart 11.7.60 h9538e0e_0 nvidia/label/cuda-11.7.0
cuda-cudart-dev 11.7.60 h6a7c232_0 nvidia/label/cuda-11.7.0
cuda-cuobjdump 11.7.50 h28cc80a_0 nvidia/label/cuda-11.7.0
cuda-cupti 11.7.50 hb6f9eaf_0 nvidia/label/cuda-11.7.0
cuda-cuxxfilt 11.7.50 hb365495_0 nvidia/label/cuda-11.7.0
cuda-documentation 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-driver-dev 11.7.60 0 nvidia/label/cuda-11.7.0
cuda-gdb 11.7.50 h4a0ac72_0 nvidia/label/cuda-11.7.0
cuda-libraries 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-libraries-dev 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-memcheck 11.7.50 hc446b2b_0 nvidia/label/cuda-11.7.0
cuda-nsight 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-nsight-compute 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-nvcc 11.7.64 0 nvidia/label/cuda-11.7.0
cuda-nvdisasm 11.7.50 h5bd0695_0 nvidia/label/cuda-11.7.0
cuda-nvml-dev 11.7.50 h3af1343_0 nvidia/label/cuda-11.7.0
cuda-nvprof 11.7.50 h7a2404d_0 nvidia/label/cuda-11.7.0
cuda-nvprune 11.7.50 h7add7b4_0 nvidia/label/cuda-11.7.0
cuda-nvrtc 11.7.50 hd0285e0_0 nvidia/label/cuda-11.7.0
cuda-nvrtc-dev 11.7.50 heada363_0 nvidia/label/cuda-11.7.0
cuda-nvtx 11.7.50 h05b0816_0 nvidia/label/cuda-11.7.0
cuda-nvvp 11.7.50 hd2289d5_0 nvidia/label/cuda-11.7.0
cuda-runtime 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-sanitizer-api 11.7.50 hb424887_0 nvidia/label/cuda-11.7.0
cuda-toolkit 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-visual-tools 11.7.0 0 nvidia/label/cuda-11.7.0
gds-tools 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcublas 11.10.1.25 he442b6f_0 nvidia/label/cuda-11.7.0
libcublas-dev 11.10.1.25 h0c8ac2b_0 nvidia/label/cuda-11.7.0
libcufft 10.7.2.50 h80a1efe_0 nvidia/label/cuda-11.7.0
libcufft-dev 10.7.2.50 h59a5ac8_0 nvidia/label/cuda-11.7.0
libcufile 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcufile-dev 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcurand 10.2.10.50 heec50f7_0 nvidia/label/cuda-11.7.0
libcurand-dev 10.2.10.50 hd49a9cd_0 nvidia/label/cuda-11.7.0
libcusolver 11.3.5.50 hcab339c_0 nvidia/label/cuda-11.7.0
libcusolver-dev 11.3.5.50 hc6eba6f_0 nvidia/label/cuda-11.7.0
libcusparse 11.7.3.50 h6aaafad_0 nvidia/label/cuda-11.7.0
libcusparse-dev 11.7.3.50 hc644b96_0 nvidia/label/cuda-11.7.0
libnpp 11.7.3.21 h3effbd9_0 nvidia/label/cuda-11.7.0
libnpp-dev 11.7.3.21 hb6476a9_0 nvidia/label/cuda-11.7.0
libnvjpeg 11.7.2.34 hfe236c7_0 nvidia/label/cuda-11.7.0
libnvjpeg-dev 11.7.2.34 h2e48410_0 nvidia/label/cuda-11.7.0
nsight-compute 2022.2.0.13 0 nvidia/label/cuda-11.7.0
pytorch 2.0.1 py3.11_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ sudo pacman -Q |grep -E "nvidia|cuda"
cuda 12.1.1-3
nvidia 530.41.03-15
nvidia-utils 530.41.03-1
opencl-nvidia 530.41.03-1
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ uname -a
Linux ggVicuna 6.3.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 30 May 2023 13:44:01 +0000 x86_64 GNU/Linux
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ whereis nvcc
nvcc: /opt/cuda/bin/nvcc /home/gediz/miniconda3/envs/vicuna-matata/bin/nvcc
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ nvidia-smi
Sun Jun 4 02:08:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03 Driver Version: 530.41.03 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
Error:
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ python setup_cuda.py install
running install
/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer, pypa/build or
other standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer, pypa/build or
other standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
Traceback (most recent call last):
File "/home/gediz/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
setup(
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/install.py", line 80, in run
self.do_egg_install()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/install.py", line 129, in do_egg_install
self.run_command('bdist_egg')
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
self.run_command('build_ext')
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.
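A sketch of one workaround, assuming the env's CUDA 11.7 toolchain (the nvcc under miniconda3/envs/vicuna-matata seen above) should win over the system-wide CUDA 12.1 installed via pacman: torch.utils.cpp_extension honors CUDA_HOME when picking the toolkit, so point it at the conda env before building:

# Make the extension build use the same CUDA 11.7 that PyTorch was compiled with.
export CUDA_HOME="$CONDA_PREFIX"
export PATH="$CUDA_HOME/bin:$PATH"
python setup_cuda.py install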
wget https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML/resolve/main/vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
--2023-05-06 13:37:02-- https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML/resolve/main/vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
Resolving huggingface.co (huggingface.co)... 13.32.230.100, 13.32.230.67, 13.32.230.49, ...
Connecting to huggingface.co (huggingface.co)|13.32.230.100|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Username/Password Authentication Failed.
I built it in a build folder, which worked, but then the instructions say to run a command that references a ./main
file or location that does not exist. Did I do something wrong? Where is the missing file, and how do I get it?
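A sketch, assuming "build folder" means a CMake build (cmake -B build && cmake --build build): CMake places the llama.cpp binaries under build/bin rather than the repository root, so the ./main from the instructions becomes:

# Run the binary from where the CMake build placed it.
./build/bin/main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:'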
Hi,
Thanks for the simple guide; however, it's not working. Is anything mismatched?
./main -m models/ggml-vicuna-13B-1.1-q5_1.bin --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-vicuna-v1.txt
Output:
main: build = 732 (afd983c)
main: seed = 1696926741
llama.cpp: loading model from models/ggml-vicuna-13B-1.1-q5_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 9 (mostly Q5_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.00 MB
error loading model: llama.cpp: tensor 'norm.weight' is missing from model
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/ggml-vicuna-13B-1.1-q5_1.bin'
main: error: unable to load model
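A first check worth sketching, with no assumptions beyond the log above: a tensor missing at load time is often just a truncated or corrupted download, so compare the on-disk size with the file size listed on the model's Hugging Face page before digging deeper:

# A size mismatch means the download was cut short; wget -c can resume it.
ls -l models/ggml-vicuna-13B-1.1-q5_1.bin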
Still getting the "unable to load model" error. I am on the latest version of Linux Mint, using the one-step install script:
git clone https://github.com/fredi-python/llama.cpp.git && cd llama.cpp && make -j && cd models && wget -c https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-uncensored-q5_1.bin
After it has completed its install, I go to use Vicuna.
Command:
~/llama.cpp$ ./main -m ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36
Output:
main: build = 632 (787aadf)
main: seed = 1687195202
error loading model: failed to open ggml-vic13b-uncensored-q5_1.bin: No such file or directory
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'ggml-vic13b-uncensored-q5_1.bin'
main: error: unable to load model
Thank you for the development; I hope this helps. Let me know if anything is needed.
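The one-line installer runs wget after cd models, so the file lands in models/ inside llama.cpp; the command above looks for it in the current directory instead. A sketch of the corrected invocation:

# Point -m at the models/ subdirectory where the installer saved the file.
./main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36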
I have set up the server, but it only outputs a few words, as if blocked, and it is a single process that can't respond quickly. It only runs and loads the model when a request comes in.
I get the following error when running make -j:
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml-quants-k.c -o ggml-quants-k.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o
ggml-quants-k.c: In function ‘ggml_vec_dot_q2_k_q8_k’:
ggml-quants-k.c:1121:36: warning: implicit declaration of function ‘_mm256_set_m128i’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
const __m256i scales[2] = {_mm256_set_m128i(l_scales, l_scales), _mm256_set_m128i(h_scales, h_scales)};
^~~~~~~~~~~~~~~~
_mm256_set_epi8
ggml-quants-k.c:1121:35: warning: missing braces around initializer [-Wmissing-braces]
const __m256i scales[2] = {_mm256_set_m128i(l_scales, l_scales), _mm256_set_m128i(h_scales, h_scales)};
^
{ }
ggml-quants-k.c: In function ‘ggml_vec_dot_q3_k_q8_k’:
ggml-quants-k.c:1361:35: warning: missing braces around initializer [-Wmissing-braces]
const __m256i scales[2] = {_mm256_set_m128i(l_scales, l_scales), _mm256_set_m128i(h_scales, h_scales)};
^
{ }
ggml-quants-k.c: In function ‘ggml_vec_dot_q4_k_q8_k’:
ggml-quants-k.c:1635:32: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
const __m256i scales = _mm256_set_m128i(sc128, sc128);
^~~~~~~~~~~~~~~~
ggml-quants-k.c: In function ‘ggml_vec_dot_q5_k_q8_k’:
ggml-quants-k.c:1865:32: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
const __m256i scales = _mm256_set_m128i(sc128, sc128);
^~~~~~~~~~~~~~~~
Makefile:238: recipe for target 'ggml-quants-k.o' failed
make: *** [ggml-quants-k.o] Error 1
make: *** Waiting for unfinished jobs....
ggml.c: In function ‘bytes_from_nibbles_32’:
ggml.c:551:27: warning: implicit declaration of function ‘_mm256_set_m128i’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
const __m256i bytes = _mm256_set_m128i(_mm_srli_epi16(tmp, 4), tmp);
^~~~~~~~~~~~~~~~
_mm256_set_epi8
ggml.c:551:27: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
Makefile:235: recipe for target 'ggml.o' failed
make: *** [ggml.o] Error 1
llama.cpp: In function ‘void llama_model_load_internal(const string&, llama_context&, int, int, ggml_type, bool, bool, bool, llama_progress_callback, void*)’:
llama.cpp:1127:19: warning: unused variable ‘n_gpu’ [-Wunused-variable]
const int n_gpu = std::min(n_gpu_layers, int(hparams.n_layer));
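For what it's worth, _mm256_set_m128i (and _mm256_set_m128) only shipped with GCC 8, and the log above shows GCC 7.5, which is why the intrinsic is "implicitly declared" and then treated as returning int. A sketch of one fix, assuming Ubuntu 18.04 with a newer compiler available from the archives:

# Build with GCC 8 instead of the default GCC 7.5; command-line variables
# override the CC/CXX set inside the Makefile.
sudo apt install gcc-8 g++-8
make clean
make -j CC=gcc-8 CXX=g++-8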
Getting the following error when running make -j:
(pytorch_p39) [ec2-user@ip-10-76-218-85 llama.cpp]$ make -j
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
I CXX: g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o
ggml.c: In function ‘ggml_vec_dot_q4_2_q8_0’:
ggml.c:3250:40: warning: implicit declaration of function ‘_mm256_set_m128’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
const __m256 d = _mm256_mul_ps(_mm256_set_m128(d1, d0), _mm256_broadcast_ss(&y[i].d));
^~~~~~~~~~~~~~~
_mm256_set_epi8
ggml.c:3250:40: error: incompatible type for argument 1 of ‘_mm256_mul_ps’
In file included from /usr/lib/gcc/x86_64-redhat-linux/7/include/immintrin.h:41:0,
from ggml.c:186:
/usr/lib/gcc/x86_64-redhat-linux/7/include/avxintrin.h:317:1: note: expected ‘__m256 {aka __vector(8) float}’ but argument is of type ‘int’
_mm256_mul_ps (__m256 __A, __m256 __B)
^~~~~~~~~~~~~
ggml.c:3254:22: warning: implicit declaration of function ‘_mm256_set_m128i’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
__m256i bx = _mm256_set_m128i(bx1, bx0);
^~~~~~~~~~~~~~~~
_mm256_set_epi8
ggml.c:3254:22: error: incompatible types when initializing type ‘__m256i {aka __vector(4) long long int}’ using type ‘int’
make: *** [ggml.o] Error 1
make: *** Waiting for unfinished jobs....
llama.cpp: In function ‘size_t llama_set_state_data(llama_context*, const uint8_t*)’:
llama.cpp:2610:36: warning: cast from type ‘const uint8_t* {aka const unsigned char*}’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
kin3d->data = (void *) in;
^~
llama.cpp:2614:36: warning: cast from type ‘const uint8_t* {aka const unsigned char*}’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
vin3d->data = (void *) in;
^~
(pytorch_p39) [ec2-user@ip-10-76-218-85 llama.cpp]$
This is on an AWS p3.2xlarge with the DL-AMI installed and the pytorch_p39 conda environment activated (although I'm not sure if that matters).
I'm not a C/C++ programmer, so I'm not sure what to make of this error, but I'm guessing it's something to do with the environment. Any help would be greatly appreciated, thanks!
After running the latest instructions today to install it, I tried running ggml-vic13b-q5_1.bin [ ./main -m models/ggml-vic13b-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36 ] and ggml-vic13b-uncensored-q5_1.bin.
Every time I stop the app, run it again, and ask the same question, I can get different and wrong answers. For example:
User:What is the closest planet to earth?
Vicuna: The closest planet to Earth is Venus, which is about 0.38 AU (5.1 million km or 3.2 million miles) away from Earth on average.
That is fine, but if I close and restart the app about 5-8 times and ask again, I'll get a different/wrong answer.
User:What is the closest planet to earth?
Vicuna: The closest planet to Earth is the Moon.
Is it normal for ...
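Varying answers across runs are expected here: each run picks a fresh random seed (note the changing main: seed = ... lines), and --temp 0.36 samples tokens stochastically. A sketch for reproducible runs, assuming the -s/--seed flag of llama.cpp's main:

# Pin the RNG seed so repeated runs produce the same completion.
./main -m models/ggml-vic13b-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36 -s 42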