vicuna-tools / vicuna-installation-guide
The "vicuna-installation-guide" provides step-by-step instructions for installing and configuring Vicuna 13B and 7B.
Can you please provide a good example on how to use the embedding with vicuna? Thank you!
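A minimal sketch of one way, assuming the embedding binary that this repo's make step builds (you can see examples/embedding/embedding.cpp being compiled in the logs below) and a Vicuna model already downloaded into models/:

# Prints the prompt's embedding vector as plain floats on stdout.
./embedding -m models/ggml-vic13b-uncensored-q5_1.bin -p "Hello, Vicuna"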
Hello again ;)
How can I make it work with CUDA?
I've tried building with LLAMA_CUBLAS, but that didn't make any difference:
make -j LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX: g++ (Debian 10.2.1-6) 10.2.1 20210110
nvcc --forward-unknown-to-host-compiler -arch=native -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/main/main.cpp ggml.o llama.o common.o ggml-cuda.o -o main -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize/quantize.cpp ggml.o llama.o ggml-cuda.o -o quantize -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize-stats/quantize-stats.cpp ggml.o llama.o ggml-cuda.o -o quantize-stats -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/perplexity/perplexity.cpp ggml.o llama.o common.o ggml-cuda.o -o perplexity -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/embedding/embedding.cpp ggml.o llama.o common.o ggml-cuda.o -o embedding -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include pocs/vdot/vdot.cpp ggml.o ggml-cuda.o -o vdot -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
==== Run ./main -h for help. ====
I'm using CUDA 11.8:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Graphics... On | 00000000:01:00.0 Off | Off |
| 0% 38C P8 26W / 450W | 1MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
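The build log above shows ggml-cuda.o being compiled and linked, so cuBLAS support itself looks fine; by default, though, cuBLAS is only used to accelerate prompt/batch processing. To run the model itself on the GPU you also have to offload layers at run time. A sketch, assuming a llama.cpp revision new enough to have the -ngl/--n-gpu-layers flag:

# Offload 32 of the 13B model's 40 layers to the GPU; raise or lower to fit VRAM.
./main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36 -ngl 32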
Why does the script clone from github.com/fredi-python/llama.cpp instead of github.com/ggerganov/llama.cpp?
Ignore
I know other Vicuna installs require you to obtain the LLaMA weights in order to run Vicuna; is that also the case for this installation?
Hello,
Using the one-line install seems to be successful (except for a few warnings):
git clone https://github.com/fredi-python/llama.cpp.git && cd llama.cpp && make -j && cd models && wget -c https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-uncensored-q5_1.bin
Cloning into 'llama.cpp'...
remote: Enumerating objects: 2390, done.
remote: Counting objects: 100% (867/867), done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 2390 (delta 815), reused 790 (delta 790), pack-reused 1523
Receiving objects: 100% (2390/2390), 2.16 MiB | 3.93 MiB/s, done.
Resolving deltas: 100% (1566/1566), done.
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX: g++ (Debian 10.2.1-6) 10.2.1 20210110
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o
llama.cpp: In function 'size_t llama_set_state_data(llama_context*, const uint8_t*)':
llama.cpp:2615:27: warning: cast from type 'const uint8_t*' {aka 'const unsigned char*'} to type 'void*' casts away qualifiers [-Wcast-qual]
2615 | kin3d->data = (void *) in;
| ^~~~~~~~~~~
llama.cpp:2619:27: warning: cast from type 'const uint8_t*' {aka 'const unsigned char*'} to type 'void*' casts away qualifiers [-Wcast-qual]
2619 | vin3d->data = (void *) in;
| ^~~~~~~~~~~
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native pocs/vdot/vdot.cpp ggml.o -o vdot
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/main/main.cpp ggml.o llama.o common.o -o main
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize/quantize.cpp ggml.o llama.o -o quantize
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize-stats/quantize-stats.cpp ggml.o llama.o -o quantize-stats
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding
==== Run ./main -h for help. ====
--2023-05-15 09:57:30-- https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-uncensored-q5_1.bin
Resolving huggingface.co (huggingface.co)... 108.138.51.20, 108.138.51.95, 108.138.51.49, ...
Connecting to huggingface.co (huggingface.co)|108.138.51.20|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/0a/36/0a36ee786df124a005175a3d339738ad57350a96ae625c2111bce6483acbe34a/6fc1294b722082631cd61b1bde2cfecd1533eb95b331dbbdacbebe4944ff974a?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ggml-vic13b-uncensored-q5_1.bin%3B+filename%3D%22ggml-vic13b-uncensored-q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1684397227&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzBhLzM2LzBhMzZlZTc4NmRmMTI0YTAwNTE3NWEzZDMzOTczOGFkNTczNTBhOTZhZTYyNWMyMTExYmNlNjQ4M2FjYmUzNGEvNmZjMTI5NGI3MjIwODI2MzFjZDYxYjFiZGUyY2ZlY2QxNTMzZWI5NWIzMzFkYmJkYWNiZWJlNDk0NGZmOTc0YT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODQzOTcyMjd9fX1dfQ__&Signature=tbSCc4T3qUsBlw-mQrtcKwBQL0cbfeZe8MH3aGUv4EgOfo0JZibFFetpyqKk88LsDRKNzStyM6epwjbiB11PwEE73JT6ajJnAkArMkNDOmTO4NP6poC1rHlM-XRz3WuSdi3nY0fdDYYYL1gHb%7EAPwILghy-z4-vWRSEPldUQGTuqCZqj2knjmVtIuHSk06fShBYKOWKM7nnzb0-ENQumj6garze%7Es7n0hQjX%7EBKTGAD-HI5mMy1I5rwfA5M6eQ9zYavGHKNj104LftBPBLjpvAamO6fGS1L6KQYiKG-t68AuDgBy8TVbdIfTYJbN52vnvcfaiz3E5QB8JrvMv5uETQ__&Key-Pair-Id=KVTP0A1DKRTAX [following]
--2023-05-15 09:57:31-- https://cdn-lfs.huggingface.co/repos/0a/36/0a36ee786df124a005175a3d339738ad57350a96ae625c2111bce6483acbe34a/6fc1294b722082631cd61b1bde2cfecd1533eb95b331dbbdacbebe4944ff974a?response-content-disposition=attachment%3B+filename*%3DUTF-8''ggml-vic13b-uncensored-q5_1.bin%3B+filename%3D%22ggml-vic13b-uncensored-q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1684397227&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzBhLzM2LzBhMzZlZTc4NmRmMTI0YTAwNTE3NWEzZDMzOTczOGFkNTczNTBhOTZhZTYyNWMyMTExYmNlNjQ4M2FjYmUzNGEvNmZjMTI5NGI3MjIwODI2MzFjZDYxYjFiZGUyY2ZlY2QxNTMzZWI5NWIzMzFkYmJkYWNiZWJlNDk0NGZmOTc0YT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODQzOTcyMjd9fX1dfQ__&Signature=tbSCc4T3qUsBlw-mQrtcKwBQL0cbfeZe8MH3aGUv4EgOfo0JZibFFetpyqKk88LsDRKNzStyM6epwjbiB11PwEE73JT6ajJnAkArMkNDOmTO4NP6poC1rHlM-XRz3WuSdi3nY0fdDYYYL1gHb~APwILghy-z4-vWRSEPldUQGTuqCZqj2knjmVtIuHSk06fShBYKOWKM7nnzb0-ENQumj6garze~s7n0hQjX~BKTGAD-HI5mMy1I5rwfA5M6eQ9zYavGHKNj104LftBPBLjpvAamO6fGS1L6KQYiKG-t68AuDgBy8TVbdIfTYJbN52vnvcfaiz3E5QB8JrvMv5uETQ__&Key-Pair-Id=KVTP0A1DKRTAX
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.244.102.114, 18.244.102.76, 18.244.102.9, ...
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.244.102.114|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9763701888 (9.1G) [application/octet-stream]
Saving to: 'ggml-vic13b-uncensored-q5_1.bin'
ggml-vic13b-uncensored-q5_1.bin 100%[=========================================================================================================>] 9.09G 2.99MB/s in 49m 46s
2023-05-15 10:47:17 (3.12 MB/s) - 'ggml-vic13b-uncensored-q5_1.bin' saved [9763701888/9763701888]
But when I try to run it, it throws an error:
./main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36
main: build = 523 (0737a47)
main: seed = 1684152947
llama.cpp: loading model from models/ggml-vic13b-uncensored-q5_1.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/ggml-vic13b-uncensored-q5_1.bin'
main: error: unable to load model
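The magic 67676a74 is ASCII for "ggjt", so the download itself looks intact; the loader simply doesn't recognize that (magic, version) combination, which usually means the checked-out llama.cpp revision and the model's file-format version are out of sync. A sketch of the usual remedy, assuming the fork tracks upstream closely enough to pull:

# Rebuild against newer sources that understand the ggjt v2 format.
git pull
make clean
make -j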
Can you merge this to enable GPU inference? ggerganov/llama.cpp#1642
I usually use containers and pyenv, not a conda environment.
It claims there is a mismatch. Are the runtime libraries enough, or should I remove CUDA from the base install?
How can I remedy this? Thank you.
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ conda list |grep -E 'cuda|nvidia'
cuda-cccl 11.7.58 hc415cf5_0 nvidia/label/cuda-11.7.0
cuda-command-line-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-compiler 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-cudart 11.7.60 h9538e0e_0 nvidia/label/cuda-11.7.0
cuda-cudart-dev 11.7.60 h6a7c232_0 nvidia/label/cuda-11.7.0
cuda-cuobjdump 11.7.50 h28cc80a_0 nvidia/label/cuda-11.7.0
cuda-cupti 11.7.50 hb6f9eaf_0 nvidia/label/cuda-11.7.0
cuda-cuxxfilt 11.7.50 hb365495_0 nvidia/label/cuda-11.7.0
cuda-documentation 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-driver-dev 11.7.60 0 nvidia/label/cuda-11.7.0
cuda-gdb 11.7.50 h4a0ac72_0 nvidia/label/cuda-11.7.0
cuda-libraries 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-libraries-dev 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-memcheck 11.7.50 hc446b2b_0 nvidia/label/cuda-11.7.0
cuda-nsight 11.7.50 0 nvidia/label/cuda-11.7.0
cuda-nsight-compute 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-nvcc 11.7.64 0 nvidia/label/cuda-11.7.0
cuda-nvdisasm 11.7.50 h5bd0695_0 nvidia/label/cuda-11.7.0
cuda-nvml-dev 11.7.50 h3af1343_0 nvidia/label/cuda-11.7.0
cuda-nvprof 11.7.50 h7a2404d_0 nvidia/label/cuda-11.7.0
cuda-nvprune 11.7.50 h7add7b4_0 nvidia/label/cuda-11.7.0
cuda-nvrtc 11.7.50 hd0285e0_0 nvidia/label/cuda-11.7.0
cuda-nvrtc-dev 11.7.50 heada363_0 nvidia/label/cuda-11.7.0
cuda-nvtx 11.7.50 h05b0816_0 nvidia/label/cuda-11.7.0
cuda-nvvp 11.7.50 hd2289d5_0 nvidia/label/cuda-11.7.0
cuda-runtime 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-sanitizer-api 11.7.50 hb424887_0 nvidia/label/cuda-11.7.0
cuda-toolkit 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-tools 11.7.0 0 nvidia/label/cuda-11.7.0
cuda-visual-tools 11.7.0 0 nvidia/label/cuda-11.7.0
gds-tools 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcublas 11.10.1.25 he442b6f_0 nvidia/label/cuda-11.7.0
libcublas-dev 11.10.1.25 h0c8ac2b_0 nvidia/label/cuda-11.7.0
libcufft 10.7.2.50 h80a1efe_0 nvidia/label/cuda-11.7.0
libcufft-dev 10.7.2.50 h59a5ac8_0 nvidia/label/cuda-11.7.0
libcufile 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcufile-dev 1.3.0.44 0 nvidia/label/cuda-11.7.0
libcurand 10.2.10.50 heec50f7_0 nvidia/label/cuda-11.7.0
libcurand-dev 10.2.10.50 hd49a9cd_0 nvidia/label/cuda-11.7.0
libcusolver 11.3.5.50 hcab339c_0 nvidia/label/cuda-11.7.0
libcusolver-dev 11.3.5.50 hc6eba6f_0 nvidia/label/cuda-11.7.0
libcusparse 11.7.3.50 h6aaafad_0 nvidia/label/cuda-11.7.0
libcusparse-dev 11.7.3.50 hc644b96_0 nvidia/label/cuda-11.7.0
libnpp 11.7.3.21 h3effbd9_0 nvidia/label/cuda-11.7.0
libnpp-dev 11.7.3.21 hb6476a9_0 nvidia/label/cuda-11.7.0
libnvjpeg 11.7.2.34 hfe236c7_0 nvidia/label/cuda-11.7.0
libnvjpeg-dev 11.7.2.34 h2e48410_0 nvidia/label/cuda-11.7.0
nsight-compute 2022.2.0.13 0 nvidia/label/cuda-11.7.0
pytorch 2.0.1 py3.11_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ sudo pacman -Q |grep -E "nvidia|cuda"
cuda 12.1.1-3
nvidia 530.41.03-15
nvidia-utils 530.41.03-1
opencl-nvidia 530.41.03-1
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ uname -a
Linux ggVicuna 6.3.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 30 May 2023 13:44:01 +0000 x86_64 GNU/Linux
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ whereis nvcc
nvcc: /opt/cuda/bin/nvcc /home/gediz/miniconda3/envs/vicuna-matata/bin/nvcc
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ nvidia-smi
Sun Jun 4 02:08:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03 Driver Version: 530.41.03 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
Error:
(vicuna-matata) [gediz@ggVicuna GPTQ-for-LLaMa]$ python setup_cuda.py install
running install
/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer, pypa/build or
other standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer, pypa/build or
other standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
Traceback (most recent call last):
File "/home/gediz/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
setup(
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/install.py", line 80, in run
self.do_egg_install()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/install.py", line 129, in do_egg_install
self.run_command('bdist_egg')
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
self.run_command('build_ext')
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/home/gediz/miniconda3/envs/vicuna-matata/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 387, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.1) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.
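A sketch of one workaround, assuming the env's CUDA 11.7 toolchain (the nvcc under miniconda3/envs/vicuna-matata seen above) should win over the system-wide CUDA 12.1 installed via pacman: torch.utils.cpp_extension honors CUDA_HOME when picking the toolkit, so point it at the conda env before building:

# Make the extension build use the same CUDA 11.7 that PyTorch was compiled with.
export CUDA_HOME="$CONDA_PREFIX"
export PATH="$CUDA_HOME/bin:$PATH"
python setup_cuda.py install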
wget https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML/resolve/main/vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
--2023-05-06 13:37:02-- https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g-GGML/resolve/main/vicuna-13B-1.1-GPTQ-4bit-128g.GGML.bin
Resolving huggingface.co (huggingface.co)... 13.32.230.100, 13.32.230.67, 13.32.230.49, ...
Connecting to huggingface.co (huggingface.co)|13.32.230.100|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Username/Password Authentication Failed.
I built it in a build folder, which worked, but then the instructions say to run a command that references a ./main
file or location that does not exist. Did I do something wrong? Where is the missing file, and how do I get it?
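A sketch, assuming "build folder" means a CMake build (cmake -B build && cmake --build build): CMake places the llama.cpp binaries under build/bin rather than the repository root, so the ./main from the instructions becomes:

# Run the binary from where the CMake build placed it.
./build/bin/main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:'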
Hi,
Thanks for the simple guide; however, it's not working. Is anything mismatched?
./main -m models/ggml-vicuna-13B-1.1-q5_1.bin --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-vicuna-v1.txt
Output:
main: build = 732 (afd983c)
main: seed = 1696926741
llama.cpp: loading model from models/ggml-vicuna-13B-1.1-q5_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 9 (mostly Q5_1)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.00 MB
error loading model: llama.cpp: tensor 'norm.weight' is missing from model
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/ggml-vicuna-13B-1.1-q5_1.bin'
main: error: unable to load model
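A first check worth sketching, with no assumptions beyond the log above: a tensor missing at load time is often just a truncated or corrupted download, so compare the on-disk size with the file size listed on the model's Hugging Face page before digging deeper:

# A size mismatch means the download was cut short; wget -c can resume it.
ls -l models/ggml-vicuna-13B-1.1-q5_1.bin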
Still getting the "unable to load model" error. I am on the latest version of Linux Mint, using the one-step install script:
git clone https://github.com/fredi-python/llama.cpp.git && cd llama.cpp && make -j && cd models && wget -c https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vic13b-uncensored-q5_1.bin
After it has completed its install, I go to use Vicuna.
Command:
~/llama.cpp$ ./main -m ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36
Output:
main: build = 632 (787aadf)
main: seed = 1687195202
error loading model: failed to open ggml-vic13b-uncensored-q5_1.bin: No such file or directory
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'ggml-vic13b-uncensored-q5_1.bin'
main: error: unable to load model
Thank you for the development; I hope this helps. Let me know if anything is needed.
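The one-line installer runs wget after cd models, so the file lands in models/ inside llama.cpp; the command above looks for it in the current directory instead. A sketch of the corrected invocation:

# Point -m at the models/ subdirectory where the installer saved the file.
./main -m models/ggml-vic13b-uncensored-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36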
I have set up the server, but it only outputs a few words, as if blocked, and it is a single process that can't respond quickly. It only runs and loads the model when a request comes in.
I get the following error when running make -j:
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml-quants-k.c -o ggml-quants-k.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o
ggml-quants-k.c: In function ‘ggml_vec_dot_q2_k_q8_k’:
ggml-quants-k.c:1121:36: warning: implicit declaration of function ‘_mm256_set_m128i’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
const __m256i scales[2] = {_mm256_set_m128i(l_scales, l_scales), _mm256_set_m128i(h_scales, h_scales)};
^~~~~~~~~~~~~~~~
_mm256_set_epi8
ggml-quants-k.c:1121:35: warning: missing braces around initializer [-Wmissing-braces]
const __m256i scales[2] = {_mm256_set_m128i(l_scales, l_scales), _mm256_set_m128i(h_scales, h_scales)};
^
{ }
ggml-quants-k.c: In function ‘ggml_vec_dot_q3_k_q8_k’:
ggml-quants-k.c:1361:35: warning: missing braces around initializer [-Wmissing-braces]
const __m256i scales[2] = {_mm256_set_m128i(l_scales, l_scales), _mm256_set_m128i(h_scales, h_scales)};
^
{ }
ggml-quants-k.c: In function ‘ggml_vec_dot_q4_k_q8_k’:
ggml-quants-k.c:1635:32: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
const __m256i scales = _mm256_set_m128i(sc128, sc128);
^~~~~~~~~~~~~~~~
ggml-quants-k.c: In function ‘ggml_vec_dot_q5_k_q8_k’:
ggml-quants-k.c:1865:32: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
const __m256i scales = _mm256_set_m128i(sc128, sc128);
^~~~~~~~~~~~~~~~
Makefile:238: recipe for target 'ggml-quants-k.o' failed
make: *** [ggml-quants-k.o] Error 1
make: *** Waiting for unfinished jobs....
ggml.c: In function ‘bytes_from_nibbles_32’:
ggml.c:551:27: warning: implicit declaration of function ‘_mm256_set_m128i’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
const __m256i bytes = _mm256_set_m128i(_mm_srli_epi16(tmp, 4), tmp);
^~~~~~~~~~~~~~~~
_mm256_set_epi8
ggml.c:551:27: error: incompatible types when initializing type ‘__m256i {aka const __vector(4) long long int}’ using type ‘int’
Makefile:235: recipe for target 'ggml.o' failed
make: *** [ggml.o] Error 1
llama.cpp: In function ‘void llama_model_load_internal(const string&, llama_context&, int, int, ggml_type, bool, bool, bool, llama_progress_callback, void*)’:
llama.cpp:1127:19: warning: unused variable ‘n_gpu’ [-Wunused-variable]
const int n_gpu = std::min(n_gpu_layers, int(hparams.n_layer));
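For what it's worth, _mm256_set_m128i (and _mm256_set_m128) only shipped with GCC 8, and the log above shows GCC 7.5, which is why the intrinsic is "implicitly declared" and then treated as returning int. A sketch of one fix, assuming Ubuntu 18.04 with a newer compiler available from the archives:

# Build with GCC 8 instead of the default GCC 7.5; command-line variables
# override the CC/CXX set inside the Makefile.
sudo apt install gcc-8 g++-8
make clean
make -j CC=gcc-8 CXX=g++-8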
Getting the following error when running make -j:
(pytorch_p39) [ec2-user@ip-10-76-218-85 llama.cpp]$ make -j
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC: cc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
I CXX: g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15)
cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o
g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o
ggml.c: In function ‘ggml_vec_dot_q4_2_q8_0’:
ggml.c:3250:40: warning: implicit declaration of function ‘_mm256_set_m128’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
const __m256 d = _mm256_mul_ps(_mm256_set_m128(d1, d0), _mm256_broadcast_ss(&y[i].d));
^~~~~~~~~~~~~~~
_mm256_set_epi8
ggml.c:3250:40: error: incompatible type for argument 1 of ‘_mm256_mul_ps’
In file included from /usr/lib/gcc/x86_64-redhat-linux/7/include/immintrin.h:41:0,
from ggml.c:186:
/usr/lib/gcc/x86_64-redhat-linux/7/include/avxintrin.h:317:1: note: expected ‘__m256 {aka __vector(8) float}’ but argument is of type ‘int’
_mm256_mul_ps (__m256 __A, __m256 __B)
^~~~~~~~~~~~~
ggml.c:3254:22: warning: implicit declaration of function ‘_mm256_set_m128i’; did you mean ‘_mm256_set_epi8’? [-Wimplicit-function-declaration]
__m256i bx = _mm256_set_m128i(bx1, bx0);
^~~~~~~~~~~~~~~~
_mm256_set_epi8
ggml.c:3254:22: error: incompatible types when initializing type ‘__m256i {aka __vector(4) long long int}’ using type ‘int’
make: *** [ggml.o] Error 1
make: *** Waiting for unfinished jobs....
llama.cpp: In function ‘size_t llama_set_state_data(llama_context*, const uint8_t*)’:
llama.cpp:2610:36: warning: cast from type ‘const uint8_t* {aka const unsigned char*}’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
kin3d->data = (void *) in;
^~
llama.cpp:2614:36: warning: cast from type ‘const uint8_t* {aka const unsigned char*}’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
vin3d->data = (void *) in;
^~
(pytorch_p39) [ec2-user@ip-10-76-218-85 llama.cpp]$
This is on an AWS p3.2xlarge with the DL-AMI installed and the pytorch_p39 conda environment activated (although I'm not sure if that matters).
I'm not a C/C++ programmer, so I'm not sure what to make of this error, but I'm guessing it's something to do with the environment. Any help would be greatly appreciated, thanks!
After running the latest instructions today to install it, I tried running ggml-vic13b-q5_1.bin [ ./main -m models/ggml-vic13b-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36 ] and ggml-vic13b-uncensored-q5_1.bin.
Every time I stop the app, run it again, and ask the same question, I can get different and wrong answers. For example:
User:What is the closest planet to earth?
Vicuna: The closest planet to Earth is Venus, which is about 0.38 AU (5.1 million km or 3.2 million miles) away from Earth on average.
That is fine, but if I close and restart the app about 5-8 times and ask again, I'll get a different/wrong answer.
User:What is the closest planet to earth?
Vicuna: The closest planet to Earth is the Moon.
Is it normal for ...
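Varying answers across runs are expected here: each run picks a fresh random seed (note the changing main: seed = ... lines), and --temp 0.36 samples tokens stochastically. A sketch for reproducible runs, assuming the -s/--seed flag of llama.cpp's main:

# Pin the RNG seed so repeated runs produce the same completion.
./main -m models/ggml-vic13b-q5_1.bin -f 'prompts/chat-with-vicuna-v1.txt' -r 'User:' --temp 0.36 -s 42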