Comments (17)
That is quite strange and unexpected.
Can you share your hardware, OS, and the steps you took so I can help debug?
Also, can you try reconverting the model?
from fastllama.
I'm resetting my VM and will redo everything to see if that solves the issue; if not, I'll send you all the steps I took.
4 x Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz (1 Socket)
16.00 GiB Memory (3200 MHz)
8.00 GiB SWAP
80 GiB SSD
I reproduced everything since the beginning and logged all the steps I performed. Here is everything:
Ubuntu 20.04 Server / Python 3.8.10
apt-get -y install cmake
[Installing requirements]
apt install git
[Installing git]
git clone https://github.com/PotatoSpudowski/fastLLaMa
[Cloning repository]
cd fastLLaMa
chmod +x build.sh
apt-get install zsh
./build.sh
This command resulted in this error:
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c utils.cpp -o utils.o
make: g++: Command not found
make: *** [Makefile:227: utils.o] Error 127
Unable to build static library 'libllama'
Solved with apt-get install g++
./build.sh
-- Found PythonInterp: /usr/bin/python3.8 (found version "3.8.10")
CMake Error at build/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:133 (message):
Python config failure:
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'sysconfig' from 'distutils'
(/usr/lib/python3.8/distutils/__init__.py)
Call Stack (most recent call first):
build/_deps/pybind11-src/tools/pybind11Tools.cmake:45 (find_package)
build/_deps/pybind11-src/tools/pybind11Common.cmake:201 (include)
build/_deps/pybind11-src/CMakeLists.txt:188 (include)
-- Configuring incomplete, errors occurred!
See also "/root/fastLLaMa/build/CMakeFiles/CMakeOutput.log".
Unable to build bridge.cpp and link the 'libllama'
Solved with apt-get install python3-dev
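Before rerunning build.sh, a quick sanity check can confirm the fix took: pybind11's probe (the traceback above) imports distutils.sysconfig, and on Ubuntu the headers it ultimately needs come from python3-dev. A minimal sketch using the equivalent stdlib sysconfig module:

```python
import os
import sysconfig

# python3-dev installs CPython's development headers (Python.h) here.
include_dir = sysconfig.get_paths()["include"]
print("headers:", include_dir)
print("Python.h present:", os.path.exists(os.path.join(include_dir, "Python.h")))
```

If Python.h is reported missing, the CMake configure step will fail again for the same reason.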
./build.sh
-> Build target fastLlama
mkdir models
[ Downloading LLaMA-13B ]
root@llama:~/fastLLaMa# ls ./models
13B tokenizer.model tokenizer_checklist.chk
pip install -r requirements.txt
python3 convert-pth-to-ggml.py models/13B/ 1 0
python3 quantize.py 13B
Here is what happened in real time when I ran example.py:
https://youtu.be/I8RwmOqn1Ic
Here, for comparison, is llama.cpp with chat.sh:
https://youtu.be/Xo15ErpMEA4
Ah I see!
We have not added support for AVX512 yet!
Will get this done!
Oh alright, let me know when it's implemented ;)
Hi,
Sorry, I made a mistake: your CPU only supports AVX2.
I tested the latest changes on my Intel 12400f, which also supports AVX2. I tried the 7B model and it is working as expected now! We are refactoring and updating everything, so it will soon be very similar to the llama.cpp repo in terms of speed and quality. Meanwhile, I hope this fixes your issue.
Here is the video
https://www.youtube.com/watch?v=OymL5Zzprd8
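Incidentally, a quick way to double-check which SIMD extensions a CPU actually advertises (the AVX2-vs-AVX512 mix-up above) is to read the flags line of /proc/cpuinfo on Linux. A rough sketch; the flag names are the kernel's, and the file does not exist on non-Linux systems:

```python
def simd_flags(path="/proc/cpuinfo"):
    """Return the subset of interesting SIMD flags the CPU advertises."""
    interesting = {"avx", "avx2", "avx512f", "f16c", "fma"}
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except OSError:
        return set()  # not Linux, or /proc unavailable
    flags = set()
    for line in lines:
        # Each logical CPU repeats an identical "flags : ..." line.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags & interesting

print(sorted(simd_flags()))
```

On the Xeon E3-1220 v5 above this should list avx and avx2 but not avx512f, matching the -mavx2 compiler flags the build prints later in this thread.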
Closing this for now. Feel free to reopen if necessary :)
Hello, thank you for that, but sadly it didn't fix my issue.
I reset my VM once again and reinstalled everything from the latest commit of this repository, this time trying the 7B model.
I can't tell if it's because it's a smaller model than what I tried before, but it seems a little faster than before. However, it is still slower than using llama.cpp directly.
Here is a video; the different timestamps are in the description:
https://youtu.be/ry8uvKAto3I
(I can't find where I can re-open this issue)
Hi,
That is weird. I will have a look at this and figure out what is happening.
Also, did you build with the setup.py method mentioned in the new README? Can you share the logs as well?
Yes, I went through the new steps in the README.md.
There was no error when I ran setup.py, but I'll try it again and send you the logs.
Can you try the new update and let me know?
model = Model(
    id=ModelKind.ALPACA_LORA_7B,
    path=MODEL_PATH,    # path to model
    num_threads=16,     # number of threads to use
    n_ctx=512,          # context size of model
    last_n_size=16,     # size of last n tokens (used for repetition penalty) (optional)
    n_batch=128,
)
If you feel it is still slow, you can try increasing the n_batch value.
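As a rough illustration of why n_batch helps (a sketch of the general idea, not fastLLaMa's actual code): the prompt is ingested in chunks of at most n_batch tokens, so a larger batch means fewer evaluation passes over the prompt, at the cost of more memory per pass:

```python
import math

def ingestion_passes(n_prompt_tokens: int, n_batch: int) -> int:
    # Each pass evaluates up to n_batch prompt tokens at once.
    return math.ceil(n_prompt_tokens / n_batch)

print(ingestion_passes(400, 8))    # 50 passes with a small batch
print(ingestion_passes(400, 128))  # 4 passes with n_batch=128
```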
Hi,
I am stuck at the build step; I've been struggling with it for a while. The output I get is:
(env) root@llama:~/fastLLaMa# python3 setup.py -l python
Setup executing command: cmake ..
-- Found '/root/fastLLaMa/cmake/GlobalVars.cmake'
-- OpenMP found
-- Compiler flags used: -mf16c;-mavx;-mavx2;-mfma;-fno-rtti
-- Linking flags used:
-- Macros defined:
-- Compiler flags used: -mf16c;-mavx;-mavx2;-mfma;-fno-rtti
-- Linking flags used:
-- Macros defined:
-- Building interface folder 'python'
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /root/fastLLaMa/build
Setup executing command: make -j 4
[ 7%] Building C object CMakeFiles/ggml_library.dir/lib/ggml.c.o
cc1: warning: command line option ‘-fno-rtti’ is valid for C++/D/ObjC++ but not for C
[ 15%] Linking C static library libggml_library.a
[ 15%] Built target ggml_library
[ 23%] Building CXX object CMakeFiles/fast_llama_lib.dir/lib/llama.cpp.o
[ 30%] Building CXX object CMakeFiles/fast_llama_lib.dir/lib/bridge.cpp.o
In file included from /root/fastLLaMa/lib/llama.cpp:1:
/root/fastLLaMa/include/llama.hpp:57:76: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
57 | bool init(HyperParams const& params, Logger const& logger = Logger{});
| ^
In file included from /root/fastLLaMa/include/llama.hpp:11,
from /root/fastLLaMa/lib/llama.cpp:1:
/root/fastLLaMa/include/logger.hpp:50:9: note: ‘constexpr fastllama::Logger::Logger() noexcept’ is implicitly deleted because its exception-specification does not match the implicit exception-specification ‘noexcept (false)’
50 | Logger() noexcept = default;
| ^~~~~~
In file included from /root/fastLLaMa/lib/llama.cpp:1:
/root/fastLLaMa/include/llama.hpp:58:51: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
58 | void deinit(Logger const& logger = Logger{});
| ^
/root/fastLLaMa/include/llama.hpp:123:23: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
123 | Logger logger{};
| ^
In file included from /root/fastLLaMa/include/bridge.hpp:7,
from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/llama.hpp:57:76: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
57 | bool init(HyperParams const& params, Logger const& logger = Logger{});
| ^
In file included from /root/fastLLaMa/include/llama.hpp:11,
from /root/fastLLaMa/include/bridge.hpp:7,
from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/logger.hpp:50:9: note: ‘constexpr fastllama::Logger::Logger() noexcept’ is implicitly deleted because its exception-specification does not match the implicit exception-specification ‘noexcept (false)’
50 | Logger() noexcept = default;
| ^~~~~~
In file included from /root/fastLLaMa/include/bridge.hpp:7,
from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/llama.hpp:58:51: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
58 | void deinit(Logger const& logger = Logger{});
| ^
/root/fastLLaMa/include/llama.hpp:123:23: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
123 | Logger logger{};
| ^
In file included from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/bridge.hpp:31:27: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
31 | Logger logger{};
| ^
In file included from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/bridge.hpp: In static member function ‘static fastllama::FastLlama::Params fastllama::FastLlama::builder()’:
/root/fastLLaMa/include/bridge.hpp:74:52: error: could not convert ‘<brace-enclosed initializer list>()’ from ‘<brace-enclosed initializer list>’ to ‘fastllama::FastLlama::Params’
74 | static Params builder() noexcept { return {}; }
| ^
| |
| <brace-enclosed initializer list>
/root/fastLLaMa/lib/bridge.cpp: In member function ‘std::optional<fastllama::FastLlama> fastllama::FastLlama::Params::build(std::string_view, const string_view&)’:
/root/fastLLaMa/lib/bridge.cpp:146:34: error: could not convert ‘{<expression error>}’ from ‘<brace-enclosed initializer list>’ to ‘std::optional<fastllama::FastLlama>’
146 | return { std::move(temp) };
| ^
| |
| <brace-enclosed initializer list>
make[2]: *** [CMakeFiles/fast_llama_lib.dir/build.make:76: CMakeFiles/fast_llama_lib.dir/lib/llama.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/fast_llama_lib.dir/build.make:90: CMakeFiles/fast_llama_lib.dir/lib/bridge.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:165: CMakeFiles/fast_llama_lib.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
I could make it work with plain python3 setup.py, but that doesn't seem to fix the problem: when I run the command to quantize the model, I get this output:
(env) root@llama:~/fastLLaMa# python3 quantize.py 7B
The "./src/quantize" script was not found in the current location.
If you want to use it from another location, set the --quantize-script-path argument from the command line.
(env) root@llama:~/fastLLaMa#
Edit:
I updated my VM to Ubuntu 22.04, which also updated g++ to 11.3.0, then recloned the repository, and that fixed the build issue.
However, running the example.py file in examples/python still doesn't work for me:
root@llama:~/fastLLaMa/examples/python# python3 example.py
Traceback (most recent call last):
File "/root/fastLLaMa/examples/python/example.py", line 15, in <module>
model = Model(
File "/root/fastLLaMa/examples/python/build/fastllama.py", line 95, in __init__
self.lib = ctypes.cdll.LoadLibrary(library_path)
File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: build/interfaces/python/pyfastllama.so: cannot open shared object file: No such file or directory
Exception ignored in: <function Model.__del__ at 0x7f2eb0e0af80>
Traceback (most recent call last):
File "/root/fastLLaMa/examples/python/build/fastllama.py", line 223, in __del__
lib = self.lib
AttributeError: 'Model' object has no attribute 'lib'
root@llama:~/fastLLaMa/examples/python#
We currently use relative paths, but we will make them absolute. For now, run the examples from the workspace root dir:
python ./examples/python/example.py
Otherwise, you can provide the absolute parent path to the pyfastllama.so inside build/interfaces/python.
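Until that lands, the traceback above suggests a workaround: resolve the .so against a known repo root before handing it to ctypes, instead of relying on the current working directory. A hypothetical sketch (the helper name and repo_root parameter are mine, not fastLLaMa's API; the .so location comes from the traceback):

```python
import ctypes
import os

def load_fastllama(repo_root: str):
    # Build the absolute path to the shared library from the repo root,
    # so the script works no matter where it is launched from.
    library_path = os.path.join(
        repo_root, "build", "interfaces", "python", "pyfastllama.so"
    )
    if not os.path.isfile(library_path):
        raise FileNotFoundError(f"shared library not built yet: {library_path}")
    return ctypes.cdll.LoadLibrary(library_path)

# e.g. lib = load_fastllama(os.path.expanduser("~/fastLLaMa"))
```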
Sorting this out soon!
Hello, sorry, I forgot to reply.
I was able to make it work, and now everything runs correctly. The answer is now generated even faster than the ingestion! Thank you very much, I am closing this issue.
@Showdown76py try increasing the n_batch parameter for faster ingestion:
model = Model(
    id=ModelKind.ALPACA_LORA_7B,
    path=MODEL_PATH,    # path to model
    num_threads=16,     # number of threads to use
    n_ctx=512,          # context size of model
    last_n_size=16,     # size of last n tokens (used for repetition penalty) (optional)
    n_batch=128,
)
It will increase memory consumption though!