Comments (17)
That is quite strange and unexpected.
Can you share your hardware, OS, and the steps you took so I can help debug?
Also, can you try reconverting the model?
from fastllama.
I'm resetting my VM and will redo everything to see if that solves the issue; if not, I'll send you all the steps I took.
4 x Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz (1 Socket)
16.00 GiB Memory (3200 MHz)
8.00 GiB SWAP
80 GiB SSD
I reproduced everything since the beginning and logged all the steps I performed. Here is everything:
Ubuntu 20.04 Server / Python 3.8.10
apt-get -y install cmake
[Installing requirements]
apt install git
[Installing git]
git clone https://github.com/PotatoSpudowski/fastLLaMa
[Cloning repository]
cd fastLLaMa
chmod +x build.sh
apt-get install zsh
./build.sh
This command resulted in this error:
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c utils.cpp -o utils.o
make: g++: Command not found
make: *** [Makefile:227: utils.o] Error 127
Unable to build static library 'libllama'
Solved with apt-get install g++
./build.sh
-- Found PythonInterp: /usr/bin/python3.8 (found version "3.8.10")
CMake Error at build/_deps/pybind11-src/tools/FindPythonLibsNew.cmake:133 (message):
Python config failure:
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'sysconfig' from 'distutils'
(/usr/lib/python3.8/distutils/__init__.py)
Call Stack (most recent call first):
build/_deps/pybind11-src/tools/pybind11Tools.cmake:45 (find_package)
build/_deps/pybind11-src/tools/pybind11Common.cmake:201 (include)
build/_deps/pybind11-src/CMakeLists.txt:188 (include)
-- Configuring incomplete, errors occurred!
See also "/root/fastLLaMa/build/CMakeFiles/CMakeOutput.log".
Unable to build bridge.cpp and link the 'libllama'
Solved with apt-get install python3-dev
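Before rerunning build.sh, a quick sanity check can confirm the fix took: pybind11's probe (the traceback above) imports distutils.sysconfig, and on Ubuntu the headers it ultimately needs come from python3-dev. A minimal sketch using the equivalent stdlib sysconfig module:

```python
import os
import sysconfig

# python3-dev installs CPython's development headers (Python.h) here.
include_dir = sysconfig.get_paths()["include"]
print("headers:", include_dir)
print("Python.h present:", os.path.exists(os.path.join(include_dir, "Python.h")))
```

If Python.h is reported missing, the CMake configure step will fail again for the same reason.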
./build.sh
-> Build target fastLlama
mkdir models
[ Downloading LLaMA-13B ]
root@llama:~/fastLLaMa# ls ./models
13B tokenizer.model tokenizer_checklist.chk
pip install -r requirements.txt
python3 convert-pth-to-ggml.py models/13B/ 1 0
python3 quantize.py 13B
Here is what happened in real time when I ran example.py:
https://youtu.be/I8RwmOqn1Ic
Here, for comparison, is llama.cpp with chat.sh:
https://youtu.be/Xo15ErpMEA4
Ah I see!
We have not added support for AVX512 yet!
Will get this done!
Oh alright, let me know when it's implemented ;)
Hi,
Sorry, I made a mistake: your CPU only supports AVX2.
I tested the latest changes on my Intel 12400f, which also supports AVX2. I tried the 7B model and it is working as expected now! We are refactoring and updating everything, so it will soon be very similar to the llama.cpp repo in terms of speed and quality. Meanwhile, I hope this fixes your issue.
Here is the video
https://www.youtube.com/watch?v=OymL5Zzprd8
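Incidentally, a quick way to double-check which SIMD extensions a CPU actually advertises (the AVX2-vs-AVX512 mix-up above) is to read the flags line of /proc/cpuinfo on Linux. A rough sketch; the flag names are the kernel's, and the file does not exist on non-Linux systems:

```python
def simd_flags(path="/proc/cpuinfo"):
    """Return the subset of interesting SIMD flags the CPU advertises."""
    interesting = {"avx", "avx2", "avx512f", "f16c", "fma"}
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except OSError:
        return set()  # not Linux, or /proc unavailable
    flags = set()
    for line in lines:
        # Each logical CPU repeats an identical "flags : ..." line.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags & interesting

print(sorted(simd_flags()))
```

On the Xeon E3-1220 v5 above this should list avx and avx2 but not avx512f, matching the -mavx2 compiler flags the build prints later in this thread.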
Closing this for now. Feel free to reopen if necessary :)
Hello, thank you for that, but sadly it didn't fix my issue.
I reset my VM once again and reinstalled everything from the latest commit of this repository, this time trying the 7B model.
I can't tell if it's because it's a smaller model than what I tried before, but it seems a little faster than before. However, it is still slower than using llama.cpp directly.
Here is a video; the different timestamps are in the description:
https://youtu.be/ry8uvKAto3I
(I can't find where I can re-open this issue)
Hi,
That is weird. I will have a look at this and figure out what is happening.
Also, did you build with the setup.py method mentioned in the new README? Can you share the logs as well?
Yes, I went through the new steps in the README.md.
There was no error when I ran setup.py, but I'll try it again and send you the logs.
Can you try the new update and let me know?
model = Model(
    id=ModelKind.ALPACA_LORA_7B,
    path=MODEL_PATH,    # path to model
    num_threads=16,     # number of threads to use
    n_ctx=512,          # context size of model
    last_n_size=16,     # size of last n tokens (used for repetition penalty) (optional)
    n_batch=128,
)
If you feel it is still slow, you can try increasing the n_batch value.
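As a rough illustration of why n_batch helps (a sketch of the general idea, not fastLLaMa's actual code): the prompt is ingested in chunks of at most n_batch tokens, so a larger batch means fewer evaluation passes over the prompt, at the cost of more memory per pass:

```python
import math

def ingestion_passes(n_prompt_tokens: int, n_batch: int) -> int:
    # Each pass evaluates up to n_batch prompt tokens at once.
    return math.ceil(n_prompt_tokens / n_batch)

print(ingestion_passes(400, 8))    # 50 passes with a small batch
print(ingestion_passes(400, 128))  # 4 passes with n_batch=128
```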
Hi,
I am stuck at the build step; I've been struggling with it for a while. The output I get is:
(env) root@llama:~/fastLLaMa# python3 setup.py -l python
Setup executing command: cmake ..
-- Found '/root/fastLLaMa/cmake/GlobalVars.cmake'
-- OpenMP found
-- Compiler flags used: -mf16c;-mavx;-mavx2;-mfma;-fno-rtti
-- Linking flags used:
-- Macros defined:
-- Compiler flags used: -mf16c;-mavx;-mavx2;-mfma;-fno-rtti
-- Linking flags used:
-- Macros defined:
-- Building interface folder 'python'
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /root/fastLLaMa/build
Setup executing command: make -j 4
[ 7%] Building C object CMakeFiles/ggml_library.dir/lib/ggml.c.o
cc1: warning: command line option ‘-fno-rtti’ is valid for C++/D/ObjC++ but not for C
[ 15%] Linking C static library libggml_library.a
[ 15%] Built target ggml_library
[ 23%] Building CXX object CMakeFiles/fast_llama_lib.dir/lib/llama.cpp.o
[ 30%] Building CXX object CMakeFiles/fast_llama_lib.dir/lib/bridge.cpp.o
In file included from /root/fastLLaMa/lib/llama.cpp:1:
/root/fastLLaMa/include/llama.hpp:57:76: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
57 | bool init(HyperParams const& params, Logger const& logger = Logger{});
| ^
In file included from /root/fastLLaMa/include/llama.hpp:11,
from /root/fastLLaMa/lib/llama.cpp:1:
/root/fastLLaMa/include/logger.hpp:50:9: note: ‘constexpr fastllama::Logger::Logger() noexcept’ is implicitly deleted because its exception-specification does not match the implicit exception-specification ‘noexcept (false)’
50 | Logger() noexcept = default;
| ^~~~~~
In file included from /root/fastLLaMa/lib/llama.cpp:1:
/root/fastLLaMa/include/llama.hpp:58:51: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
58 | void deinit(Logger const& logger = Logger{});
| ^
/root/fastLLaMa/include/llama.hpp:123:23: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
123 | Logger logger{};
| ^
In file included from /root/fastLLaMa/include/bridge.hpp:7,
from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/llama.hpp:57:76: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
57 | bool init(HyperParams const& params, Logger const& logger = Logger{});
| ^
In file included from /root/fastLLaMa/include/llama.hpp:11,
from /root/fastLLaMa/include/bridge.hpp:7,
from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/logger.hpp:50:9: note: ‘constexpr fastllama::Logger::Logger() noexcept’ is implicitly deleted because its exception-specification does not match the implicit exception-specification ‘noexcept (false)’
50 | Logger() noexcept = default;
| ^~~~~~
In file included from /root/fastLLaMa/include/bridge.hpp:7,
from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/llama.hpp:58:51: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
58 | void deinit(Logger const& logger = Logger{});
| ^
/root/fastLLaMa/include/llama.hpp:123:23: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
123 | Logger logger{};
| ^
In file included from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/bridge.hpp:31:27: error: use of deleted function ‘constexpr fastllama::Logger::Logger()’
31 | Logger logger{};
| ^
In file included from /root/fastLLaMa/lib/bridge.cpp:1:
/root/fastLLaMa/include/bridge.hpp: In static member function ‘static fastllama::FastLlama::Params fastllama::FastLlama::builder()’:
/root/fastLLaMa/include/bridge.hpp:74:52: error: could not convert ‘<brace-enclosed initializer list>()’ from ‘<brace-enclosed initializer list>’ to ‘fastllama::FastLlama::Params’
74 | static Params builder() noexcept { return {}; }
| ^
| |
| <brace-enclosed initializer list>
/root/fastLLaMa/lib/bridge.cpp: In member function ‘std::optional<fastllama::FastLlama> fastllama::FastLlama::Params::build(std::string_view, const string_view&)’:
/root/fastLLaMa/lib/bridge.cpp:146:34: error: could not convert ‘{<expression error>}’ from ‘<brace-enclosed initializer list>’ to ‘std::optional<fastllama::FastLlama>’
146 | return { std::move(temp) };
| ^
| |
| <brace-enclosed initializer list>
make[2]: *** [CMakeFiles/fast_llama_lib.dir/build.make:76: CMakeFiles/fast_llama_lib.dir/lib/llama.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/fast_llama_lib.dir/build.make:90: CMakeFiles/fast_llama_lib.dir/lib/bridge.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:165: CMakeFiles/fast_llama_lib.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
I could make it work with plain python3 setup.py, but that doesn't seem to fix the problem: when I run the command to quantize the model, I get this output:
(env) root@llama:~/fastLLaMa# python3 quantize.py 7B
The "./src/quantize" script was not found in the current location.
If you want to use it from another location, set the --quantize-script-path argument from the command line.
(env) root@llama:~/fastLLaMa#
Edit:
I updated my VM to Ubuntu 22.04, which also updated g++ to 11.3.0, then recloned the repository, and that fixed the build issue.
However, running the example.py file in examples/python still doesn't work for me:
root@llama:~/fastLLaMa/examples/python# python3 example.py
Traceback (most recent call last):
File "/root/fastLLaMa/examples/python/example.py", line 15, in <module>
model = Model(
File "/root/fastLLaMa/examples/python/build/fastllama.py", line 95, in __init__
self.lib = ctypes.cdll.LoadLibrary(library_path)
File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: build/interfaces/python/pyfastllama.so: cannot open shared object file: No such file or directory
Exception ignored in: <function Model.__del__ at 0x7f2eb0e0af80>
Traceback (most recent call last):
File "/root/fastLLaMa/examples/python/build/fastllama.py", line 223, in __del__
lib = self.lib
AttributeError: 'Model' object has no attribute 'lib'
root@llama:~/fastLLaMa/examples/python#
We currently use relative paths, but we will make them absolute. For now, run the examples from the workspace root dir:
python ./examples/python/example.py
Otherwise, you can provide the absolute parent path to the pyfastllama.so inside build/interfaces/python.
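Until that lands, the traceback above suggests a workaround: resolve the .so against a known repo root before handing it to ctypes, instead of relying on the current working directory. A hypothetical sketch (the helper name and repo_root parameter are mine, not fastLLaMa's API; the .so location comes from the traceback):

```python
import ctypes
import os

def load_fastllama(repo_root: str):
    # Build the absolute path to the shared library from the repo root,
    # so the script works no matter where it is launched from.
    library_path = os.path.join(
        repo_root, "build", "interfaces", "python", "pyfastllama.so"
    )
    if not os.path.isfile(library_path):
        raise FileNotFoundError(f"shared library not built yet: {library_path}")
    return ctypes.cdll.LoadLibrary(library_path)

# e.g. lib = load_fastllama(os.path.expanduser("~/fastLLaMa"))
```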
Sorting this out soon!
Hello, sorry, I forgot to reply.
I was able to make it work, and now everything runs correctly. The answer is now generated even faster than the ingestion! Thank you very much, I am closing this issue.
@Showdown76py try increasing the n_batch parameter for faster ingestion:
model = Model(
    id=ModelKind.ALPACA_LORA_7B,
    path=MODEL_PATH,    # path to model
    num_threads=16,     # number of threads to use
    n_ctx=512,          # context size of model
    last_n_size=16,     # size of last n tokens (used for repetition penalty) (optional)
    n_batch=128,
)
It will increase memory consumption though!