
Comments (11)

Bellk17 commented on June 8, 2024

Found a fix; for me the issue was running the benchmark test from the source directory. During installation, the _C module is compiled into the site-packages directory of the pip installation. When running from the source directory, the script picks up the code from source rather than the installed package containing the compiled module.

Command:

  • From the vllm source directory:
  • python3 benchmarks/benchmark_throughput.py --input-len=50 --output-len=100 --enforce-eager --tensor-parallel-size=6

Error:
...
File "~/workspace/vllm/vllm/_custom_ops.py", line 176, in reshape_and_cache
    vllm_cache_ops.reshape_and_cache(key, value, key_cache, value_cache,
NameError: name 'vllm_cache_ops' is not defined
(Caught error: No module named 'vllm._C')

The script is picking up the local module at ~/workspace/vllm/vllm instead of the installed module. Running the command from a different directory, such as the benchmarks directory, fixes this.
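A quick way to confirm the shadowing (a minimal diagnostic sketch, not part of vLLM):

```python
# Run this from the same directory you launch the benchmark from.
import importlib.util
import vllm

print(vllm.__file__)  # a path inside your source checkout means the source tree is shadowing
print(importlib.util.find_spec("vllm._C"))  # None => the compiled extension is not found here
```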

@yananchen1989 I notice your stack trace is also coming from source (/home/chenyanan/vllm/vllm/_custom_ops.py); try running from a separate directory after installing/compiling from source. Let me know if this fixes the issue.

That being said, the try/except imports are causing unhelpful stack traces; I will look into auditing the compiled modules and adding useful warnings when they are not detected.
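For reference, a minimal sketch of the kind of guarded import that would surface a clearer warning (the names and message below are illustrative, not vLLM's actual code):

```python
import logging

logger = logging.getLogger(__name__)

try:
    from vllm._C import cache_ops as vllm_cache_ops  # built at install time
except ImportError as e:
    vllm_cache_ops = None
    # Report the root cause up front instead of deferring to a NameError later.
    logger.warning(
        "Could not import the compiled vllm._C extension (%s). If you are "
        "running from the source checkout, cd out of it so the installed "
        "package is used.", e)
```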


peterauyeung commented on June 8, 2024

> Found a fix; for me the issue was running the benchmark test from the source directory. [...]

Confirmed this is correct. I just needed to cd out of the source directory, and I was able to run without the error.


cybrtooth commented on June 8, 2024

I just received this error as well. It seems to only happen with non-quantized Mistral-7B models.


yananchen1989 commented on June 8, 2024

Using LangChain works as an alternative:

https://python.langchain.com/docs/integrations/llms/vllm/

```python
from langchain_community.llms import VLLM

llm_vllm = VLLM(
    model='mistralai/Mistral-7B-Instruct-v0.2',
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=2048,
    temperature=1,
    # tensor_parallel_size=...  # for distributed inference
)
```
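Generation then goes through the standard LangChain runnable interface (the prompt here is just an example):

```python
print(llm_vllm.invoke("Explain KV caching in one sentence."))
```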


Bellk17 commented on June 8, 2024

I'm seeing the same issue.

Catching the import error gives:
No module named 'vllm._C'

Also seeing warnings during install:

...
CMake Warning at /home/tensorwave/install_vllm/venv/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /home/tensorwave/install_vllm/venv/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:67 (find_package)


CMake Warning at CMakeLists.txt:124 (message):
  Pytorch version 2.1.1 expected for ROCMm 6.x build, saw 2.4.0 instead.


-- HIP supported arches: gfx906;gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100
-- HIP target arches: gfx942;gfx942;gfx942;gfx942;gfx942;gfx942;gfx942;gfx942
CMake Warning at CMakeLists.txt:266 (message):
  Unable to create _punica_C target because none of the requested
  architectures (gfx942;gfx942;gfx942;gfx942;gfx942;gfx942;gfx942;gfx942) are
  supported, i.e.  >= 8.0
...

It works when TP is not set.

Currently trying to get this working on MI300X with ROCm 6.1.


leiwen83 commented on June 8, 2024

The tests/ folder also suffers from this vllm_ops-not-defined issue.

I created PR #4231 for pytest, which forces pytest to search for the module in the installed location.
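As a hedged sketch (not necessarily the approach taken in #4231), a conftest.py can strip the repo root from sys.path so the installed package, with the compiled _C extension, wins:

```python
# conftest.py (illustrative sketch only)
import os
import sys

REPO_ROOT = os.path.dirname(os.path.abspath(__file__))
# Remove the repo root so "import vllm" resolves to the site-packages copy.
sys.path[:] = [p for p in sys.path
               if os.path.abspath(p or os.getcwd()) != REPO_ROOT]
```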


dagelf commented on June 8, 2024

I get this error after doing a clean install with pip install -e . at commit 26f2fb5. There were no errors during the installation... but could this runtime error be because I used dependencies that are too new (Python 3.10.12, PyTorch 2.3, CUDA 12.4)?


chrisociepa commented on June 8, 2024

It looks like PyTorch 2.3.0 causes the problem.
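One way to see the underlying failure that the try/except hides (a small diagnostic sketch; the exact message will vary):

```python
try:
    import vllm._C  # the extension is compiled against a specific PyTorch ABI
except ImportError as e:
    # After a torch upgrade this often reports an undefined-symbol error
    # rather than a plain "No module named 'vllm._C'".
    print(f"vllm._C failed to import: {e}")
```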


Semihal commented on June 8, 2024

I have this error:

INFO 05-22 14:25:08 utils.py:660] Found nccl from library /lib64/libnccl.so.2
INFO 05-22 14:25:09 selector.py:81] Cannot use FlashAttention-2 backend because the flash_attn package is not found. Please install it for better performance.
INFO 05-22 14:25:09 selector.py:32] Using XFormers backend.
INFO 05-22 14:25:34 model_runner.py:175] Loading model weights took 13.5516 GB
[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 168, in <module>
[rank0]:     engine = AsyncLLMEngine.from_engine_args(
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 366, in from_engine_args
[rank0]:     engine = cls(
[rank0]:              ^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 324, in __init__
[rank0]:     self.engine = self._init_engine(*args, **kwargs)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 442, in _init_engine
[rank0]:     return engine_class(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 172, in __init__
[rank0]:     self._initialize_kv_caches()
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 249, in _initialize_kv_caches
[rank0]:     self.model_executor.determine_num_available_blocks())
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/executor/gpu_executor.py", line 106, in determine_num_available_blocks
[rank0]:     return self.driver_worker.determine_num_available_blocks()
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks
[rank0]:     self.model_runner.profile_run()
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 888, in profile_run
[rank0]:     self.execute_model(seqs, kv_caches)
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 808, in execute_model
[rank0]:     hidden_states = model_executable(**execute_model_kwargs)
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 316, in forward
[rank0]:     hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 253, in forward
[rank0]:     hidden_states, residual = layer(
[rank0]:                               ^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 202, in forward
[rank0]:     hidden_states = self.input_layernorm(hidden_states)
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/model_executor/layers/layernorm.py", line 60, in forward
[rank0]:     ops.rms_norm(
[rank0]:   File "/usr/local/lib64/python3.11/site-packages/vllm/_custom_ops.py", line 106, in rms_norm
[rank0]:     vllm_ops.rms_norm(out, input, weight, epsilon)
[rank0]:     ^^^^^^^^
[rank0]: NameError: name 'vllm_ops' is not defined

This works for me:

pip install https://github.com/vllm-project/vllm/releases/download/v0.4.2/vllm-0.4.2-cp311-cp311-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu121


dong-liuliu commented on June 8, 2024

> Found a fix; for me the issue was running the benchmark test from the source directory. [...]

I also hit this error, and it was fixed after changing my working directory out of the vllm source code directory. If your error stack trace shows your own source path, try to cd out.

Probably many of us share the habit of running the tests or getting-started examples directly from the source code directory :)


DarkLight1337 commented on June 8, 2024

Fixed by #5009

