[Bug]: nsys cannot track the cuda kernel called by the process except rank 0 (vllm, OPEN)

crazy-JiangDongHua commented on September 27, 2024

Comments (2)

crazy-JiangDongHua commented on September 27, 2024

I also tried profiling each worker as described in "How ray support Nsight System Profiler", but it had no effect: there is still no CUDA HW line in the timeline.

# in vllm/executor/ray_gpu_executor.py:95
worker = ray.remote(
    num_cpus=0,
    num_gpus=num_gpus,
    scheduling_strategy=scheduling_strategy,
    runtime_env={"nsight": "default"},  # ask Ray to launch this worker under nsys
    **ray_remote_kwargs,
)(RayWorkerVllm).remote(self.model_config.trust_remote_code)
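
For anyone who wants to go beyond the `"default"` preset, Ray's `nsight` runtime environment can also take a dictionary of nsys options instead of a string. The following is only a minimal sketch of building such a dictionary: the helper name `nsight_runtime_env` and the chosen trace list are hypothetical, and it assumes the keys map to nsys CLI flags with the leading dashes removed. It is not the fix from the linked Ray issue.

```python
from typing import Dict, Iterable


def nsight_runtime_env(
    trace: Iterable[str] = ("cuda", "nvtx"),
    output: str = "worker_process_%p",
) -> Dict[str, Dict[str, str]]:
    """Build a runtime_env dict that asks Ray to run a worker under nsys.

    Hypothetical helper for illustration. Assumes each key is an nsys
    flag without dashes: "t" for the trace list, "o" for the report name.
    """
    return {
        "nsight": {
            # APIs to trace, joined the way `nsys profile -t` expects them.
            "t": ",".join(trace),
            # Report name; %p is expanded by nsys to the process ID, so
            # each worker process writes its own report file.
            "o": output,
        }
    }


if __name__ == "__main__":
    print(nsight_runtime_env(trace=["cuda", "cudnn", "cublas", "nvtx"]))
```

The result would be passed as `runtime_env=nsight_runtime_env()` in the `ray.remote(...)` call above, in place of `{"nsight": "default"}`.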


crazy-JiangDongHua commented on September 27, 2024

This turned out to be a Ray problem, which has just been solved. The detailed solution is in ray-project/ray#42139 (comment).
