Running python trans_web_demo.py on multiple GPUs fails with RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat) (glm-4 issue, closed)

fredliu168 commented on August 10, 2024

Running python trans_web_demo.py on multiple GPUs fails with the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
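The error means that torch.cat received tensors living on two different GPUs, which typically happens when a model is sharded across devices and two pieces of state (for example a cached tensor and a freshly computed one) end up on cuda:0 and cuda:1. The sketch below is a pure-Python toy, not PyTorch: FakeTensor, cat and cat_on_common_device are hypothetical stand-ins that only mimic the device check and the usual remedy of moving every operand to one device (in PyTorch, t.to(device)) before concatenating.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: a list of values plus a device string."""
    values: List[float]
    device: str

    def to(self, device: str) -> "FakeTensor":
        # Mimics Tensor.to(device): returns a copy placed on the target device.
        return FakeTensor(list(self.values), device)


def cat(tensors: List[FakeTensor]) -> FakeTensor:
    """Toy concatenation that, like torch.cat on CUDA, refuses mixed devices."""
    devices = {t.device for t in tensors}
    if len(devices) > 1:
        found = " and ".join(sorted(devices))
        raise RuntimeError(
            "Expected all tensors to be on the same device, "
            f"but found at least two devices, {found}!"
        )
    out: List[float] = []
    for t in tensors:
        out.extend(t.values)
    return FakeTensor(out, tensors[0].device)


def cat_on_common_device(tensors: List[FakeTensor]) -> FakeTensor:
    """Remedy: move every operand to the first tensor's device, then concatenate."""
    target = tensors[0].device
    return cat([t.to(target) for t in tensors])


if __name__ == "__main__":
    cached = FakeTensor([1.0, 2.0], "cuda:0")  # e.g. state kept on the first GPU
    new = FakeTensor([3.0], "cuda:1")          # e.g. output of a layer on the second GPU
    try:
        cat([cached, new])
    except RuntimeError as err:
        print(err)
    merged = cat_on_common_device([cached, new])
    print(merged.device, merged.values)
```

Which device to pick as the common one is a design choice; moving to the first operand's device keeps the result where the existing state already lives.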

from glm-4.

Comments (8)

hiyouga commented on August 10, 2024

Update this file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py

ashmalvayani commented on August 10, 2024

Updating the modeling_chatglm.py file did not solve the error. On two GPUs it still throws the same error, and on a single GPU it fails with the following error:

def forward(
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 3905, in forward
    return compiled_fn(full_args)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1482, in g
    return f(*args)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2533, in runtime_wrapper
    all_outs = call_func_with_args(
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1506, in call_func_with_args
    out = normalize_as_list(f(args))
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1594, in rng_functionalization_wrapper
    return compiled_fw(args)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 374, in __call__
    return self.get_current_callable()(inputs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 401, in _run_from_cache
    return compiled_graph.compiled_artifact(inputs)
  File "/tmp/torchinductor_ashmal.vayani/r5/cr5ok37tr3sngvqz753xvfrrbhakfkffw35lawlukvvbfuy6p2tb.py", line 109, in call
    triton_poi_fused_embedding_1.run(arg1_1, arg0_1, buf1, 200704, grid=grid(200704), stream=stream0)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 401, in run
    self.autotune_to_one_config(*args, grid=grid)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 326, in autotune_to_one_config
    timings = self.benchmark_all_configs(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 302, in benchmark_all_configs
    timings = {
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 303, in <dictcomp>
    launcher: self.bench(launcher, *args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 282, in bench
    return do_bench(kernel_call, rep=40, fast_flush=True)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/utils.py", line 75, in do_bench
    return triton_do_bench(*args, **kwargs)[0]
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/triton/testing.py", line 104, in do_bench
    fn()
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 276, in kernel_call
    launcher(
  File "<string>", line 13, in launcher
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
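On a single GPU the traceback ends inside a Triton kernel generated by torch.compile/Inductor (triton_poi_fused_embedding_1), and "cannot be accessed from Triton (cpu tensor?)" suggests that one of that kernel's tensor arguments was still in CPU memory. A common cause is passing tokenized inputs that were never moved to the model's device, although that is an assumption about this particular harness. The pure-Python sketch below only illustrates the pre-flight check; FakeTensor, find_misplaced and move_batch are hypothetical names, and in real PyTorch code the equivalent is roughly {k: v.to(model.device) for k, v in batch.items()}.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only its device matters here."""
    name: str
    device: str

    def to(self, device: str) -> "FakeTensor":
        # Mimics Tensor.to(device): returns a copy on the target device.
        return FakeTensor(self.name, device)


def find_misplaced(batch: Dict[str, FakeTensor], model_device: str) -> List[str]:
    """Return the (sorted) keys whose tensors are not on the model's device."""
    return sorted(k for k, t in batch.items() if t.device != model_device)


def move_batch(batch: Dict[str, FakeTensor], model_device: str) -> Dict[str, FakeTensor]:
    """Return a new batch with every tensor copied onto the model's device."""
    return {k: t.to(model_device) for k, t in batch.items()}


if __name__ == "__main__":
    model_device = "cuda:0"
    # Tokenizer outputs are created on the CPU unless they are moved explicitly.
    batch = {
        "input_ids": FakeTensor("input_ids", "cpu"),
        "attention_mask": FakeTensor("attention_mask", "cpu"),
    }
    print(find_misplaced(batch, model_device))
    batch = move_batch(batch, model_device)
    print(find_misplaced(batch, model_device))
```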

a624090359 commented on August 10, 2024

"Update this file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py"

This approach worked for me. What was causing the original problem?

ashmalvayani commented on August 10, 2024

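"Update this file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py"

"This approach worked for me. What was causing the original problem?"

What exactly did you do to fix it? I downloaded the model after this issue was raised, so I should already have the correct version.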

qihanghou726 commented on August 10, 2024

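I ran into the same problem. I worked around it by modifying class HFClient(Client) in GLM-4/composite_demo/src/clients/hf.py:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Client is defined elsewhere in composite_demo (import not shown in the original post).
class HFClient(Client):
    def __init__(self, model_path: str):
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_path, trust_remote_code=True,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            torch_dtype=torch.bfloat16,
            # device_map="cuda",
            # changed to this so the model can run on multiple GPUs
            device_map="auto",
        ).eval()

In this snippet I changed device_map="cuda" to device_map="auto", and with that the model runs across multiple GPUs. I also updated
https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py,
but that had no effect for me. How did you get the model running on multiple GPUs?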
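Switching device_map="cuda" to device_map="auto" lets Hugging Face Transformers (via Accelerate) split the model's layers across all visible GPUs instead of placing everything on one. The toy function below sketches only the general idea: it greedily assigns consecutive layers to each GPU until a per-device budget is used up. The layer sizes and budgets are made-up numbers and greedy_device_map is a hypothetical name; Accelerate's real planner (infer_auto_device_map) also handles things such as modules that must not be split and memory headroom, so this is not its actual algorithm. The second helper lists layer boundaries that cross devices, which is where tensors must be moved and where a same-device error can surface at inference time even though loading succeeded.

```python
from typing import Dict, List


def greedy_device_map(layer_sizes: List[int], budgets: Dict[str, int]) -> Dict[str, str]:
    """Assign consecutive layers to devices in order, filling each budget first.

    layer_sizes: hypothetical memory cost of each layer (arbitrary units).
    budgets: device name -> capacity in the same units, in placement order.
    """
    devices = list(budgets)
    placement: Dict[str, str] = {}
    dev_idx, used = 0, 0
    for i, size in enumerate(layer_sizes):
        # Advance to the next device once the current one cannot hold this layer.
        while dev_idx < len(devices) and used + size > budgets[devices[dev_idx]]:
            dev_idx, used = dev_idx + 1, 0
        if dev_idx == len(devices):
            raise MemoryError(f"layer {i} does not fit on any remaining device")
        placement[f"layers.{i}"] = devices[dev_idx]
        used += size
    return placement


def cross_device_boundaries(placement: Dict[str, str]) -> List[str]:
    """List consecutive layer pairs that live on different devices."""
    names = list(placement)
    return [
        f"{a} -> {b}"
        for a, b in zip(names, names[1:])
        if placement[a] != placement[b]
    ]


if __name__ == "__main__":
    plan = greedy_device_map([4, 4, 4, 4, 4, 4], {"cuda:0": 12, "cuda:1": 12})
    print(plan)
    print(cross_device_boundaries(plan))
```

With six equal layers and two equal budgets, layers 0 to 2 land on cuda:0 and layers 3 to 5 on cuda:1, leaving a single cross-device boundary between layers.2 and layers.3.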

ashmalvayani commented on August 10, 2024

https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py

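What did you update in this file? Normal inference works fine, but I'm trying to evaluate this model with https://github.com/mbzuai-nlp/ArabicMMLU, and as soon as the model is loaded it throws errors on both multiple GPUs and a single GPU.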

qihanghou726 commented on August 10, 2024

I did not modify https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py.
I modified GLM-4/composite_demo/src/clients/hf.py so that the model loads across multiple GPUs, but inference still fails with this error.

ashmalvayani commented on August 10, 2024

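But modifying the demo code in the GitHub repository doesn't solve the general problem, for example when loading the model directly from Hugging Face.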
