
Running python trans_web_demo.py on multiple GPUs fails with: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat) about glm-4 [CLOSED]

fredliu168 commented on August 10, 2024

Running python trans_web_demo.py on multiple GPUs fails with the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)

from glm-4.

Comments (8)

hiyouga commented on August 10, 2024

Update this file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py

ashmalvayani commented on August 10, 2024

Updating the modeling_chatglm.py file did not solve the error. When I run the model on two GPUs, it still throws the same error. When I run it on a single GPU, it gives the following error:

def forward(
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 3905, in forward
    return compiled_fn(full_args)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1482, in g
    return f(*args)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2533, in runtime_wrapper
    all_outs = call_func_with_args(
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1506, in call_func_with_args
    out = normalize_as_list(f(args))
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1594, in rng_functionalization_wrapper
    return compiled_fw(args)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 374, in __call__
    return self.get_current_callable()(inputs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 401, in _run_from_cache
    return compiled_graph.compiled_artifact(inputs)
  File "/tmp/torchinductor_ashmal.vayani/r5/cr5ok37tr3sngvqz753xvfrrbhakfkffw35lawlukvvbfuy6p2tb.py", line 109, in call
    triton_poi_fused_embedding_1.run(arg1_1, arg0_1, buf1, 200704, grid=grid(200704), stream=stream0)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 401, in run
    self.autotune_to_one_config(*args, grid=grid)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 326, in autotune_to_one_config
    timings = self.benchmark_all_configs(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 302, in benchmark_all_configs
    timings = {
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 303, in <dictcomp>
    launcher: self.bench(launcher, *args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 282, in bench
    return do_bench(kernel_call, rep=40, fast_flush=True)
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/utils.py", line 75, in do_bench
    return triton_do_bench(*args, **kwargs)[0]
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/triton/testing.py", line 104, in do_bench
    fn()
  File "/home/ashmal.vayani/anaconda3/envs/arabicmmlu_fsdp2/lib/python3.10/site-packages/torch/_inductor/triton_heuristics.py", line 276, in kernel_call
    launcher(
  File "<string>", line 13, in launcher
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
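
The final ValueError suggests that at least one input reached the Triton kernel while still on the CPU. A minimal sketch for checking this before the forward pass, assuming the batch is a dict whose tensor values expose a `.device` attribute (as PyTorch tensors do). The helper name `find_off_device` and the `FakeTensor` stub are hypothetical and exist only so the sketch runs without PyTorch:

```python
from typing import Any, Dict, List


def find_off_device(batch: Dict[str, Any], expected: str) -> List[str]:
    """Return the keys in `batch` whose value is not on `expected`.

    Values without a `.device` attribute (ints, strings, ...) are skipped.
    Devices are compared by their string form, e.g. "cpu" or "cuda:0".
    """
    mismatched = []
    for name, value in batch.items():
        device = getattr(value, "device", None)
        if device is not None and str(device) != expected:
            mismatched.append(name)
    return mismatched


class FakeTensor:
    """Hypothetical stand-in for a tensor; only carries a device string."""

    def __init__(self, device: str):
        self.device = device


batch = {
    "input_ids": FakeTensor("cpu"),
    "attention_mask": FakeTensor("cuda:0"),
    "max_new_tokens": 128,  # not a tensor, so it is ignored
}
off_device = find_off_device(batch, expected="cuda:0")
print(off_device)
```

With real tensors, any key reported here would typically be moved with `.to(...)` before the model is called.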

a624090359 commented on August 10, 2024

Update this file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py

This method worked for me. What was the original problem?

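ashmalvayani commented on August 10, 2024

Update this file: https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py

This method worked for me. What was the original problem?

What exactly did you do to solve this problem? I downloaded the model after this issue was raised, so I should already have the correct version.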

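qihanghou726 commented on August 10, 2024

I ran into the same problem. I modified the HFClient class in GLM-4/composite_demo/src/clients/hf.py:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Excerpt: Client is the base class defined in the GLM-4 composite_demo code.
    class HFClient(Client):
        def __init__(self, model_path: str):
            self.tokenizer = AutoTokenizer.from_pretrained(
                model_path, trust_remote_code=True,
            )
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                trust_remote_code=True,
                torch_dtype=torch.bfloat16,
                # device_map="cuda",  # original setting
                device_map="auto",  # changed so the model can run on multiple GPUs
            ).eval()

Changing device_map="cuda" to device_map="auto" in this section lets the model run on multiple GPUs. I also updated
https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py
but that had no effect for me. How did you get the model running on multiple GPUs?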
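
When a model is loaded with device_map="auto", transformers records the resulting placement in the model's `hf_device_map` attribute, a dict that maps module names to devices. A small sketch for summarizing such a map to see which modules landed on which GPU follows. The module names in `sample_map` are illustrative assumptions rather than values reported in this thread, and `group_by_device` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, List, Union

Device = Union[int, str]


def group_by_device(device_map: Dict[str, Device]) -> Dict[str, List[str]]:
    """Group module names by the device they were placed on."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for module_name, device in device_map.items():
        # Integer entries are GPU indices; normalize them to "cuda:N".
        key = f"cuda:{device}" if isinstance(device, int) else str(device)
        groups[key].append(module_name)
    return dict(groups)


# Illustrative map only; a real one would come from model.hf_device_map.
sample_map = {
    "transformer.embedding": 0,
    "transformer.encoder.layers.0": 0,
    "transformer.encoder.layers.1": 1,
    "transformer.output_layer": 1,
}
placement = group_by_device(sample_map)
for device, modules in placement.items():
    print(device, modules)
```

A map that spans more than one device means intermediate tensors cross GPUs during the forward pass, which is where a same-device check like the one in the reported RuntimeError can fail.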

ashmalvayani commented on August 10, 2024

https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py

What did you update in this file? Normal inference works fine, but I am trying to evaluate this model on https://github.com/mbzuai-nlp/ArabicMMLU, and as soon as the model is loaded it throws errors on both multiple GPUs and a single GPU.

qihanghou726 commented on August 10, 2024

I did not modify https://huggingface.co/THUDM/glm-4-9b-chat/blob/main/modeling_chatglm.py.
I modified GLM-4/composite_demo/src/clients/hf.py so that the model can be loaded across multiple GPUs, but this error is still raised during inference.

ashmalvayani commented on August 10, 2024

But modifying it on GitHub does not solve the general problem, for example when loading the model directly with Hugging Face.
