Comments (9)
Also, on A100/L4, where it runs without runtime errors for the original input sizes, we hit a different runtime error with other input sizes:
aten.index_put_(buf4, [reinterpret_tensor(buf0, (1, 1, s2*s3, 14 + s2, 14 + s3), (0, 0, 196 + (14*s2) + (14*s3) + (s2*s3), 14 + s3, 1), 0)], buf6, False)
RuntimeError: nonzero is not supported for tensors with more than INT_MAX elements, See https://github.com/pytorch/pytorch/issues/51871
from pytorch.
Could you share a repro or a full error trace?
As I mentioned, the repro is at #121504.
Which full error trace do you need to debug this?
Here is the full stack trace that came with the RuntimeError: shape mismatch:
On the H100 with PyTorch nightly:
Traceback (most recent call last):
File "/workspace/tools/eval.py", line 135, in <module>
main()
File "/workspace/tools/eval.py", line 130, in main
main_worker(0, cfg, enable_amp=args.amp)
File "/workspace/tools/eval.py", line 30, in main_worker
evaluator.evaluating()
File "/workspace/networks/managers/evaluator.py", line 442, in evaluating
engine.add_reference_frame(current_img,
File "/workspace/networks/engines/aotv3_engine.py", line 648, in add_reference_frame
aot_engine.add_reference_frame(img,
File "/workspace/networks/engines/aotv3_engine.py", line 239, in add_reference_frame
self.curr_lstt_output = self.AOT.LSTT_forward(curr_enc_embs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/models/aotv3.py", line 189, in LSTT_forward
lstt_embs, lstt_memories = self.MSLSTT(curr_embs, long_term_memories,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 414, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/layers/transformer.py", line 581, in forward
output, memories = layer(output,
^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/layers/transformer.py", line 753, in forward
def forward(self,
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 414, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/layers/attention.py", line 310, in forward
@torch.compile
File "/workspace/networks/layers/attention.py", line 344, in torch_dynamo_resume_in_forward_at_344
qk = self.correlation_sampler(q, k).view(
File "/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 548, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 998, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 203, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/utils.py", line 118, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 434, in wrapper
return compiled_fn(runtime_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 1078, in __call__
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 927, in run
return model(new_inputs)
^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_root/ut/cutmbnzthsr64p23ilpnn2ym54twqj4lwpqj5v3shylgqucshcur.py", line 660, in call
aten.index_put_(buf6, [reinterpret_tensor(buf0, (1, 1, 5244, 90, 83), (0, 0, 7552, 83, 1), 0)], reinterpret_tensor(buf7, (1179900, ), (1, ), 0), False)
File "/opt/conda/lib/python3.11/site-packages/torch/_ops.py", line 1060, in __call__
return self_._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [1179900] cannot be broadcast to indexing result of shape [1165408]
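The shape mismatch above can be reproduced at toy scale: when `index_put_` is given a boolean mask, the value tensor must have exactly as many elements as the mask has True entries. A minimal sketch (the tensors `t` and `mask` here are made-up toys, not the actual `buf` tensors from the Inductor-generated code):

```python
import torch

t = torch.zeros(4)
mask = torch.tensor([True, False, True, False])  # 2 True entries

# OK: 2 values for 2 True positions
t.index_put_([mask], torch.tensor([1.0, 2.0]))

# 3 values for 2 True positions -> raises a "shape mismatch" RuntimeError
# analogous to the one in the trace above
try:
    t.index_put_([mask], torch.tensor([1.0, 2.0, 3.0]))
except RuntimeError as e:
    print(e)
```

In the trace, the value buffer has 1179900 elements but the mask selects only 1165408 positions, so the broadcast fails the same way.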
@ezyang For the problem on the A100/L4 instead, do you know where index_put is going to require a nonzero op?
Is it derived from pytorch/aten/src/ATen/native/TensorAdvancedIndexing.cpp, lines 8 to 9 at 0756f9f?
When buf0 is a boolean mask, this results in data-dependent compute (a nonzero call), because we must find all the True entries in the boolean mask to determine which positions we write to. If buf0 contains integer indices, this is not needed.
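A small sketch of the difference, using toy tensors rather than the actual buffers from the trace: with a boolean mask, PyTorch must effectively run nonzero() on the mask to recover the write positions (the data-dependent step that hits the INT_MAX limit for very large masks), whereas integer indices name the positions directly:

```python
import torch

mask = torch.tensor([True, False, True, True, False, False, True, False])
vals = torch.arange(4, dtype=torch.float32)

# Boolean-mask path: the write positions are implicit, so PyTorch must
# compute nonzero(mask) internally to find them.
buf_mask = torch.zeros(8)
buf_mask.index_put_([mask], vals)

# Integer-index path: positions are explicit, no nonzero() required.
idx = mask.nonzero().squeeze(1)  # tensor([0, 2, 3, 6])
buf_idx = torch.zeros(8)
buf_idx.index_put_([idx], vals)

assert torch.equal(buf_mask, buf_idx)
```

Both paths produce the same result; only the boolean-mask path needs the data-dependent nonzero step at runtime.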
OK, so in this case we are going to hit NVIDIA/cccl#1422 again for some specific inputs.
But do you know what is happening on the H100 instead?
Tested again on the 20240530 nightly: the error #126614 (comment) on the H100 is still there.