Comments (9)
Also, on A100/L4, where it runs without runtime errors for the original input sizes, we hit a different runtime error with other input sizes:
aten.index_put_(buf4, [reinterpret_tensor(buf0, (1, 1, s2*s3, 14 + s2, 14 + s3), (0, 0, 196 + (14*s2) + (14*s3) + (s2*s3), 14 + s3, 1), 0)], buf6, False)
RuntimeError: nonzero is not supported for tensors with more than INT_MAX elements, See https://github.com/pytorch/pytorch/issues/51871
from pytorch.
Could you share a repro or a full error trace?
As I mentioned, the repro is at #121504.
Which full error trace do you need to debug this?
Here is the full stack trace that came with the RuntimeError: shape mismatch:
On the H100 with PyTorch nightly:
Traceback (most recent call last):
File "/workspace/tools/eval.py", line 135, in <module>
main()
File "/workspace/tools/eval.py", line 130, in main
main_worker(0, cfg, enable_amp=args.amp)
File "/workspace/tools/eval.py", line 30, in main_worker
evaluator.evaluating()
File "/workspace/networks/managers/evaluator.py", line 442, in evaluating
engine.add_reference_frame(current_img,
File "/workspace/networks/engines/aotv3_engine.py", line 648, in add_reference_frame
aot_engine.add_reference_frame(img,
File "/workspace/networks/engines/aotv3_engine.py", line 239, in add_reference_frame
self.curr_lstt_output = self.AOT.LSTT_forward(curr_enc_embs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/models/aotv3.py", line 189, in LSTT_forward
lstt_embs, lstt_memories = self.MSLSTT(curr_embs, long_term_memories,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 414, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/layers/transformer.py", line 581, in forward
output, memories = layer(output,
^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/layers/transformer.py", line 753, in forward
def forward(self,
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 414, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/workspace/networks/layers/attention.py", line 310, in forward
@torch.compile
File "/workspace/networks/layers/attention.py", line 344, in torch_dynamo_resume_in_forward_at_344
qk = self.correlation_sampler(q, k).view(
File "/opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 548, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 998, in forward
return compiled_fn(full_args)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 203, in runtime_wrapper
all_outs = call_func_at_runtime_with_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/utils.py", line 118, in call_func_at_runtime_with_args
out = normalize_as_list(f(args))
^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 434, in wrapper
return compiled_fn(runtime_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 1078, in __call__
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 927, in run
return model(new_inputs)
^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_root/ut/cutmbnzthsr64p23ilpnn2ym54twqj4lwpqj5v3shylgqucshcur.py", line 660, in call
aten.index_put_(buf6, [reinterpret_tensor(buf0, (1, 1, 5244, 90, 83), (0, 0, 7552, 83, 1), 0)], reinterpret_tensor(buf7, (1179900, ), (1, ), 0), False)
File "/opt/conda/lib/python3.11/site-packages/torch/_ops.py", line 1060, in __call__
return self_._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape mismatch: value tensor of shape [1179900] cannot be broadcast to indexing result of shape [1165408]
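The shape mismatch above can be reproduced at toy scale: when `index_put_` is given a boolean mask, the value tensor must have exactly as many elements as the mask has True entries. A minimal sketch (the tensors `t` and `mask` here are made-up toys, not the actual `buf` tensors from the Inductor-generated code):

```python
import torch

t = torch.zeros(4)
mask = torch.tensor([True, False, True, False])  # 2 True entries

# OK: 2 values for 2 True positions
t.index_put_([mask], torch.tensor([1.0, 2.0]))

# 3 values for 2 True positions -> raises a "shape mismatch" RuntimeError
# analogous to the one in the trace above
try:
    t.index_put_([mask], torch.tensor([1.0, 2.0, 3.0]))
except RuntimeError as e:
    print(e)
```

In the trace, the value buffer has 1179900 elements but the mask selects only 1165408 positions, so the broadcast fails the same way.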
@ezyang For the problem on the A100/L4 instead, do you know where index_put is going to require a nonzero op?
Is it derived from pytorch/aten/src/ATen/native/TensorAdvancedIndexing.cpp, lines 8 to 9 at 0756f9f?
When buf0 is a boolean mask, this results in data-dependent compute (a nonzero call), because we must find all the True entries in the boolean mask to determine which positions we write to. If buf0 contains integer indices, this is not needed.
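A small sketch of the difference, using toy tensors rather than the actual buffers from the trace: with a boolean mask, PyTorch must effectively run nonzero() on the mask to recover the write positions (the data-dependent step that hits the INT_MAX limit for very large masks), whereas integer indices name the positions directly:

```python
import torch

mask = torch.tensor([True, False, True, True, False, False, True, False])
vals = torch.arange(4, dtype=torch.float32)

# Boolean-mask path: the write positions are implicit, so PyTorch must
# compute nonzero(mask) internally to find them.
buf_mask = torch.zeros(8)
buf_mask.index_put_([mask], vals)

# Integer-index path: positions are explicit, no nonzero() required.
idx = mask.nonzero().squeeze(1)  # tensor([0, 2, 3, 6])
buf_idx = torch.zeros(8)
buf_idx.index_put_([idx], vals)

assert torch.equal(buf_mask, buf_idx)
```

Both paths produce the same result; only the boolean-mask path needs the data-dependent nonzero step at runtime.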
OK, so in this case we are going to hit NVIDIA/cccl#1422 again for some specific inputs.
But do you know what is happening on the H100 instead?
Tested again on the 20240530 nightly: the error #126614 (comment) on the H100 is still there.