Comments (1)
It looks like you are hitting an out-of-memory (OOM) error because your activation size is too large, which is not directly related to FSDP:
File "/project/p_trancal/trsclbjob/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1126, in forward
attn_mask = F._canonical_mask(
File "/project/p_trancal/trsclbjob/lib/python3.10/site-packages/torch/nn/functional.py", line 5115, in _canonical_mask
torch.zeros_like(mask, dtype=target_type)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 29.07 GiB. GPU 3 has a total capacity of 39.43 GiB of which 25.15 GiB is free. Including non-PyTorch memory, this process has 14.27 GiB memory in use. Of the allocated memory 11.74 GiB is allocated by PyTorch, and 932.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
You may want to check your input activation sizes, since you are trying to allocate a 29.07 GiB attn_mask.
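As a rough sanity check, you can estimate how much memory a dense mask needs before running the model. The traceback shows the allocation coming from `torch.zeros_like(mask, dtype=target_type)`, so the new tensor has the same shape as your mask but in the target dtype. The sketch below is plain Python; the helper name `mask_size_gib`, the dtype table, and the example shape are all hypothetical and not taken from your code.

```python
import math

# Bytes per element for a few common dtypes (keys are illustrative labels).
BYTES_PER_ELEMENT = {"bool": 1, "float16": 2, "bfloat16": 2, "float32": 4}


def mask_size_gib(shape, dtype="float32"):
    """Return the memory a dense tensor of `shape` and `dtype` needs, in GiB."""
    return math.prod(shape) * BYTES_PER_ELEMENT[dtype] / 2**30


# Hypothetical mask of shape (batch * num_heads, tgt_len, src_len).
shape = (8 * 16, 4096, 4096)
print(mask_size_gib(shape, "bool"))     # size of the boolean mask
print(mask_size_gib(shape, "float32"))  # size after conversion to float32
```

If the estimate for your real shape is close to 29 GiB, reducing the sequence length or batch size, or passing a mask that broadcasts over the batch and head dimensions, is likely to help more than changing the FSDP configuration.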