Comments (6)
@zucchini-nlp In my experience the spikes are hardware-dependent, even when two devices have the same spare memory available.
@songh11 "You may also notice that the second time we run our model with torch.compile is significantly slower than the other runs, although it is much faster than the first run. This is because the "reduce-overhead" mode runs a few warm-up iterations for CUDA graphs." (source)
from transformers.
Hi @songh11 👋
If you check the documentation regarding torch.compile, especially the part about the "reduce-overhead" mode, you'll see an explanation :)
Interestingly, I didn't get a sudden memory spike after the second generation, and after 5 steps the memory remained around 16GB 🤔. My specs are:
PyTorch version: 2.3.0+cu121
CUDA used to build PyTorch: 12.1
OS: Ubuntu 20.04.6 LTS (x86_64)
GPU: NVIDIA A100-SXM4-80GB
> Interestingly, I didn't get a sudden memory spike after the second generation, and after 5 steps the memory remained around 16GB 🤔. My specs are:
> PyTorch version: 2.3.0+cu121, CUDA used to build PyTorch: 12.1, OS: Ubuntu 20.04.6 LTS (x86_64), GPU: NVIDIA A100-SXM4-80GB

On an NVIDIA RTX A5000, I think the second generation is also part of the warm-up.
> Hi @songh11 👋
> If you check the documentation regarding torch.compile, especially the part about the "reduce-overhead" mode, you'll see an explanation :)

Many thanks! Another question I have: why does the second `generate` call use more memory?
> @zucchini-nlp In my experience the spikes are hardware-dependent, even when two devices have the same spare memory available.
> @songh11 "You may also notice that the second time we run our model with torch.compile is significantly slower than the other runs, although it is much faster than the first run. This is because the "reduce-overhead" mode runs a few warm-up iterations for CUDA graphs." (source)
Thank you for your reply. Using the default mode works for me as a workaround.
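For readers hitting the same spike, a minimal sketch of the two options discussed in this thread (the model here is a hypothetical stand-in; on CPU the CUDA-graph machinery is simply skipped):

```python
import torch

model = torch.nn.Linear(64, 64)  # hypothetical stand-in for the real model

# Default mode: no CUDA-graph warm-up iterations and lower peak memory,
# at the cost of keeping some per-call framework overhead.
compiled_default = torch.compile(model)

# "reduce-overhead" mode: on GPU this captures CUDA graphs, which adds
# warm-up iterations and extra memory for the captured graph pools.
compiled_reduce = torch.compile(model, mode="reduce-overhead")

x = torch.randn(4, 64)
out = compiled_default(x)
```

Dropping `mode="reduce-overhead"` trades away the CUDA-graph speedup but avoids both the extra warm-up generations and their memory cost.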