Comments (2)
I think we need to add a codepath in Trainer to enable gradient checkpointing for torch_xla using this API from transformers.
I think we can close this issue as discussed here: pytorch/xla#6611 (comment)
Native gradient checkpointing won't work in all cases: the XLA compiler needs some special treatment to make gradient checkpointing work. That treatment is implemented here: https://github.com/pytorch/xla/blob/master/torch_xla/utils/checkpoint.py
However, I do think we should be able to integrate our gradient checkpointing into HF so that users don't need to specify xla_fsdp_grad_ckpt in the FSDP config anymore.
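Today that flag is set roughly like this in the Trainer's FSDP configuration (a sketch of the current workaround the comment above hopes to retire; the key names follow the `fsdp_config` convention in transformers):

```python
# The fsdp_config dict passed to transformers.TrainingArguments today;
# the proposal above would make xla_fsdp_grad_ckpt unnecessary.
fsdp_config = {
    "xla": True,                 # route FSDP through torch_xla
    "xla_fsdp_grad_ckpt": True,  # wrap each sharded layer with the XLA checkpoint fn
}
```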
Related Issues (20)
- Cannot convert llama 3 model to hf
- error when using PPO in Gemma
- Llama3 models causing `TypeError: not a string` error in LlamaTokenizer
- Some functional problems in the implementation of Speculative Decoding
- Error During Training with PatchTSMixerForTimeSeriesClassification for Time Series Classification
- Whisper assistant decoding not working with pipeline
- TypeError: WhisperForConditionalGeneration.forward() got an unexpected keyword argument 'model'
- FutureWarning about resume_download is raised after huggingface-hub 0.23.0 release
- Remove pipelines, chatformatters, templates etc --> Replace with simple generator function / manual string interpolation ---> Just have one standardized way for building datasets and running inference
- HTML Files Keep on Loading
- Wav2Vec2ForCTC weight mismatch
- More memory consumption than litgpt
- Setting compute_metrics in Trainer with Idefics2ForConditionalGeneration leads to AttributeError: 'DynamicCache' object has no attribute 'detach'
- DPT implementation contains unused parameters
- Urdu Encoding Issue in Hugging Face Tokenizer
- Add Prismatic VLMs to Transformers
- Error converting from PyTorch to HuggingFace - Mistral / Mixtral
- model_max_length default parameters are missing in transformers>=4.40.0
- (Have PR) Speed up `BeamScorer` to make GPT-2 generation 2-3x faster