Comments (6)
I'm reasonably confident this is because SFTTrainer seems to have changed how it expects args to be passed.
SFTConfig must be used.
Also, it seems that peft_config cannot be passed within SFTConfig. See here.
It took me a while to track this down but it appeared in an error message.
Kind of annoying if it is this, because a lot of my SFT scripts have now broken (and I didn't have frozen versioning because I was using the dev version of SFT in quite a few cases).
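One way to catch this kind of breakage early is to have the script check its own dependency versions at startup. Below is a minimal sketch using only the standard library; the helper name check_pins and the pinned version strings are hypothetical placeholders, not versions known to work.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return human-readable mismatches between pinned and installed versions."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}: installed {installed}, expected {expected}")
    return problems


# Hypothetical pins: replace with the versions the script was written against.
for problem in check_pins({"trl": "0.0.0", "transformers": "0.0.0"}):
    print(problem)
```

An empty result means every pinned package matches. With a dev install the version string changes often, so a check like this would mostly serve as a reminder of which build the script last ran against.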
from transformers.
Hi @KaifAhmad1, in order to help we need a full reproducer, i.e. something we can copy, paste and run to get the same error. Without the model and dataset (or public equivalents) and the peft config, there's not much we can do here.
Hey @amyeroberts, I cannot share the full code here. Here is a link to the Colab notebook for reference.
https://github.com/KaifAhmad1/code-test/blob/main/Phi_3_Fine_Tuned_on_Indic_Lanuage.ipynb
Hi @KaifAhmad1
Can you try force-setting gradient_checkpointing=True in your TrainingArguments:
# SFTTrainer is provided by the trl package
from transformers import TrainingArguments
from trl import SFTTrainer

# model, tokenizer, train_dataset and peft_config are defined earlier in the notebook

# Training Arguments
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
    report_to="tensorboard",
    gradient_checkpointing=True,  # added: force-enable gradient checkpointing
)
# SFTTrainer Arguments
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field='text',
    args=training_arguments,
    tokenizer=tokenizer,
    packing=False,
    max_seq_length=None,
)
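The snippet above still passes dataset_text_field, packing and max_seq_length straight to SFTTrainer. If the SFTConfig change mentioned earlier in the thread is the cause, those options would move into the config object while peft_config stays on the trainer. Here is a rough sketch of that split in plain Python. The helper split_sft_kwargs and the CONFIG_ONLY_KEYS set are hypothetical names, and the list of moved options is an assumption to check against your installed trl version.

```python
# Assumption: options that move from SFTTrainer to SFTConfig in the trl
# release discussed in this thread; verify against your installed version.
CONFIG_ONLY_KEYS = {"dataset_text_field", "packing", "max_seq_length"}


def split_sft_kwargs(trainer_kwargs, training_kwargs):
    """Move config-level options out of the trainer kwargs.

    Returns (config_kwargs, remaining_trainer_kwargs). peft_config is not in
    CONFIG_ONLY_KEYS, so it stays with the trainer kwargs.
    """
    config_kwargs = dict(training_kwargs)
    remaining = {}
    for key, value in trainer_kwargs.items():
        if key in CONFIG_ONLY_KEYS:
            config_kwargs[key] = value
        else:
            remaining[key] = value
    return config_kwargs, remaining


config_kwargs, trainer_kwargs = split_sft_kwargs(
    trainer_kwargs={
        "peft_config": "<peft config object>",  # placeholder value
        "dataset_text_field": "text",
        "packing": False,
        "max_seq_length": None,
    },
    training_kwargs={"output_dir": "./results", "gradient_checkpointing": True},
)
# With trl installed, the intended use would be roughly:
#   args = SFTConfig(**config_kwargs)
#   trainer = SFTTrainer(model=model, args=args,
#                        train_dataset=train_dataset, **trainer_kwargs)
```

Keeping the split in one small function makes it easier to update older scripts in one place if the set of config-level options changes again.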
Thanks, @younesbelkada for fixing it!
Thanks @RonanKMcGovern for the feedback. Indeed, SFTConfig might have broken things, and I will look into that. Can you help me by providing more details on the silent bugs you faced? In our CI, where we extensively test many things, everything seemed green for us, so this will help improve our testing infra.
Can you elaborate on why peft_config cannot be passed within SFTConfig?