Comments (3)
Those changes are currently only on main. Did you install TRL from source?
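One quick way to answer that question is to look at the installed version string. The sketch below is only a heuristic: it assumes that builds from main carry a `.dev` suffix (for example `0.8.2.dev0`), whereas PyPI releases do not. The helper names `installed_version` and `looks_like_source_install` are hypothetical and are not part of TRL.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "trl") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def looks_like_source_install(ver: Optional[str]) -> bool:
    """Heuristic (assumption): builds from main usually carry a '.dev' suffix."""
    return ver is not None and ".dev" in ver


if __name__ == "__main__":
    ver = installed_version()
    print(f"trl version: {ver}, dev build: {looks_like_source_install(ver)}")
```

If this reports a release version, installing from source with `pip install git+https://github.com/huggingface/trl.git` should pick up the changes on main.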
Would you like to give #1456 a try?
python examples/scripts/dpo.py \
--dataset_name=trl-internal-testing/hh-rlhf-trl-style \
--model_name_or_path=gpt2 \
--per_device_train_batch_size 4 \
--max_steps 1000 \
--learning_rate 1e-3 \
--gradient_accumulation_steps 1 \
--logging_steps 10 \
--eval_steps 500 \
--output_dir="dpo_anthropic_hh" \
--warmup_steps 150 \
--report_to wandb \
--bf16 \
--logging_first_step \
--no_remove_unused_columns
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.