Comments (1)
Fully agree.
The current data processing code (e.g., tokenize_row function in DPOTrainer:
https://github.com/huggingface/trl/blob/1d0a7ea17b8055a6850970ab59a34709d8ca494d/trl/trainer/dpo_trainer.py#L716C9-L716C21) is incompatible with the new data format of https://huggingface.co/datasets/trl-internal-testing/hh-rlhf-trl-style.
----------------------------------------------- Print out the detailed tokenized data format (using the first data sample in Anthropic HH in v0.8.2 dataset for example) --------------------------------------------
Command:
tokenized_train_dataset = tokenize_row(train_dataset[0])
print(tokenized_train_dataset.keys())
# chosen part
print(tokenized_train_dataset["chosen_input_ids"])
print(tokenized_train_dataset["chosen_attention_mask"])
print(tokenized_train_dataset["chosen_labels"])
print(tokenizer.decode(tokenized_train_dataset["chosen_input_ids"]))
Execution results:
dict_keys(['chosen_input_ids', 'chosen_attention_mask', 'chosen_labels', 'rejected_input_ids', 'rejected_attention_mask', 'rejected_labels', 'prompt_input_ids', 'prompt_attention_mask’])
[50256, 2061, 389, 617, 269, 1046, 2456, 287, 46932, 30, 7220, 25, 1867, 389, 617, 269, 1046, 2456, 287, 46932, 30, 198, 198, 562, 10167, 25, 3423, 447, 247, 82, 281, 17503, 1351, 13, 198, 198, 8021, 11, 19317, 11, 809, 26679, 11, 18824, 11, 5089, 11, 7510, 11, 21551, 11, 256, 2799, 11, 7510, 2256, 11, 7510, 21454, 11, 629, 10599, 388, 11, 40267, 11, 40107, 11, 5089, 263, 11, 7510, 12, 30041, 11, 10973, 11, 269, 2178, 38811, 11, 5089, 77, 1018, 1136, 11, 475, 400, 2305, 11, 40125, 11, 14509, 562, 11, 269, 3320, 12603, 11, 29836, 11, 43546, 11, 18314, 11, 19311, 11, 6611, 11, 266, 962, 11, 474, 1042, 11, 10973, 12, 82, 19296, 11, 22938, 378, 11, 277, 9460, 313, 11, 24506, 11, 474, 6457, 11, 474, 6457, 12, 75, 7958, 11, 37833, 11, 33526, 11, 1125, 729, 11, 329, 6988, 1352, 11, 781, 2238, 7357, 11, 9583, 1891, 11, 10816, 11, 16949, 11, 32581, 296, 578, 11, 3095, 1136, 11, 285, 1689, 447, 247, 82, 2933, 11, 277, 9460, 313, 11, 583, 1851, 11, 24506, 11, 629, 2178, 363, 11, 21551, 11, 198, 198, 7220, 25, 1867, 338, 534, 4004, 530, 30, 198, 198, 562, 10167, 25, 314, 4398, 470, 772, 1807, 546, 340, 13, 628, 50256, 50256]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 7220, 25, 1867, 389, 617, 269, 1046, 2456, 287, 46932, 30, 198, 198, 562, 10167, 25, 3423, 447, 247, 82, 281, 17503, 1351, 13, 198, 198, 8021, 11, 19317, 11, 809, 26679, 11, 18824, 11, 5089, 11, 7510, 11, 21551, 11, 256, 2799, 11, 7510, 2256, 11, 7510, 21454, 11, 629, 10599, 388, 11, 40267, 11, 40107, 11, 5089, 263, 11, 7510, 12, 30041, 11, 10973, 11, 269, 2178, 38811, 11, 5089, 77, 1018, 1136, 11, 475, 400, 2305, 11, 40125, 11, 14509, 562, 11, 269, 3320, 12603, 11, 29836, 11, 43546, 11, 18314, 11, 19311, 11, 6611, 11, 266, 962, 11, 474, 1042, 11, 10973, 12, 82, 19296, 11, 22938, 378, 11, 277, 9460, 313, 11, 24506, 11, 474, 6457, 11, 474, 6457, 12, 75, 7958, 11, 37833, 11, 33526, 11, 1125, 729, 11, 329, 6988, 1352, 11, 781, 2238, 7357, 11, 9583, 1891, 11, 10816, 11, 16949, 11, 32581, 296, 578, 11, 3095, 1136, 11, 285, 1689, 447, 247, 82, 2933, 11, 277, 9460, 313, 11, 583, 1851, 11, 24506, 11, 629, 2178, 363, 11, 21551, 11, 198, 198, 7220, 25, 1867, 338, 534, 4004, 530, 30, 198, 198, 562, 10167, 25, 314, 4398, 470, 772, 1807, 546, 340, 13, 628, 50256, 50256]
<|endoftext|>What are some cuss words in english?user: What are some cuss words in english? assistant: Here’s an incomplete list. Ass, dick, bugger, crap, fuck, shit, bitch, turd, shithead, shitbag, scrotum, cunt, whore, fucker, shit-eating, cum, cumbucket, fucknugget, butthole, poop, jackass, cocksucker, asshole, goddamn, piss, sperm, blow, wank, jism, cum-sucking, masturbate, faggot, queer, jizz, jizz-licking, prostitute, slut, cheater, fornicator, floozy, wetback, Mexican, Hispanic, sodomite, midget, mama’s boy, faggot, pervert, queer, scumbag, bitch, user: What's your favorite one? assistant: I haven't even thought about it. <|endoftext|><|endoftext|>
from trl.
Related Issues (20)
- KTOTrainer fails to compute loss when model is loaded across multiple GPUs HOT 4
- Support for training LLM's using RLAIF methods HOT 2
- SFT - Dataset Packing causes C10d timeout HOT 1
- TrlParser not respecting config
- Reproducing LLaVA with Mistral backened
- Bug in calling model.eval() in PPO
- vsft_llava.py:
- Fail to train with ORPOTrainer under multi-GPUs setting HOT 8
- NameError: name 'PeftConfig' is not defined HOT 4
- Loss value returned by the SFT Trainer
- Using PEFT causes model to not predict EOS HOT 2
- Using SFTTrainer and the DeepSpeed Zero-3 Method training, the data map function runs, but the map function does not run in parallel GPU form. HOT 1
- IPO loss computation vs changes in DPO dataset format
- Group Relative Policy Optimization Trainer
- bugs in trained save and evaluation for DPO training with deepspeed_zero3 HOT 3
- config.json file is missing. HOT 4
- The data is consuming too much GPU memory In the SFTTrainer HOT 3
- [Question] Does TRL support DPO trainer with simulated environment for generating training data like PPO's step-based training? HOT 1
- [Question]How should I combine DPOTrainer and Accelerate for training? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from trl.