Comments (6)
I think 60%-70% makes sense!
Great question: the speed gain in prefix-tuning comes from not having to update as many parameters stored in the optimizer (i.e., fewer trainable parameters), but backprop is still required all the way down to the bottom Transformer layer. One thought experiment that explains this: if you train only the last layer of a Transformer, then both the number of trainable parameters and the number of layers you backprop through are reduced (you only need to backprop through one layer, since you are not interested in the gradients of the earlier layers). However, if you train only the first layer of the Transformer, you need to backprop all the way down, despite having the same number of trainable parameters.
Going back to prefix-tuning with the first-layer vs. last-layer analogy: we tune activations at every layer, so we need to backprop all the way back to the first layer, and backprop time is not reduced. The only computation saved is that we don't need to perform as many parameter updates.
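To make the analogy concrete, here is a minimal PyTorch sketch (a hypothetical toy model, not the paper's code): the backbone is frozen, so the optimizer only holds state for the small prefix tensors, but because a trainable prefix also sits at the very first layer, the backward pass still has to traverse the whole stack.

import torch
import torch.nn as nn

# Toy 4-layer "backbone" with one trainable prefix-style parameter per layer.
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
for p in layers.parameters():
    p.requires_grad = False  # freeze the backbone, as prefix-tuning does

prefixes = nn.ParameterList(nn.Parameter(torch.zeros(16)) for _ in range(4))

x = torch.randn(8, 16)
h = x
for layer, prefix in zip(layers, prefixes):
    h = torch.relu(layer(h) + prefix)  # frozen weights, trainable prefix

loss = h.sum()
# The loss depends on prefixes[0], so autograd must walk back through every
# layer; if only the last layer (or last prefix) were trainable, it could
# stop early instead.
loss.backward()

# Optimizer state covers only the tiny prefix tensors -- this is where the
# memory and update-time savings come from.
opt = torch.optim.Adam(prefixes.parameters())
opt.step()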
Let me know if this makes sense.
Great, thanks for your analysis! I assumed it was for the same reasons too. Thanks again!
What's your GPU hardware environment? Can a single GPU train it? Thanks! @tianbao Xie
Of course, it depends on the model; I think 11 GB of memory is enough for the E2E dataset with GPT-2.
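If you want to check what your card reports before launching a run, a quick snippet (assuming PyTorch can see a CUDA device) is:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB total memory")
else:
    print("No CUDA device visible to PyTorch.")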
When trying to train GPT-2, the problem below is troubling me. Can you help me fix it? Thanks!
python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101
webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
python run_language_modeling.py --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --model_type=gpt2 --model_name_or_path=gpt2-medium --tokenizer_name=gpt2-medium --per_device_train_batch_size 5 --per_device_eval_batch_size 5 --save_steps 500000 --num_train_epochs 5 --do_train --train_data_file=/u/scr/xlisali/WebNLG/webnlg-dataset/webnlg_challenge_2017/train.json --do_eval --line_by_line --save_total_limit 1 --overwrite_output_dir --task_mode webnlg --eval_data_file=/u/scr/xlisali/WebNLG/webnlg-dataset/webnlg_challenge_2017/dev.json --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000 --cache_dir /u/scr/xlisali/contrast_LM/transformers/examples/control/gpt2-medium-s3
/data/lirongzhen/PrefixTuning/transformers/src/transformers/__init__.py
Traceback (most recent call last):
  File "/data/lirongzhen/PrefixTuning/gpt2/run_language_modeling.py", line 1159, in <module>
    main()
  File "/data/lirongzhen/PrefixTuning/gpt2/run_language_modeling.py", line 498, in main
    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
  File "/data/lirongzhen/PrefixTuning/transformers/src/transformers/hf_argparser.py", line 40, in __init__
    self._add_dataclass_arguments(dtype)
  File "/data/lirongzhen/PrefixTuning/transformers/src/transformers/hf_argparser.py", line 72, in _add_dataclass_arguments
    elif hasattr(field.type, "__origin__") and issubclass(field.type.__origin__, List):
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/typing.py", line 847, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class
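For anyone hitting the same traceback: my reading (an assumption, not something confirmed in this thread) is that this is a Python 3.9 incompatibility in the bundled transformers' hf_argparser.py. For Optional[...] dataclass fields, the issubclass(field.type.__origin__, List) check at line 72 receives typing.Union, which is not a class. A minimal reproduction:

from typing import List, Optional

field_type = Optional[List[str]]  # e.g. a dataclass field annotation
print(field_type.__origin__)      # typing.Union -- not a class
try:
    issubclass(field_type.__origin__, List)
except TypeError as e:
    print("TypeError:", e)        # issubclass() arg 1 must be a class

Creating the conda environment with Python 3.7 or 3.8 instead of 3.9 should sidestep it.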
Sorry for forgetting to close this issue, thanks again!
Related Issues (20)
- python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory
- Why DeepCopy automatically remove the token id before <endoftext> token?
- GPT 2 prefix tuning. Input data format.
- A piece of GPU can train this model?
- TypeError: setup() got an unexpected keyword argument 'stage'
- OSError: [Errno 30] Read-only file system: '/u'
- About the results on lowdata table2text task.
- The version of pytorch_lightning
- About the evaluation scripts
- Understanding the Seq2Seq Encoder-Decoder Prefix Implementation
- Use --init_shallow_word for seq2seq model
- Hi, What do function names get_prompt_p3,5,7 ... mean?
- AttributeError: 'tuple' object has no attribute 'detach'
- Should've mentioned about "CRITICAL" modifications done in transformers source code
- notation typo in the paper
- question about the initialization experiment
- Is it necessary to arrange position ids between [prefix_len, prefix_len+seq_len) ?
- control code is not used in PrefixTuning.get_prompt()
- Data preparation step
- Which version of the Transformers did you modify? v3.2.0?