XiangLi1999 / PrefixTuning
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Hi @XiangLi1999, thank you for open-sourcing this amazing work! I have been trying to understand your seq2seq implementation:
PrefixTuning/seq2seq/prefixTuning.py
Line 7 in 6519d30
I was wondering if you could help me with a few questions about it:
What does the mode_para attribute mean? PrefixTuning/seq2seq/prefixTuning.py
Line 94 in 6519d30
The use_encoder_prefix attribute was only used in one of the prompt methods: def get_prompt_p5
PrefixTuning/seq2seq/prefixTuning.py
Line 436 in 6519d30
What does the use_cross_prefix attribute do?
past_key_values to feed the prefixes?
past_key_values fed to the model? As per my understanding, it should contain the key-value pairs for all the preceding tokens on the decoding side. How is the encoder-side prefix included in the past_key_values? https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartForConditionalGeneration.forward.past_key_values
I just don't know how to run inference with prefix tuning. What prefix should I choose for the model (like gpt2)?
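As a rough aid for the past_key_values questions above, here is a shape-only sketch. It assumes the learned prefix is a flat vector per prefix position that gets reshaped into one key and one value per layer; the GPT-2 logs further down this page are consistent with that (an output layer of width 49152 with n_layer=24, n_embd=1024, n_head=16). The function name and the exact per-layer layout are illustrative assumptions, not taken from the repository.

```python
# Shape bookkeeping only; no tensors are created.
def prefix_kv_shapes(preseqlen, n_layer, n_head, n_embd, batch_size=1):
    """Return the flat prefix width and an assumed per-layer key/value shape."""
    head_dim = n_embd // n_head
    # One key vector and one value vector of size n_embd for every layer.
    flat_width = n_layer * 2 * n_embd
    # Assumed per-layer layout: (key/value, batch, heads, prefix positions, head_dim).
    per_layer = (2, batch_size, n_head, preseqlen, head_dim)
    return flat_width, per_layer


flat_width, per_layer = prefix_kv_shapes(preseqlen=5, n_layer=24, n_head=16, n_embd=1024)
print(flat_width)  # 49152
print(per_layer)   # (2, 1, 16, 5, 64)
```

Under this reading, the model attends to the prefix positions as if they were already-decoded tokens, which is why the prefix can be passed through the same argument that normally carries the decoding cache.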
Hello! I'm trying to use PrefixTuning with a T5 model. After reading the source code in seq2seq, I gather that, generally speaking, the prefix is added to the BART model through the past_key_values parameter.
But in T5, when past_key_values is provided together with decoder_input_ids (the ground truth during training), the forward() function only uses the last token of decoder_input_ids
PrefixTuning/transformers/src/transformers/modeling_t5.py
Lines 1201 to 1206 in 6519d30
whereas BART uses the full decoder_input_ids.
PrefixTuning/transformers/src/transformers/modeling_bart.py
Lines 1448 to 1465 in 6519d30
However, I don't see any code handling this difference in the seq2seq folder. The only T5-related code I can find handles input ids or freezes embeddings.
Is PrefixTuning compatible with T5 model? If not, could you give some advice to make it so? Thanks a lot!
In the paper, there are GPT2-Large results on the data2text task. Would it be possible to share the hyperparameters for GPT2-Large? By the way, I really appreciate the cleaned code!
Thank you for your open source code. I tried to run your program on the server, but the interface of pytorch_lightning has changed, so I got some errors. May I know the version of pytorch_lightning you and your team use? Thank you!
Looking forward to your reply.
Hi, I tried the seq2seq prefixtuning and found:
RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 15.90 GiB total capacity; 4.63 GiB already allocated; 797.50 MiB free; 5.81 GiB reserved in total by PyTorch)
I ran the experiment on a 16GB GPU. Am I supposed to use a 32GB GPU instead? Thanks!
Hi, thanks for your wonderful work!
I have some questions about your released code. I saw that "--init_shallow_word" is used with the GPT-2 model (GPT2LMHeadModel), so that prev_key and prev_value can be initialized from a provided word such as "summarize".
PrefixTuning/gpt2/train_e2e.py
Lines 34 to 35 in 0eb23e4
If I want to use this trick with a seq2seq model (BartForConditionalGeneration), where and how should I change your code?
I have found that directly using the "get_gold_init" function doesn't work.
PrefixTuning/gpt2/train_control.py
Lines 184 to 191 in 6519d30
It seems that the BartModel forward function doesn't return "past_key_values", either because "use_cache" is set to False or because the return format differs from that of the GPT2LMHeadModel forward function. I haven't been able to figure this out, and any reply would be helpful :) @XiangLi1999
PrefixTuning/transformers/src/transformers/modeling_bart.py
Lines 1242 to 1243 in 6519d30
I'm sorry to disturb you, but I have tried many ways to solve this and failed. I have successfully trained a prefix-tuning model (refer to this), but a RuntimeError occurs when decoding. The command is as follows:
python gen.py webnlg yes valid /home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
And the Error:
python run_generation.py --model_type=gpt2 --length 100 --model_name_or_path=gpt2-medium --num_return_sequences 5 --stop_token [EOS] --tokenizer_name=/home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --task_mode=webnlg --control_mode=yes --tuning_mode prefixtune --gen_dir webNLG_results2 --eval_dataset valid --optim_prefix yes --preseqlen 20 --prefix_mode activation --format_mode cat --prefixModel_name_or_path /home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
09/21/2021 20:13:41 - WARNING - __main__ - device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
"_my_arg_task_mode": "webnlg",
"_my_arg_tune_mode": "prefixtune",
"_objective_mode": 2,
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1024,
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"n_positions": 1024,
"n_special": 0,
"predict_special_tokens": true,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"vocab_size": 50257
}
GPT2Config {
"_my_arg_control": true,
"_my_arg_task_mode": "webnlg",
"_my_arg_tune_mode": "prefixtune",
"activation_function": "gelu_new",
"architectures": [
"PrefixTuning"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"format_mode": "cat",
"init_random": "no",
"init_shallow": "no",
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"lowdata": false,
"mid_dim": 512,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1024,
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"n_positions": 1024,
"n_special": 0,
"optim_prefix": true,
"predict_special_tokens": true,
"prefix_dropout": 0.0,
"preseqlen": 5,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"train_weights": "no",
"use_infix": false,
"vocab_size": 50258
}
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
09/21/2021 20:14:05 - INFO - __main__ - Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='valid', format_mode='cat', fp16=False, gen_dir='webNLG_results2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='yes', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='webnlg', temperature=1.0, tokenizer_name='/home/yanzhongxiang/PrefixTuning/gpt2/webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
871 871
/home/yanzhongxiang/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam
/home/yanzhongxiang/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold
871
['Aarhus', ':', 'leaderName', ':', 'Jacob_Bundsgaard']
('leaderName',)
cat
True True
control code is None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
Traceback (most recent call last):
File "run_generation.py", line 1357, in <module>
main()
File "run_generation.py", line 1194, in main
num_return_sequences=1,
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/generation_utils.py", line 511, in generate
model_kwargs=model_kwargs,
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/generation_utils.py", line 700, in _generate_beam_search
outputs = self(**model_inputs, return_dict=True) # (batch_size * num_beams, cur_len, vocab_size)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
return_dict=return_dict,
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 645, in forward
output_attentions=output_attentions,
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 327, in forward
feed_forward_hidden_states = self.mlp(self.ln_2(hidden_states))
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 267, in forward
h = self.act(self.c_fc(x))
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_utils.py", line 1094, in forward
x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
If additional information is needed, please contact me. Thank you.
I noticed that line 572 in your seq2seq.finetune uses the generate method implemented by the BART model (Hugging Face), and the parameter passed to generate is input_ids. I think self.model.generate only uses the BART model itself, which does not contain the prompt part, and the input to this method does not contain the prompt embedding either. We also know that the BART model parameters are not updated during the prefix-tuning process.
Therefore, I am confused about how you bring the prompt embedding into the evaluation process.
Hi, I'd like to try the infix mode as well. Could you also share it?
Thanks!
Very good work!
What I'd like to ask is: what are the hyperparameter settings in the low-resource summarization scenario, such as the learning rate and number of epochs? I have tried prefix-tuning on low-resource summarization tasks, but it does not seem to work very well...
Hi
I am getting this error, thanks for your help
Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement en-core-web-sm==2.3.1 (from -r /users/dara/seq2seq/temps/PrefixTuning/condaenv.2wv1bbj7.requirements.txt (line 26)) (from versions: none)
ERROR: No matching distribution found for en-core-web-sm==2.3.1 (from -r /users/dara/seq2seq/temps/PrefixTuning/condaenv.2wv1bbj7.requirements.txt (line 26))
failed
CondaEnvException: Pip failed
One thing is unclear: for prefix-tuning, do we always need a labelled dataset? Say we want to do Q/A on proprietary data, would we need Q/A as x/y pairs? You use webnlg data for summarisation, which does not seem to have summaries defined. Would you have any views on the kind of dataset needed for fine-tuning with this method?
Hi,
I calculated the number of parameters used in the embedding and linear layers of the prefix model from self.control_trans, self.control_trans_enc, self.control_trans2, wte, wte_enc, wte_2
and I am getting 62.1M. Since BART large is 406M, we should get 15% added parameters and not 2% as in Table 2 of your paper.
I tried the following code:
sum(p.numel() for p in list(self.model.control_trans.parameters()))
which gives 20505376
or 20.5M using the hyperparameters to replicate the xsum results.
Here's the prefix model (Embedding is not included):
Sequential(
(0): Linear(in_features=1024, out_features=800, bias=True)
(1): Tanh()
(2): Linear(in_features=800, out_features=24576, bias=True)
)
There is one such model each for the encoder inputs, decoder inputs and cross inputs, so you have to multiply the 20.5M by 3 (see here: https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/prefixTuning.py#L260-L279).
Thanks
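The counts quoted in this issue can be re-derived by hand from the printed Sequential. The sketch below does that arithmetic; the layer sizes come from the issue, while the prefix length of 200 used for the three embedding tables is an assumption (it is not stated in the issue) chosen because it reproduces the reported 62.1M.

```python
def mlp_params(in_features, mid_features, out_features):
    """Weights plus biases of Linear(in, mid) -> Tanh -> Linear(mid, out)."""
    first = in_features * mid_features + mid_features
    second = mid_features * out_features + out_features
    return first + second


one_mlp = mlp_params(1024, 800, 24576)
three_mlps = 3 * one_mlp  # encoder, decoder and cross prefixes

PRESEQLEN = 200  # assumption: prefix length, not given in the issue
embeddings = 3 * PRESEQLEN * 1024  # wte, wte_enc, wte_2

total = three_mlps + embeddings
BART_LARGE = 406_000_000  # figure quoted in the issue

print(one_mlp)                             # 20505376
print(total)                               # 62130528
print(round(100 * total / BART_LARGE, 1))  # 15.3
```

This only confirms the arithmetic in the question; it does not say which of these parameters the paper's table counts.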
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1132, in _run
self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1432, in _call_setup_hook
self.call_hook("setup", stage=fn)
File "/home/ubuntu/anaconda3/envs/torch/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1483, in call_hook
output = model_fx(*args, **kwargs)
TypeError: setup() got an unexpected keyword argument 'stage'
Process finished with exit code 1
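The traceback above shows the trainer calling the hook as setup(stage=...). A minimal sketch of that failure mode, using plain stand-in classes rather than real LightningModules: if the module's setup names its parameter something other than stage (an assumption about the code being run), the keyword call fails, and renaming the parameter makes the call succeed. Both class names here are hypothetical.

```python
class OldStyleModule:
    # Hypothetical stand-in: the hook's parameter is not called `stage`.
    def setup(self, mode):
        return f"setup for {mode}"


class StageAwareModule:
    # Same hook, with the parameter name the trainer passes by keyword.
    def setup(self, stage=None):
        return f"setup for {stage}"


def call_setup_hook(module, stage):
    """Mimics the trainer call seen in the traceback: setup(stage=...)."""
    return module.setup(stage=stage)


try:
    call_setup_hook(OldStyleModule(), "fit")
except TypeError as err:
    print("old signature fails:", err)

print(call_setup_hook(StageAwareModule(), "fit"))  # setup for fit
```

Pinning pytorch_lightning to the version the code was written against would avoid the rename altogether; which version that is remains an open question on this page.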
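I just looked through the source code; there is too much of it and it is too much trouble. Has this method been integrated into any popular code libraries?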
With the latest version nothing works.
Additionally, there is no requirements.txt.
And installing the transformers package from the folder in the repo fails to build tokenizers on macOS.
Thanks for publicly releasing your work. I really appreciate your simple yet parameter-efficient method for tuning PLMs.
In fact, I had a hard time re-implementing your original experiment.
Until I learned that you had modified modeling_gpt2.py / GPT2LMHeadModel.prepare_inputs_for_generation() (and maybe made a few small modifications in generation_utils.py), the results were truly mysterious.
The function mentioned above is necessary for making this method actually work. It preserves the past_key_values passed in. Otherwise, the PLM will not incorporate the learned prefix embedding during generation.
It was a really painful process to track this down. You hinted at modifications of the data collators but not at the generation part of transformers, which is a critical part of the implementation. Meh😕.
Hope this helps other visitors.
I found that the position ids are in [prefix_len, prefix_len+seq_len) in modeling_gpt2.py:
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
Is it OK to just make the position ids lie in [0, seq_len)? I have not found any use of position embeddings for the prefix matrix.
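To make the two options in this question concrete, here is a pure-Python equivalent of the quoted torch.arange call. It only illustrates which indices each choice produces; it does not settle whether a prefix trained with one convention still works with the other. The helper name is made up for the example.

```python
def position_ids(past_length, seq_len):
    """Pure-Python equivalent of torch.arange(past_length, seq_len + past_length)."""
    return list(range(past_length, seq_len + past_length))


prefix_len, seq_len = 5, 4

# Current behaviour: the prefix counts as past, so real tokens start at prefix_len.
print(position_ids(prefix_len, seq_len))  # [5, 6, 7, 8]

# Proposed alternative: ignore the prefix when numbering positions.
print(position_ids(0, seq_len))           # [0, 1, 2, 3]
```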
Hello,
You shared the xsum dataset link here: #2
However, I see from the CodaLab link https://worksheets.codalab.org/bundles/0x58f85171b43f4e61bf411c35faab369d
and from the hyperparameters/data directory in https://worksheets.codalab.org/bundles/0xa3f0cd3c10c7490ab508a351968cbdcf that you have used the xsum_news data. When I checked xsum_news, I found that the validation file has 7,186 examples, whereas the original dataset has 11,327 examples. The test set also differs, with 11,333 examples in xsum_news vs. 20,418 in the original xsum.
I was wondering if you could explain the differences in eval/test dataset sizes compared to the original and perhaps provide your script for preprocessing the original xsum.
Thanks!
Is this the first work to propose soft prompts?
What version of PyTorch Lightning was this built with? I followed the setup instructions to install the requirements, but I keep getting errors from misnamed parameters in the seq2seq module (the gpt-2 module works fine). I can fix the errors as they come up by consulting the current PyTorch Lightning documentation (filepath in the trace should be dirpath, for example), but I'd rather use the code as written instead of manually updating it.
Traceback (most recent call last):
File "finetune.py", line 876, in <module>
main(args)
File "finetune.py", line 782, in main
checkpoint_callback=get_checkpoint_callback(args.output_dir, model.val_metric, args.save_top_k, lower_is_better), #LISA
File "/workspace/PrefixTuning/seq2seq/callbacks.py", line 105, in get_checkpoint_callback
period=0, # maybe save a checkpoint every time val is run, not just end of epoch.
TypeError: __init__() got an unexpected keyword argument 'filepath'
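For anyone patching this by hand, one option is to pick the keyword at runtime instead of hard-coding it. The sketch below inspects a callback class and returns whichever of dirpath or filepath its constructor accepts. The two classes are hypothetical stand-ins, not the real ModelCheckpoint, and the helper name is invented. Note that the older filepath argument could also carry a filename template, so the two values are not always interchangeable.

```python
import inspect


def checkpoint_dir_kwarg(callback_cls, output_dir):
    """Pick the directory keyword that callback_cls.__init__ actually accepts."""
    params = inspect.signature(callback_cls.__init__).parameters
    for name in ("dirpath", "filepath"):
        if name in params:
            return {name: output_dir}
    raise TypeError(f"{callback_cls.__name__} accepts neither dirpath nor filepath")


# Hypothetical stand-ins for a newer and an older checkpoint callback.
class NewCheckpoint:
    def __init__(self, dirpath=None, monitor=None):
        self.dirpath, self.monitor = dirpath, monitor


class OldCheckpoint:
    def __init__(self, filepath=None, monitor=None):
        self.filepath, self.monitor = filepath, monitor


print(checkpoint_dir_kwarg(NewCheckpoint, "out"))  # {'dirpath': 'out'}
print(checkpoint_dir_kwarg(OldCheckpoint, "out"))  # {'filepath': 'out'}
```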
Hi,
I got the following error: python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory
when I ran this command: CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache
Has anyone run into this issue or know how to deal with it? Thank you so much.
cat
True False
control code is None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
=== GENERATED SEQUENCE 1 ===
name : Zizzi | Type : pub | customer rating : average | near : Burger King <|endoftext|> Zizzi is a pub near Burger King. It has an average customer rating. <|endoftext|>
cat
True False
control code is None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
=== GENERATED SEQUENCE 1 ===
name : Zizzi | Type : pub | customer rating : high | near : Burger King <|endoftext|> Zizzi is a pub near Burger King with a high customer rating. <|endoftext|>
cat
True False
control code is None
beam
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence
=== GENERATED SEQUENCE 1 ===
name : Zizzi | Type : pub | near : The Sorrento <|endoftext|> Zizzi is a pub near The Sorrento. <|endoftext|>
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam_eval
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
python: can't open file '/u/scr/xlisali/e2e-metrics/measure_scores.py': [Errno 2] No such file or directory
Here is my environment configuration:
Package Version Location
---------------------- ----------- -------------------------------------------
absl-py 0.14.1
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
click 8.0.3
filelock 3.3.0
future 0.18.2
google-auth 1.35.0
google-auth-oauthlib 0.4.6
grpcio 1.41.0
idna 3.3
joblib 1.1.0
Markdown 3.3.4
nltk 3.6.5
numpy 1.21.2
oauthlib 3.1.1
packaging 21.0
Pillow 8.3.2
pip 20.0.2
pkg-resources 0.0.0
protobuf 3.18.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 2.4.7
pytorch-lightning 0.9.0
PyYAML 6.0
regex 2021.10.8
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
sacremoses 0.0.46
sentencepiece 0.1.96
setuptools 44.0.0
six 1.16.0
tensorboard 2.2.0
tensorboard-plugin-wit 1.8.0
tokenizers 0.8.1rc2
torch 1.8.0+cu111
torchvision 0.9.0+cu111
tqdm 4.62.3
transformers 3.2.0 /data/qbao775/PrefixTuning/transformers/src
typing-extensions 3.10.0.2
urllib3 1.26.7
Werkzeug 2.0.2
wheel 0.37.0
Which version of the Transformers did you modify? v3.2.0?
This is a relatively bad project because some strange packages were used
Hi, Lisa!
I read your paper and you have done brilliant work. I want to fine-tune GPT on the DART dataset. However, I don't know how to evaluate my results. The official scripts (https://github.com/Yale-LILY) provide a different test set (5,097 samples), which has different references, too.
I used your test set (12,552 samples) for generation and evaluated performance against the target sentences in the test set (the 12,552 samples are aligned, so for each sample I have only 1 reference). However, I can only get a BLEU of about 26.28 (GPT large), much lower than yours.
Could you please tell me how to evaluate it? Thank you!
Hi, thanks for the great repo! Would it be possible to add a proper license statement to it? Thank you!
Hello, I want to fine-tune the prefix along with the whole BART model.
I commented out the freezing code at seq2seq/finetune.py#L95.
I don't know if that is right. (I see GPU usage getting bigger, so it may be.)
However, when I load the model, I find that only the prefix part is saved.
So, I want to know how to train, save and load both the prefix and the BART model.
Thank you very much!
Is the dataset used for the GPT-2-based training portion in the code located under 'data'? Where can I find the files that start with '/u/scr/xlisali/'?
Just for understanding the code. Thanks.
python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101
causes the error below on my local PC. I just did the environment setup and installed nothing else. Should I install something else?
Traceback (most recent call last):
File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 355, in get_config_dict
local_files_only=local_files_only,
File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 719, in cached_path
local_files_only=local_files_only,
File "/Users/.../PrefixTuning/transformers/src/transformers/file_utils.py", line 821, in get_from_cache
os.makedirs(cache_dir, exist_ok=True)
File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 213, in makedirs
makedirs(head, exist_ok=exist_ok)
[Previous line repeated 4 more times]
File "/Users/.../opt/anaconda3/envs/sc/lib/python3.7/os.py", line 223, in makedirs
mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/u'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run_language_modeling.py", line 1159, in <module>
main()
File "run_language_modeling.py", line 546, in main
config = AutoConfig.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir)
File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_auto.py", line 310, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/Users/.../PrefixTuning/transformers/src/transformers/configuration_utils.py", line 368, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 'gpt2-medium'. Make sure that:
'gpt2-medium' is a correct model identifier listed on 'https://huggingface.co/models'
or 'gpt2-medium' is the correct path to a directory containing a config.json file
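The `Read-only file system: '/u'` error, together with the `/u/scr/xlisali/...` paths reported in other issues on this page, suggests that some scripts reference absolute paths from the author's machine. A small sketch for locating such references so they can be replaced with local paths; the function name, the default needle and the file suffixes are choices made for this example.

```python
import os


def find_hardcoded_paths(root, needle="/u/scr/", suffixes=(".py", ".sh")):
    """Return (path, line_number, line) for each line under root containing needle."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # errors="ignore" keeps the scan going on files that are not valid UTF-8.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.strip()))
    return hits


# Example usage from the repository root:
# for path, number, line in find_hardcoded_paths("."):
#     print(f"{path}:{number}: {line}")
```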
Hi, Lisa!
I read in your paper that you have tried different inputs for prefix like 'random init', 'active', 'summarize', etc.
I would like to ask in what form you applied the words as input to the prefix. Were these tokenized words or something else?
Hi, thanks for sharing the code.
I have tried the webnlg and data2text tasks with the 'cleaned' branch, but I found that the "control_code" argument is not used in any of the implementations of PrefixTuning.get_prompt(). Does this mean that different categories of the webnlg dataset all use the same soft prompt? I found that there are get_prompt_p3, get_prompt_p1 and get_prompt_p4 in the master branch. Can I use them to reproduce the results of the paper?
Thanks.
Hi Lisa~ I rewrote the code for BART, following yours, on top of the newest Hugging Face transformers, and I want to verify one thing: in my training runs, the speed of prefix-tuning is about 60%~70% of that of full-parameter fine-tuning, even when I use a very small prefix prompt module. Does that make sense? And where might the speed bottleneck be? Hoping for your reply.
There are a few checkpoint_callback objects being created in lightning_base.py, and I think that using the callback on line https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/lightning_base.py#L749 does allow us to save the model. I am rerunning the model right now without that line to verify. However, since it takes a long time to train, I was hoping that you could help me fix model saving.
Thanks
Setting PYTORCH_NO_CUDA_MEMORY_CACHING=1 to disable PyTorch's caching allocator resolves the OOM, but training becomes slow.
Is this normal?
Epoch 0: 0%| | 3/12784 [00:07<8:32:20, 2.41s/it, loss=7.35, v_num=5]/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
saving checkpoint now
saving models now/22..
try calling the pl_module save
Epoch 0: 0%| | 3/12784 [00:07<9:08:09, 2.57s/it, loss=7.35, v_num=5]
Traceback (most recent call last):
File "/data/lirongzhen/PrefixTuning/seq2seq/finetune.py", line 878, in <module>
main(args)
File "/data/lirongzhen/PrefixTuning/seq2seq/finetune.py", line 779, in main
trainer: pl.Trainer = generic_train(
File "/data/lirongzhen/PrefixTuning/seq2seq/lightning_base.py", line 795, in generic_train
trainer.fit(model)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 470, in fit
results = self.accelerator_backend.train()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 68, in train
results = self.train_or_test()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
results = self.trainer.train()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 521, in train
self.train_loop.run_training_epoch()
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 560, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 687, in run_training_batch
self.training_step_and_backward(
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 816, in training_step_and_backward
self.backward(result, optimizer, opt_idx)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 836, in backward
result.closure_loss = self.trainer.accelerator_backend.backward(
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 98, in backward
closure_loss = self.trainer.precision_connector.backend.backward(
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/plugins/native_amp.py", line 46, in backward
model.backward(closure_loss, optimizer, opt_idx)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1152, in backward
loss.backward(*args, **kwargs)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 47.54 GiB total capacity; 41.43 GiB already allocated; 900.75 MiB free; 44.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
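The error message itself points at max_split_size_mb and PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of setting that option from Python, as a less drastic alternative to disabling the cache entirely; the value 128 is purely illustrative. In the log above, reserved memory (44.68 GiB) is only slightly larger than allocated memory (41.43 GiB), so fragmentation may not be the main cause and this setting may not help.

```python
import os

# Set the allocator option before torch makes its first CUDA allocation,
# most safely before `import torch`.
# 128 is an illustrative value, not a recommendation from the repository.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

setdefault leaves any value already exported in the shell untouched, so the script does not override a deliberate setting.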
Hi,
I got the following error: FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold'
when I ran this command: CUDA_VISIBLE_DEVICES=0 python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00008 --mode data2text --bsz 10 --seed 101 --tuning_mode prefixtune --cache_dir ./cache
Has anyone run into this issue or know how to deal with it? Thank you so much.
Training completed. Do not forget to share your model on huggingface.co/models =)
10/15/2021 20:14:10 - INFO - trainer_prefix - Saving model checkpoint to save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
10/15/2021 20:14:11 - INFO - __main__ - *** Evaluate ***
10/15/2021 20:14:11 - INFO - trainer_prefix - ***** Running Evaluation *****
10/15/2021 20:14:11 - INFO - trainer_prefix - Num examples = 42061
10/15/2021 20:14:11 - INFO - trainer_prefix - Batch size = 10
False
False
{'eval_loss': 25.165123616772462, 'epoch': 5.0, 'total_flos': 2514722051589120, 'step': 21035}
10/15/2021 20:18:41 - INFO - __main__ - ***** Eval results *****
10/15/2021 20:18:41 - INFO - __main__ - perplexity = 25.165123616772462
running evaluation on /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
suggested code:
python gen.py data2text yes valid /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
python gen.py data2text yes test /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 no
python run_generation.py --model_type=gpt2 --length 100 --model_name_or_path=gpt2-medium --num_return_sequences 5 --stop_token [EOS] --tokenizer_name=/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --task_mode=data2text --control_mode=yes --tuning_mode prefixtune --gen_dir e2e_results_conv2 --eval_dataset valid --optim_prefix no --preseqlen 20 --prefix_mode activation --format_mode cat --prefixModel_name_or_path /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
10/15/2021 20:18:42 - WARNING - __main__ - device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
"_my_arg_task_mode": "data2text",
"_my_arg_tune_mode": "prefixtune",
"_objective_mode": 2,
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1024,
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"n_positions": 1024,
"n_special": 0,
"predict_special_tokens": true,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"vocab_size": 50257
}
GPT2Config {
"_my_arg_control": true,
"_my_arg_task_mode": "data2text",
"_my_arg_tune_mode": "prefixtune",
"activation_function": "gelu_new",
"architectures": [
"PrefixTuning"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"format_mode": "cat",
"init_random": "no",
"init_shallow": "no",
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"lowdata": false,
"mid_dim": 512,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1024,
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"n_positions": 1024,
"n_special": 0,
"optim_prefix": true,
"predict_special_tokens": true,
"prefix_dropout": 0.0,
"preseqlen": 5,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"train_weights": "no",
"use_infix": false,
"vocab_size": 50258
}
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
10/15/2021 20:19:00 - INFO - __main__ - Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='valid', format_mode='cat', fp16=False, gen_dir='e2e_results_conv2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='no', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='data2text', temperature=1.0, tokenizer_name='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
using the test path /data/qbao775/PrefixTuning/data/e2e_data/src1_valid.txt
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold
547
Traceback (most recent call last):
File "run_generation.py", line 1356, in <module>
main()
File "run_generation.py", line 825, in main
write_e2e_corr(prompt_text_lst, prompt_text_dict, gold_dir)
File "run_generation.py", line 360, in write_e2e_corr
with open(corr_path, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_gold'
python run_generation.py --model_type=gpt2 --length 100 --model_name_or_path=gpt2-medium --num_return_sequences 5 --stop_token [EOS] --tokenizer_name=/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --task_mode=data2text --control_mode=yes --tuning_mode prefixtune --gen_dir e2e_results_conv2 --eval_dataset test --optim_prefix no --preseqlen 20 --prefix_mode activation --format_mode cat --prefixModel_name_or_path /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --cache_dir ./cache/gpt2-medium-s3
10/15/2021 20:19:02 - WARNING - __main__ - device: cuda, n_gpu: 1, 16-bits training: False
loading from PrefixTuning. /data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
loading the trained tokenizer
Using pad_token, but it is not set yet.
50257 <|endoftext|> <|endoftext|> None
50256
<|endoftext|>
None
<|endoftext|> 50256
50257 <|endoftext|> <|endoftext|> <|endoftext|>
GPT2Config {
"_my_arg_task_mode": "data2text",
"_my_arg_tune_mode": "prefixtune",
"_objective_mode": 2,
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1024,
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"n_positions": 1024,
"n_special": 0,
"predict_special_tokens": true,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"vocab_size": 50257
}
GPT2Config {
"_my_arg_control": true,
"_my_arg_task_mode": "data2text",
"_my_arg_tune_mode": "prefixtune",
"activation_function": "gelu_new",
"architectures": [
"PrefixTuning"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"format_mode": "cat",
"init_random": "no",
"init_shallow": "no",
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"lowdata": false,
"mid_dim": 512,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 1024,
"n_head": 16,
"n_inner": null,
"n_layer": 24,
"n_positions": 1024,
"n_special": 0,
"optim_prefix": true,
"predict_special_tokens": true,
"prefix_dropout": 0.0,
"preseqlen": 5,
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"train_weights": "no",
"use_infix": false,
"vocab_size": 50258
}
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
10/15/2021 20:19:20 - INFO - __main__ - Namespace(cache_dir='./cache/gpt2-medium-s3', control_dataless='no', control_mode='yes', device=device(type='cuda'), eval_dataset='test', format_mode='cat', fp16=False, gen_dir='e2e_results_conv2', k=0, length=100, model_name_or_path='gpt2-medium', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=5, objective_mode=2, optim_prefix='no', p=0.9, padding_text='', prefix='', prefixModel_name_or_path='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', prefix_mode='activation', preseqlen=20, prompt='', repetition_penalty=1.0, seed=42, stop_token='[EOS]', task_mode='data2text', temperature=1.0, tokenizer_name='/data/qbao775/PrefixTuning/gpt2/save_e2e_models_convcheck/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', tuning_mode='prefixtune', xlm_language='')
using the test path /data/qbao775/PrefixTuning/data/e2e_data/src1_test.txt
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_beam
/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold
630
Traceback (most recent call last):
File "run_generation.py", line 1356, in <module>
main()
File "run_generation.py", line 825, in main
write_e2e_corr(prompt_text_lst, prompt_text_dict, gold_dir)
File "run_generation.py", line 360, in write_e2e_corr
with open(corr_path, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/qbao775/PrefixTuning/gpt2/e2e_results_conv2/data2textprefixtune_y_5_act_cat_b=10-e=5_d=0.0_u=no_lr=8e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_test_gold'
Here is my environment configuration:
Package Version Location
---------------------- ----------- -------------------------------------------
absl-py 0.14.1
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
click 8.0.3
filelock 3.3.0
future 0.18.2
google-auth 1.35.0
google-auth-oauthlib 0.4.6
grpcio 1.41.0
idna 3.3
joblib 1.1.0
Markdown 3.3.4
nltk 3.6.5
numpy 1.21.2
oauthlib 3.1.1
packaging 21.0
Pillow 8.3.2
pip 20.0.2
pkg-resources 0.0.0
protobuf 3.18.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 2.4.7
pytorch-lightning 0.9.0
PyYAML 6.0
regex 2021.10.8
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
sacremoses 0.0.46
sentencepiece 0.1.96
setuptools 44.0.0
six 1.16.0
tensorboard 2.2.0
tensorboard-plugin-wit 1.8.0
tokenizers 0.8.1rc2
torch 1.8.0+cu111
torchvision 0.9.0+cu111
tqdm 4.62.3
transformers 3.2.0 /data/qbao775/PrefixTuning/transformers/src
typing-extensions 3.10.0.2
urllib3 1.26.7
Werkzeug 2.0.2
wheel 0.37.0
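A note on the FileNotFoundError in both tracebacks above: it is raised by open(corr_path, 'w'), which fails with Errno 2 when a parent directory of the target file does not exist. Here that is most likely the e2e_results_conv2 directory passed as --gen_dir. The following is a minimal sketch of one possible workaround, assuming it is acceptable to create the directory on demand. The helper name open_for_write is hypothetical and is not part of run_generation.py.

```python
import os
import tempfile


def open_for_write(path):
    """Create the parent directory of `path` if needed, then open it for writing."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this a no-op when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    return open(path, "w")


# Demonstration in a temporary directory; the nested folder does not exist yet.
demo_root = tempfile.mkdtemp()
demo_path = os.path.join(demo_root, "e2e_results_conv2", "example_valid_gold")
with open_for_write(demo_path) as f:
    f.write("gold reference line\n")
```

Creating the directory by hand inside the gpt2 folder before re-running the suggested commands should have the same effect.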
Hi, thanks for the great work!
In section 7.4, you run an initialization experiment with real words. Does this initialization apply to the prompts in every layer, or only to the prompts in the first layer? And how does it work together with the re-parameterization method, given that the input dimension of the re-parameterization is much smaller?
I also noticed that in your code, instead of directly adding prompts to the input of each layer (as described in your paper), you append vectors to the key and value matrices directly via the past_key_values argument. How does the initialization experiment work in this implementation? Do you directly initialize the key/value vectors? It seems that the dimensions would not match.
Thanks!
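For reference, the tensor shapes printed in the training logs elsewhere in this thread (preseqlen 5, n_embd 1024, mid_dim 512, n_layer 24) are consistent with a prefix embedding followed by a two-layer MLP whose output holds one key and one value vector per layer for each prefix position. The sketch below only reproduces that arithmetic. The function name and the interpretation of each shape are assumptions, not the repository's code.

```python
import math


def prefix_param_shapes(preseqlen, n_embd, mid_dim, n_layer):
    """Shapes of the trainable prefix parameters, as read off the logs."""
    out_dim = n_layer * 2 * n_embd  # one key and one value vector per layer
    return [
        (preseqlen, n_embd),  # assumed: prefix embedding table
        (mid_dim, n_embd),    # assumed: first linear layer weight
        (mid_dim,),           # assumed: first linear layer bias
        (out_dim, mid_dim),   # assumed: second linear layer weight
        (out_dim,),           # assumed: second linear layer bias
    ]


shapes = prefix_param_shapes(preseqlen=5, n_embd=1024, mid_dim=512, n_layer=24)
total = sum(math.prod(shape) for shape in shapes)
print(shapes)
print(total)  # 25744896, matching "total param is 25744896" in the logs
```

Under this reading, the only tensor with the same width as a word embedding is the first one, of shape (preseqlen, n_embd).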
Hi,
Thanks for your contribution.
Could you share your evaluation scripts on WebNLG dataset? (e.g. run_eval_on_webnlg.sh)
Hi, Lisali. I failed to evaluate the validation set with a GPT-2 model trained in prefix-tuning mode on WebNLG. I used the scripts you provide here. The details are shown below.
Traceback (most recent call last):
File "evaluation.py", line 356, in <module>
read_participant(team, sys.argv[1])
File "evaluation.py", line 250, in read_participant
output_reduced = [output[i-1] for i in sorted(entry_ids)]
File "evaluation.py", line 250, in <listcomp>
output_reduced = [output[i-1] for i in sorted(entry_ids)]
IndexError: list index out of range
the command is bash evaluation/run_eval_on_webnlg.sh ~/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam >> ~/PrefixTuning/transformers/examples/text-generation/webNLG_results2/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1_valid_beam_eval
I have observed that sorted(entry_ids) starts with element 0 and that its length does not match the length of output. I read the code and found that output is just the file I provide on the command line, and I have checked that this file is valid. Am I missing something? Thank you very much.
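One way to narrow this down is to list the entry ids that cannot be looked up with output[i-1]. In Python, an id of 0 does not raise an error: output[-1] silently returns the last line. The IndexError therefore points to an id larger than the number of lines in the output file, or to an empty output file. The sketch below uses a hypothetical helper name and toy data, not the contents of evaluation.py.

```python
def ids_outside_output(entry_ids, output):
    """Return the entry ids i for which output[i - 1] is not a valid 1-based lookup."""
    return [i for i in sorted(entry_ids) if not 1 <= i <= len(output)]


# Toy data: three generated lines, but ids that run from 0 to 5.
output = ["line 1", "line 2", "line 3"]
entry_ids = {0, 1, 2, 5}

bad = ids_outside_output(entry_ids, output)
print(bad)  # [0, 5]: 0 would wrap around to the last line, 5 raises IndexError
```

If the list is non-empty on the real data, the generated file and the reference entries probably do not come from the same split or the same number of examples.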
Hi, I ran into a RuntimeError when training a prefix model. Do you have any suggestions?
Here is the environment:
certifi (2021.5.30)
charset-normalizer (2.0.4)
click (8.0.1)
dataclasses (0.8)
filelock (3.0.12)
idna (3.2)
importlib-metadata (4.8.1)
itsdangerous (2.0.1)
Jinja2 (3.0.1)
joblib (1.0.1)
MarkupSafe (2.0.1)
nltk (3.6.2)
numpy (1.19.5)
packaging (21.0)
Pillow (8.3.2)
pip (9.0.3)
pyparsing (2.4.7)
Python-dev (2.0.0.dev0)
regex (2021.8.28)
requests (2.26.0)
sacremoses (0.0.45)
sentencepiece (0.1.96)
setuptools (39.2.0)
six (1.16.0)
tokenizers (0.8.1rc2)
torch (1.8.0+cu111)
torchvision (0.9.0+cu111)
tqdm (4.62.2)
transformers (3.2.0, /home/yanzhongxiang/PrefixTuning/transformers/src)
typing-extensions (3.10.0.2)
urllib3 (1.26.6)
Werkzeug (2.0.1)
zipp (3.5.0)
Here is the command line:
python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache
Here is the error information:
webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1
python run_language_modeling.py --output_dir=webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --model_type=gpt2 --model_name_or_path=gpt2-medium --tokenizer_name=gpt2-medium --per_device_train_batch_size 5 --per_device_eval_batch_size 5 --save_steps 500000 --num_train_epochs 5 --do_train --train_data_file=../data/webnlg_challenge_2017/train.json --do_eval --line_by_line --save_total_limit 1 --overwrite_output_dir --task_mode webnlg --eval_data_file=../data/webnlg_challenge_2017/dev.json --tuning_mode prefixtune --logging_dir webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1 --train_embs no --optim_prefix yes --preseqlen 5 --prefix_mode activation --format_mode cat --gradient_accumulation_steps 1 --learning_rate 5e-05 --weight_decay 0.0 --seed 101 --disable_tqdm --mid_dim 512 --init_random no --use_dropout no --prefix_dropout 0.0 --objective_mode 1 --evaluate_during_training --eval_steps 5000 --cache_dir cache/gpt2-medium-s3
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/__init__.py
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/training_args.py:299: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
FutureWarning,
09/16/2021 10:22:04 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False
09/16/2021 10:22:04 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir='webnlg_models/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=True, evaluation_strategy=<EvaluationStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=5, per_device_eval_batch_size=5, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, warmup_steps=0, logging_dir='webnlg_models/runs/webnlgprefixtune_y_5_act_cat_b=5-e=5_d=0.0_u=no_lr=5e-05_w=0.0_s=101_r=n_m=512_o=1_o=1', logging_first_step=False, logging_steps=500, save_steps=500000, save_total_limit=1, no_cuda=False, seed=101, fp16=False, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=5000, dataloader_num_workers=0, past_index=-1, run_name=None, disable_tqdm=True, remove_unused_columns=True, label_names=None)
objective is 1
False
/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/tokenization_utils_base.py:1324: FutureWarning: The `max_len` attribute has been deprecated and will be removed in a future version, use `model_max_length` instead.
FutureWarning,
prefixtune
adapting the size of the model embedding to include [PAD]
len(tokenizer) = 50257
len(tokenizer) = 50258
<|endoftext|> 50256
<|endoftext|> 50256
loading the prefix model from None
training the prefix model from scratch.
under the PrefixTuning model
PrefixTuning
preseqlen is 5, optimizing the prefix directly
[Full prefix-tuning Setting :) ]
torch.Size([5, 1024])
torch.Size([512, 1024])
torch.Size([512])
torch.Size([49152, 512])
torch.Size([49152])
total param is 25744896
webnlg
tgt_avg: 30.665242718446603
src_avg: 49.62568654646324
ratios: 1.6183040519881826
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
| Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> The Aarhus is the airport of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 383, 317, 283, 7537, 318, 262, 9003, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220, 50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
| Aarhus_Airport : cityServed : "Aarhus, Denmark" <|endoftext|> Aarhus Airport serves the city of Aarhus, Denmark. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 1748, 50, 8520, 1058, 366, 32, 283, 7537, 11, 16490, 1, 220]
[50256, 317, 283, 7537, 12690, 9179, 262, 1748, 286, 317, 283, 7537, 11, 16490, 13, 220, 50256]
[1748, 50, 8520]
webnlg
tgt_avg: 31.644375553587246
src_avg: 51.023914968999115
ratios: 1.6124165535386898
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220, 50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
| Aarhus : leaderName : Jacob_Bundsgaard <|endoftext|> The leader of Aarhus is Jacob Bundsgaard. <|endoftext|>
[220, 930, 317, 283, 7537, 1058, 3554, 5376, 1058, 12806, 62, 33, 917, 82, 36232, 220]
[50256, 383, 3554, 286, 317, 283, 7537, 318, 12806, 13319, 82, 36232, 13, 220, 50256]
[3554, 5376]
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220, 50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
| Aarhus_Airport : runwayLength : 2702.0 <|endoftext|> Aarhus Airport's runway length is 2702.0. <|endoftext|>
[220, 930, 317, 283, 7537, 62, 16170, 634, 1058, 23443, 24539, 1058, 20479, 17, 13, 15, 220]
[50256, 317, 283, 7537, 12690, 338, 23443, 4129, 318, 20479, 17, 13, 15, 13, 220, 50256]
[23443, 24539]
FORMAT MODE IS cat
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:309: FutureWarning: Passing `prediction_loss_only` as a keyword argument is deprecated and won't be possible in a future version. Use `args.prediction_loss_only` instead.
FutureWarning,
09/16/2021 10:22:53 - WARNING - trainer_prefix - You are instantiating a Trainer but Tensorboard is not installed. You should consider installing it.
/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py:1291: FutureWarning: This method is deprecated, use `Trainer.is_world_process_zero()` instead.
warnings.warn("This method is deprecated, use `Trainer.is_world_process_zero()` instead.", FutureWarning)
{'state': {}, 'param_groups': [{'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [0, 1, 2]}, {'weight_decay': 0.0, 'lr': 5e-05, 'betas': (0.9, 0.999), 'eps': 1e-08, 'correct_bias': True, 'params': [3, 4]}]}
09/16/2021 10:22:53 - INFO - trainer_prefix - ***** Running training *****
09/16/2021 10:22:53 - INFO - trainer_prefix - Num examples = 18025
09/16/2021 10:22:53 - INFO - trainer_prefix - Num Epochs = 5
09/16/2021 10:22:53 - INFO - trainer_prefix - Instantaneous batch size per device = 5
09/16/2021 10:22:53 - INFO - trainer_prefix - Total train batch size (w. parallel, distributed & accumulation) = 40
09/16/2021 10:22:53 - INFO - trainer_prefix - Gradient Accumulation steps = 1
09/16/2021 10:22:53 - INFO - trainer_prefix - Total optimization steps = 2255
Traceback (most recent call last):
File "run_language_modeling.py", line 1159, in <module>
main()
File "run_language_modeling.py", line 993, in main
trainer.train(model_path=model_path)
File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 811, in train
tr_loss += self.training_step(model, inputs)
File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1174, in training_step
loss = self.compute_loss(model, inputs, gpt2_model=self.gpt2)
File "/home/yanzhongxiang/PrefixTuning/gpt2/trainer_prefix.py", line 1214, in compute_loss
outputs = model(**inputs, gpt2_model=gpt2_model)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/gpt2/train_control.py", line 327, in forward
return_dict=return_dict, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 951, in forward
return_dict=return_dict,
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/src/transformers/modeling_gpt2.py", line 619, in forward
inputs_embeds = self.wte(input_ids)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/modules/sparse.py", line 147, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/home/yanzhongxiang/PrefixTuning/transformers/venv/lib64/python3.6/site-packages/torch/nn/functional.py", line 1913, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device
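The log above reports n_gpu: 8, and the error is raised inside torch.nn.DataParallel (replica 1 on device 1), which suggests that some tensor or module stayed on a different GPU from the replica that used it. One workaround to try, offered as an assumption rather than a confirmed fix, is to expose a single GPU to the training process so that DataParallel is not used. The helper below is hypothetical and only builds the environment for a child process.

```python
import os


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment in which only one GPU is visible."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads this variable at start-up; with one id listed, PyTorch sees one device.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


env = single_gpu_env(0)
print(env["CUDA_VISIBLE_DEVICES"])
# The result can be passed as subprocess.run(command, env=env) when launching
# the training command quoted above.
```

Setting CUDA_VISIBLE_DEVICES=0 in the shell before the python command has the same effect.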
Hi Lisa~ I've recently been following your work on NLG tasks. Could you share the version of the data you used (for example E2E and WebNLG) and how to access it? Your code uses some paths that start with absolute paths on your disk, and the code that reads the E2E dataset splits each line by "||", so I am not sure whether I am using the same data as you. Thanks (and forgive me if I missed something important)!
Hi,
I'm running an experiment with this package on the 'classify-sentiment' task, using the code in the 'gpt2' folder. I set 'prediction_loss_only = False' in 'run_language_modeling.py' in order to get the predicted labels for the test dataset.
Unfortunately, I get the error "AttributeError: 'tuple' object has no attribute 'detach'". Training and evaluation work fine with 'prediction_loss_only = True', but in that case I can only get the perplexity.
Traceback (most recent call last):
File "run_language_modeling.py", line 1173, in <module>
main()
File "run_language_modeling.py", line 993, in main
trainer.train(model_path=model_path)
File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 867, in train
metrics = self.evaluate()
File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1419, in evaluate
output = self.prediction_loop(eval_dataloader, description="Evaluation")
File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1514, in prediction_loop
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only
File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1625, in prediction_step
logits = tuple(logit.detach() for logit in logits)
File "/work/PrefixTuning/gpt2/trainer_prefix.py", line 1625, in <genexpr>
logits = tuple(logit.detach() for logit in logits)
AttributeError: 'tuple' object has no attribute 'detach'
Does anyone know why it happens? Is there any workaround for it? Thanks!
p.s.
I have tried a few suggestions like:
huggingface/transformers#7760
and
huggingface/transformers#7539
But I got no luck.
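The failing line calls .detach() on every element of logits, so the error means that at least one element is itself a tuple, most likely a nested structure such as cached key/value states returned next to the logits. A minimal sketch of a possible workaround, assuming a recursive detach is acceptable: the function name detach_nested is hypothetical, and FakeTensor is only a stand-in so the example runs without PyTorch.

```python
def detach_nested(obj):
    """Detach a tensor, or every tensor inside nested lists and tuples."""
    if isinstance(obj, (list, tuple)):
        return type(obj)(detach_nested(item) for item in obj)
    return obj.detach()


class FakeTensor:
    """Stand-in for torch.Tensor: it only records whether detach() was called."""

    def __init__(self, name, detached=False):
        self.name = name
        self.detached = detached

    def detach(self):
        return FakeTensor(self.name, detached=True)


# Assumed layout: the LM logits followed by a tuple of per-layer cached states.
logits = (FakeTensor("lm_logits"), (FakeTensor("layer0_kv"), FakeTensor("layer1_kv")))
result = detach_nested(logits)
print([result[0].detached] + [t.detached for t in result[1]])
```

In prediction_step, the generator expression from the traceback could then be replaced with a call to such a helper. Another option is to keep only the first element of the model outputs if just the predicted labels are needed.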
Hi @XiangLi1999
Thanks for releasing your code. How can I download the "webnlg" dataset? I was not able to find a .json version of the WebNLG dataset. Could you please share your version of the data as well?
Best,
Mohammad
Thanks for making your code public. I am having difficulties replicating the XSUM results. I think that it could be a data processing issue. Can you provide more details on the XSUM data and preprocessing?
I got the data from here:
http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz
Then, I tried to preprocess it myself, since you use a .len file in https://github.com/XiangLi1999/PrefixTuning/blob/cleaned/seq2seq/utils.py#L104-L106
self.src_file = Path(data_dir).joinpath(type_path + ".source")
self.tgt_file = Path(data_dir).joinpath(type_path + ".target")
self.len_file = Path(data_dir).joinpath(type_path + ".len")
Still, the results are significantly lower than in the paper.
Thanks
Hi Lisa,
I saw your video and have read your paper. Great work.
I want to try prefix-tuning GPT-2 for a code summarization task and need to bring my data into the right format so that it can be fed to the code as input. My data consists of pairs of a code snippet and the corresponding summary. Can you please guide me on how to bring it into the right format?
Thank you,
Regards,
Manasi
Hi, thanks for the great work!
I noticed one notation typo in the paper. It's in footnote 4 of page 5. The second P_theta should be P_theta'.
Hope this could help.
Hi Lisa,
Thanks a lot for conducting such helpful experiments comparing prompt-based fine-tuning with prompt tuning. However, I have difficulty reproducing the BLEU and ROUGE-L scores shown in Figure 3. Could you please give me some clues on how to calculate BLEU with your code in the lowdata setting? (I only saw perplexity.)
I also re-implemented your prefix prompt with activation using GPT-2 (117M). However, it only achieves a BLEU score of 15 on WikiData. Could you please give me some insight into the lowdata prefix-tuning setting?
Thanks in advance!