r-three / t-few
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"
License: MIT License
Hi, thanks for the code!
When I run:
CUDA_VISIBLE_DEVICES=0 python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp3
I get:
Reusing dataset super_glue (/localdata/hjl/hf/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Missing logger folder: /home/hjl/t-few/exp_out/first_exp3/log
| Name | Type | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B
-----------------------------------------------------
2.8 B Trainable params
0 Non-trainable params
2.8 B Total params
11,399.029 Total estimated model params size (MB)
Validation sanity check: 0%| | 0/18 [00:00<?, ?it/s]Traceback (most recent call last):
File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/hjl/t-few/src/pl_train.py", line 86, in <module>
main(config)
File "/home/hjl/t-few/src/pl_train.py", line 57, in main
trainer.fit(model, datamodule)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
self._run_sanity_check(self.lightning_module)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
self._evaluation_loop.run()
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 229, in validation_step
batch_output = self.predict(batch)
File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 139, in predict
if not self.config.split_option_flag:
AttributeError: 'Config' object has no attribute 'split_option_flag'
I can't find a reference to split_option_flag in any of the config files.
Should I set it manually?
Thanks!
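One way to check whether the option exists under a different name is to list the attributes the config object actually has. This is a minimal sketch: the `Config` class below is a hypothetical stand-in, not the repository's class. A config dump elsewhere on this page shows a `split_option_at_inference` key, but whether that is the same option is an assumption.

```python
class Config:
    """Hypothetical stand-in for the repository's Config object."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


# Keys taken from a config dump quoted elsewhere on this page.
config = Config(split_option_at_inference=False, batch_size=8)

# List every attribute whose name mentions "split".
split_keys = sorted(key for key in vars(config) if "split" in key)

# Defensive lookup: returns None instead of raising AttributeError.
flag = getattr(config, "split_option_flag", None)
```

If the list shows a similarly named key, the checked-out code and the config files may come from different revisions.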
In the paper, you mention that IA^3 is compatible with multi-task batching, a requirement to be comparable to ICL. Unfortunately, the current implementation in Huggingface PEFT does not support this, and it would apparently take a big refactoring to do so (huggingface/peft#759).
Do you know of an implementation or example that shows how to do this?
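For reference, a minimal sketch of what multi-task batching means for IA^3, assuming each task owns one scaling vector per modified activation and every example in the batch carries a task index. The names `scale_batch`, `task_ids`, and `task_vectors` are hypothetical and come from neither this repository nor PEFT.

```python
from typing import List


def scale_batch(activations: List[List[float]],
                task_ids: List[int],
                task_vectors: List[List[float]]) -> List[List[float]]:
    """Rescale each example's activations by the vector of its own task."""
    scaled = []
    for row, task in zip(activations, task_ids):
        vector = task_vectors[task]  # look up this example's scaling vector
        scaled.append([a * l for a, l in zip(row, vector)])
    return scaled


# Two hypothetical tasks with one 3-dimensional scaling vector each.
task_vectors = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0]]
# A mixed batch: examples 0 and 2 belong to task 0, example 1 to task 1.
activations = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0]]
task_ids = [0, 1, 0]
mixed = scale_batch(activations, task_ids, task_vectors)
```

Because the adaptation is an elementwise product, the per-example lookup is the only extra step compared with single-task inference.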
Hi,
I'm trying to implement your method (IA)3 for use with HuggingFace's PEFT library and had a question. In the paper, it is mentioned that the learned vectors in (IA)3 are added to all the position-wise feed-forward layers in the transformer, along with the various attention layers. I ran src/models/lora.py and used the config parameters in configs/ia3.json to check what the new model layers would look like. The typical FeedForward module in T5 is a T5DenseActDense module that looks as follows:
(DenseReluDense): T5DenseActDense(
(wi): Linear(in_features=768, out_features=3072, bias=False)
(wo): Linear(in_features=3072, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): ReLU()
)
Since (IA)3 is implemented as an extension of LoRA, the Linear layers are supposed to get converted into LoRALinear layers. However, the config in ia3.json sets the parameter lora_layers to "k|v|wi_1.*", which does not include the layers in DenseReluDense (these are attributes named wi and wo). I've tried T5-small, T5-base and T5-3b, and for all of these models, learned vectors are not added for the feedforward layers. I was wondering if I'm doing something wrong or if I'm supposed to use a different config file. Or are (IA)3 parameters added only for certain feedforward layers?
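The mismatch can be reproduced with plain regular expressions. This sketch assumes the `lora_layers` pattern is applied to each child module name with `re.fullmatch` (an assumption about `src/models/lora.py`, not verified here). It uses the layer names of the two Hugging Face T5 feed-forward variants: `wi`/`wo` in `T5DenseActDense`, and `wi_0`/`wi_1`/`wo` in the gated `T5DenseGatedActDense`.

```python
import re

pattern = "k|v|wi_1.*"  # value quoted from configs/ia3.json in the question

# Child module names of the two feed-forward variants.
layer_names = {
    "T5DenseActDense": ["wi", "wo"],
    "T5DenseGatedActDense": ["wi_0", "wi_1", "wo"],
}

# Keep only the names that the pattern matches in full.
matches = {
    module: [name for name in names if re.fullmatch(pattern, name)]
    for module, names in layer_names.items()
}
```

Under these assumptions the pattern finds nothing in a non-gated feed-forward block, and finds only `wi_1` in a gated one.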
Hi, thanks for your excellent work. I've read the paper and reviewed the code, and I've run into some questions, outlined below:
I'd appreciate a detailed explanation of how the "rank classification" is implemented. Could you please provide clarification on the code found at this link?
I'm curious about how the "rank classification" process influences the final results. Is it feasible to employ a direct generation approach, such as generating the label words and matching them against the true answer, as an alternative method?
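As a point of reference, a minimal sketch of rank classification in general. It assumes each answer choice is scored by the sum of its token log-probabilities, optionally averaged over length, and that the highest-scoring choice is the prediction. `rank_classify` is a hypothetical helper, not the code behind the link.

```python
from typing import List


def rank_classify(choice_token_logprobs: List[List[float]],
                  length_norm: bool = True) -> int:
    """Return the index of the answer choice with the highest score."""
    scores = []
    for token_logprobs in choice_token_logprobs:
        total = sum(token_logprobs)
        if length_norm:
            total /= len(token_logprobs)  # average per-token log-probability
        scores.append(total)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical token log-probabilities for two answer choices.
choices = [[-0.5, -0.5, -0.5], [-1.0]]
prediction = rank_classify(choices)                         # averaged scores
prediction_raw = rank_classify(choices, length_norm=False)  # summed scores
```

In this toy case the two scoring rules disagree, which is why length normalization matters when choices differ in length. Unlike free generation, the prediction is always one of the listed choices.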
Hello, thanks for your great contribution! I am trying to install this on Windows.
In the tfew Anaconda environment, deepspeed==0.5.10 is hard to install on Windows, so I used ==0.3.16 instead, which installed successfully.
However, when I run my first experiment with the following commands, I get the error below. Can you help me?
(tfew) C:\Users\78166\t-few>set HF_HOME=%USERPROFILE%.cache\huggingface
(tfew) C:\Users\78166\t-few>set CUDA_VISIBLE_DEVICES=0
(tfew) C:\Users\78166\t-few>python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp
Start experiment first_exp
{
"exp_dir": "exp_out\first_exp",
"exp_name": "first_exp",
"allow_skip_exp": true,
"seed": 42,
"model": "EncDec",
"max_seq_len": 256,
"origin_model": "bigscience/T0_3B",
"load_weight": "",
"dataset": "rte",
"few_shot": true,
"num_shot": 32,
"few_shot_random_seed": 100,
"train_template_idx": -1,
"eval_template_idx": -1,
"batch_size": 8,
"eval_batch_size": 16,
"num_workers": 8,
"change_hswag_templates": false,
"raft_cross_validation": true,
"raft_validation_start": 0,
"raft_labels_in_input_string": "comma",
"cleaned_answer_choices_b77": false,
"compute_precision": "bf16",
"compute_strategy": "none",
"num_steps": 300,
"eval_epoch_interval": 10000,
"eval_before_training": true,
"save_model": false,
"save_step_interval": 20000,
"mc_loss": 1,
"unlikely_loss": 1,
"length_norm": 1,
"grad_accum_factor": 1,
"split_option_at_inference": false,
"optimizer": "adafactor",
"lr": 0.0003,
"trainable_param_names": ".",
"scheduler": "linear_decay_with_warmup",
"warmup_ratio": 0.06,
"weight_decay": 0,
"scale_parameter": true,
"grad_clip_norm": 1,
"model_modifier": "",
"prompt_tuning_num_prefix_emb": 100,
"prompt_tuning_encoder": true,
"prompt_tuning_decoder": true,
"lora_rank": 4,
"lora_scaling_rank": 0,
"lora_init_scale": 0.01,
"lora_modules": "none",
"lora_layers": "none",
"bitfit_modules": ".",
"bitfit_layers": "q|k|v|o|wi_[01]|w_o",
"adapter_type": "normal",
"adapter_non_linearity": "relu",
"adapter_reduction_factor": 4,
"normal_adapter_residual": true,
"lowrank_adapter_w_init": "glorot-uniform",
"lowrank_adapter_rank": 1,
"compacter_hypercomplex_division": 8,
"compacter_learn_phm": true,
"compacter_hypercomplex_nonlinearity": "glorot-uniform",
"compacter_shared_phm_rule": false,
"compacter_factorized_phm": false,
"compacter_shared_W_phm": false,
"compacter_factorized_phm_rule": false,
"compacter_phm_c_init": "normal",
"compacter_phm_rank": 1,
"compacter_phm_init_range": 0.01,
"compacter_kronecker_prod": false,
"compacter_add_compacter_in_self_attention": false,
"compacter_add_compacter_in_cross_attention": false,
"intrinsic_projection": "fastfood",
"intrinsic_said": true,
"intrinsic_dim": 2000,
"intrinsic_device": "cpu",
"fishmask_mode": null,
"fishmask_path": null,
"fishmask_keep_ratio": 0.05,
"prefix_tuning_num_input_tokens": 10,
"prefix_tuning_num_target_tokens": 10,
"prefix_tuning_init_path": null,
"prefix_tuning_init_text": null,
"prefix_tuning_parameterization": "mlp-512",
"train_pred_file": "exp_out\first_exp\train_pred.txt",
"dev_pred_file": "exp_out\first_exp\dev_pred.txt",
"dev_score_file": "exp_out\first_exp\dev_scores.json",
"test_pred_file": "exp_out\first_exp\test_pred.txt",
"test_score_file": "exp_out\first_exp\test_scores.json",
"finish_flag_file": "exp_out\first_exp\exp_completed.txt"
}
Mark experiment first_exp as claimed
Traceback (most recent call last):
File "C:\Users\78166\anaconda3\envs\tfew\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\78166\anaconda3\envs\tfew\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\78166\t-few\src\pl_train.py", line 86, in <module>
main(config)
File "C:\Users\78166\t-few\src\pl_train.py", line 33, in main
tokenizer, model = get_transformer(config)
File "C:\Users\78166\t-few\src\pl_train.py", line 17, in get_transformer
tokenizer = AutoTokenizer.from_pretrained(config.origin_model)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 481, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 350, in get_tokenizer_config
use_auth_token=use_auth_token,
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\transformers\file_utils.py", line 1784, in cached_path
local_files_only=local_files_only,
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\transformers\file_utils.py", line 1947, in get_from_cache
r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\requests\api.py", line 100, in head
return request("head", url, **kwargs)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\requests\adapters.py", line 497, in send
chunked=chunked,
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\urllib3\connectionpool.py", line 696, in urlopen
self._prepare_proxy(conn)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\urllib3\connectionpool.py", line 964, in _prepare_proxy
conn.connect()
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\urllib3\connection.py", line 359, in connect
conn = self._connect_tls_proxy(hostname, conn)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\urllib3\connection.py", line 506, in _connect_tls_proxy
ssl_context=ssl_context,
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\urllib3\util\ssl.py", line 453, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
File "C:\Users\78166\anaconda3\envs\tfew\lib\site-packages\urllib3\util\ssl.py", line 495, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock)
File "C:\Users\78166\anaconda3\envs\tfew\lib\ssl.py", line 412, in wrap_socket
session=session
File "C:\Users\78166\anaconda3\envs\tfew\lib\ssl.py", line 807, in _create
raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname
What is the meaning of score_gt and score_cand? How do I better run the model by observing these parameters?
Some context: in line 179 of the code, we have param.requires_grad_(False). I'm a bit confused why this needs to be set to false. When I try to reproduce this code in a different setting, my loss does not decrease. However, when param.requires_grad_(True), the loss does decrease. Either way, I'm unclear why it should matter because in the optimizer only intrinsic_parameter and intrinsic_said are being updated.
Hi @craffel @muqeeth @HaokunLiu,
We're trying to reproduce T-Few results for a paper, but we're getting 'CUDA out of memory' using an A100 with 80GB (your recommended setup).
This is what we're running:
python -m src.pl_train -c t011b.json+ia3.json+rte.json -k load_weight="pretrained_checkpoints/t011b_ia3_finish.pt" exp_name=t011b_rte_seed42_ia3_pretrained few_shot_random_seed=42 seed=42
We installed according to the README instructions and are using the default settings in the config files.
We are able to run the 3 billion model using the command above, just not the 11 billion.
Is there anything we are doing wrong?
This is the exception:
Thank you
Hi everyone!
When I run the experiments, after every eval_epoch_interval epochs the model is validated and a checkpoint is written out as global_stepXXXXX.pt. At the end, a final checkpoint named finish.pt is also written. I assumed this one corresponds either to the best intermediate validation performance or to the last epoch. However, comparing it with the other checkpoints shows that finish.pt differs from all the global_stepXXXXX.pt checkpoints, so I am wondering which point in training finish.pt belongs to.
Sorry if I'm missing something obvious here.
Best,
Stefan
Thanks for your great work. I have one question about your paper. Table 4 shows the results for all PEFT methods "without" pre-training, right?
@HaokunLiu @dptam Thank you for your great work and congrats on the NeurIPS acceptance!
I ran into the following issue when using ddp:
AttributeError: 'DistributedDataParallel' object has no attribute 'save_checkpoint'
It's raised by the following line:
t-few/src/models/EncoderDecoder.py
Line 305 in 4e581fa
Any suggestion would be appreciated!
Another related question is why the ddp ckpt also needs to be processed by zero_to_fp32.get_fp32_state_dict_from_zero_checkpoint(distributed_save_path)? I thought it should be applied to deepspeed zero ckpts only. This is done in:
t-few/src/models/EncoderDecoder.py
Line 308 in 4e581fa
Hi there,
I am trying to recreate the decoder attention mask and I am a bit puzzled by how it is created here
t-few/src/models/EncoderDecoder.py
Line 53 in 114dece
This creates a dense matrix with 1s everywhere. Shouldn't this be a lower triangular matrix (which is what T5Model does internally by default)?
Thanks a lot for your help!
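A sketch of the distinction the question turns on, assuming the tensor built at that line is a `[batch, target_length]` padding mask (an assumption; the referenced line is not reproduced here). In Hugging Face T5, the `decoder_attention_mask` argument marks which target positions are real tokens, and the decoder combines it with a lower-triangular causal mask internally. The helper names below are hypothetical.

```python
from typing import List


def causal_mask(length: int) -> List[List[int]]:
    """Lower-triangular mask: position i may attend to positions <= i."""
    return [[1 if j <= i else 0 for j in range(length)] for i in range(length)]


def combine(padding_mask: List[int]) -> List[List[int]]:
    """Combine a 1-D padding mask with the causal mask for one sequence."""
    causal = causal_mask(len(padding_mask))
    return [[c * p for c, p in zip(row, padding_mask)] for row in causal]


# A target of length 4 whose last position is padding.
padding_mask = [1, 1, 1, 0]
full_mask = combine(padding_mask)
```

Under that assumption, an all-ones padding mask does not remove causality; it only states that no target position is padding.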
Hi, Will you be releasing weights of T0 (3B) pre-trained via (IA)^3?
Hi!
I was trying to run the example in README, but it says KeyError: 'HF_HOME'
This is the script I used: python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp
I can't find anywhere in the code that sets the value of this environment variable.
Mark experiment first_exp as claimed
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Traceback (most recent call last):
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/weiqiuyou/Documents/codebases/t-few/src/pl_train.py", line 86, in <module>
main(config)
File "/Users/weiqiuyou/Documents/codebases/t-few/src/pl_train.py", line 57, in main
trainer.fit(model, datamodule)
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1131, in _run
self._data_connector.prepare_data()
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 154, in prepare_data
self.trainer.datamodule.prepare_data()
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
fn(*args, **kwargs)
File "/Users/weiqiuyou/Documents/codebases/t-few/src/data/data_module.py", line 17, in prepare_data
_ = self.dataset_reader.read_few_shot_dataset()
File "/Users/weiqiuyou/Documents/codebases/t-few/src/data/dataset_readers.py", line 164, in read_few_shot_dataset
orig_data = self.read_orig_dataset("train")
File "/Users/weiqiuyou/Documents/codebases/t-few/src/data/dataset_readers.py", line 146, in read_orig_dataset
orig_data = load_dataset(*self.dataset_stash, split=split, cache_dir=os.environ["HF_HOME"])
File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/os.py", line 678, in __getitem__
raise KeyError(key) from None
KeyError: 'HF_HOME'
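The traceback shows `dataset_readers.py` reading `os.environ["HF_HOME"]` directly, so the variable has to exist before the script starts. A minimal sketch of a fallback, assuming a cache directory under the home folder is acceptable (the path is a hypothetical choice):

```python
import os

# Hypothetical cache location; any writable directory should work.
default_cache = os.path.join(os.path.expanduser("~"), ".cache", "huggingface")

# Only fills in the variable when it is not already defined.
os.environ.setdefault("HF_HOME", default_cache)

cache_dir = os.environ["HF_HOME"]  # the lookup that raised KeyError above
```

Setting the variable in the shell before launching the command has the same effect.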
Hi,
thank you very much for sharing your code!
I ran the example from the readme and parts of the few-shot-pretrained-3b-100k.sh script. However, the dev_scores.json for the readme example only contains the line:
{"accuracy": 0.6101083032490975, "score_gt": 0.3983679488032303, "score_cand": 0.6958685107394676}
And for t03b_copa_seed42_ia3_pretrained100k (the first experiment of few-shot-pretrained-3b-100k.sh):
{"accuracy": 0.85, "score_gt": 0.06061243396921782, "score_cand": 0.4640417302213609}
Those are just the results of the "Validation sanity check" right at the beginning, so I wondered where the validation results after each epoch are stored. Or am I missing something here?
Thanks!
From the paper: "As an objective, we use the sum of a standard language modeling loss, an unlikelihood loss for incorrect choices, and a length-normalized loss."
And the code also uses huggingface transformers. I was just wondering if you could point me to where the loss function is modified and then used in training in the codebase.
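To make the quoted sentence concrete, here is a sketch of the three terms written over plain per-token probabilities. The exact formulas are one reading of the paper's description (an assumption), the function names are hypothetical, and the sketch does not show where the loss lives in the codebase.

```python
import math
from typing import List


def lm_loss(correct: List[float]) -> float:
    """Average negative log-probability of the correct answer's tokens."""
    return -sum(math.log(p) for p in correct) / len(correct)


def unlikelihood_loss(incorrect: List[List[float]]) -> float:
    """Penalize probability mass placed on tokens of incorrect choices."""
    terms = [math.log(1.0 - p) for choice in incorrect for p in choice]
    return -sum(terms) / len(terms)


def length_normalized_loss(correct: List[float],
                           incorrect: List[List[float]]) -> float:
    """Cross-entropy over choices scored by average token log-probability."""
    def beta(tokens: List[float]) -> float:
        return sum(math.log(p) for p in tokens) / len(tokens)

    scores = [beta(correct)] + [beta(choice) for choice in incorrect]
    log_norm = math.log(sum(math.exp(s) for s in scores))
    return log_norm - scores[0]  # index 0 is the correct choice


def total_loss(correct: List[float], incorrect: List[List[float]]) -> float:
    """Sum of the three terms named in the quoted sentence."""
    return (lm_loss(correct)
            + unlikelihood_loss(incorrect)
            + length_normalized_loss(correct, incorrect))


# Hypothetical token probabilities for one correct and one incorrect choice.
loss = total_loss([0.9, 0.8], [[0.1, 0.2, 0.3]])
```

The config dump elsewhere on this page lists `mc_loss`, `unlikely_loss`, and `length_norm` keys, which may be the switches for these terms; that mapping is a guess.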
Thank you for your valuable contributions. I am currently attempting to replicate the outcomes presented in your research paper. However, I am encountering difficulties in obtaining the desired results when I attempt to re-run LoRA adapters.
copa: 76.00 (2.00), h-swag: 26.64 (0.36), storycloze: 84.87 (0.21), winogrande: 51.14 (2.13), wsc: 65.38 (2.88), wic: 51.57 (0.63), rte: 59.57 (0.36), cb: 51.79 (1.79), anli-r1: 34.80 (0.80), anli-r2: 34.00 (2.40), anli-r3: 32.92 (1.08)
Have you encountered situations where the training of "h-swag" and "rte" did not yield successful results?
When I tried to run the demo, I found this error! @dptam @jmohta @muqeeth
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
WARNING:datasets.builder:Reusing dataset super_glue (/Users/caffrey/Documents/research/t-few-master/cache/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Missing logger folder: exp_out/first_exp/log
WARNING:datasets.builder:Reusing dataset super_glue (/Users/caffrey/Documents/research/t-few-master/cache/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277
| Name | Type | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B
-----------------------------------------------------
2.8 B Trainable params
0 Non-trainable params
2.8 B Total params
11,399.029 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]Traceback (most recent call last):
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/caffrey/Documents/paper/t-few-master/src/pl_train.py", line 86, in <module>
main(config)
File "/Users/caffrey/Documents/paper/t-few-master/src/pl_train.py", line 57, in main
trainer.fit(model, datamodule)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
self._run_sanity_check()
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
val_loop.run()
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 199, in run
self.on_run_start(*args, **kwargs)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 88, in on_run_start
self._data_fetcher = iter(data_fetcher)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 178, in __iter__
self.dataloader_iter = iter(self.dataloader)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 443, in __iter__
return self._get_iterator()
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 389, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1062, in __init__
w.start()
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'create_collate_fn.<locals>.collate_fn'
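The last frames show the DataLoader workers being started through `popen_spawn_posix`, which pickles the collate function, and a function defined inside another function cannot be pickled. The sketch below reproduces this with the standard library only. `Collator` and `pad_id` are hypothetical illustrations, not the repository's code.

```python
import pickle
from typing import List


def create_collate_fn(pad_id: int):
    def collate_fn(batch: List[List[int]]) -> List[List[int]]:
        width = max(len(row) for row in batch)
        return [row + [pad_id] * (width - len(row)) for row in batch]
    return collate_fn  # a closure: defined inside another function


class Collator:
    """Module-level callable that carries the same state as the closure."""

    def __init__(self, pad_id: int):
        self.pad_id = pad_id

    def __call__(self, batch: List[List[int]]) -> List[List[int]]:
        width = max(len(row) for row in batch)
        return [row + [self.pad_id] * (width - len(row)) for row in batch]


def can_pickle(obj) -> bool:
    """Report whether pickle can serialize the object."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError, TypeError):
        return False


closure_ok = can_pickle(create_collate_fn(pad_id=0))  # local function
class_ok = can_pickle(Collator(pad_id=0))             # module-level class
```

When run as a script, the closure fails to pickle while the class instance succeeds. Loading data in the main process (the config dump on this page has a `num_workers` key) would avoid pickling altogether, assuming that key is passed through to the DataLoader.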
Hello, I am wondering if I can use this code to classify sentences as urgent / not urgent.
Also, could I use these datasets and this model to do that urgent / not urgent classification?
Firstly, thank you for the amazing work! I had a question around the implementation of
The config file for (IA)3 lists lora_layers as "k|v|wi_1.*"
Line 6 in 4e581fa
However, when using this string to find model layers to modify (code snippet below), it seems that while the Keys and Values in the self-attention modules are modified, all the FF layers (i.e. in the format encoder.block.x.layer.x.DenseReluDense.wi
) are skipped, and thus the vector
Lines 64 to 72 in 4e581fa
I was thus wondering if the param lora_layers should instead be "k|v|wi.*"? Or am I missing something, and the existing config file somehow also triggers the creation of
Thank you!
Hi :)
I was reading your interesting paper https://arxiv.org/pdf/2205.05638.pdf.
In Section 3.3, you specify that IA^3 adds a total of d_k + d_v + d_ff parameters.
However, if I look at this line, you seem to be allocating 2 * d vectors for each linear layer (multi_lora_a, multi_lora_b) and multiplying multi_lora_a with the input and multi_lora_b with the transformed input.
Line 43 in 9dbc9cc
Am I missing something?
Thank you for your clarification :-)
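The two readings can be compared by counting. This sketch uses toy dimensions and treats `multi_lora_a` as an input-side vector and `multi_lora_b` as an output-side vector, as the question describes. Which of the two the repository actually trains is not established here, and the helper names are hypothetical.

```python
from typing import List, Tuple


def output_side_params(shapes: List[Tuple[int, int]]) -> int:
    """One scaling vector on the output of each modified layer."""
    return sum(out_features for _, out_features in shapes)


def both_sides_params(shapes: List[Tuple[int, int]]) -> int:
    """One vector on the input and one on the output of each layer."""
    return sum(i + o for i, o in shapes)


# Hypothetical toy sizes, not those of any released checkpoint.
d_model, d_k, d_v, d_ff = 8, 8, 8, 32
# (in_features, out_features) for the key, value, and FF input projections.
shapes = [(d_model, d_k), (d_model, d_v), (d_model, d_ff)]

paper_count = d_k + d_v + d_ff
output_only = output_side_params(shapes)
both_sides = both_sides_params(shapes)
```

With these sizes the paper's count matches the output-side reading and is smaller than the two-vector reading, so the question comes down to whether the input-side vectors are ever trained.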
Hello,
Have you tried training on a multi-GPU setup? I tried running your fine-tuning example like so:
export CUDA_VISIBLE_DEVICES=0,1
python -m src.pl_train -c t03b.json+ia3.json+rte.json -k load_weight="pretrained_checkpoints/t03b_ia3_finish.pt" exp_name=t03b_rte_seed42_ia3_pretrained100k few_shot_random_seed=42 seed=42
But I get errors in the lightning data loaders.
Any ideas?
Thank you
Hi, thanks for open-sourcing the model code! Could you release the log probabilities for the evaluation tasks (i.e., the model probabilities for valid answers for each prompt on each question for all evaluated datasets)? This data would allow for fine-grained evaluation of models and comparison against other LLMs.
Hi, may I ask what multi_lora_a means? Is there any paper that explains it? Many thanks!
Line 22 in 43fdb51
I tried running the example from the README and got this error. Can you help?
$ CUDA_VISIBLE_DEVICES=3 python -m src.pl_train -c t0.json+rte.json -k save_model=False exp_name=first_exp
Traceback (most recent call last):
File "/home/james/.conda/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/james/.conda/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/james/github/t-few/src/pl_train.py", line 10, in <module>
from src.models.EncoderDecoder import EncoderDecoder
File "/home/james/github/t-few/src/models/EncoderDecoder.py", line 11, in <module>
from .intrinsic import intrinsic_plugin_on_step
File "/home/james/github/t-few/src/models/intrinsic.py", line 10, in <module>
from .fwh_cuda import fast_walsh_hadamard_transform as fast_walsh_hadamard_transform_cuda
ImportError: cannot import name 'fast_walsh_hadamard_transform' from 'src.models.fwh_cuda' (unknown location)
Congrats on your great work! I am interested in analyzing the predictions of T0-3B + IA3 on NLI tasks. I ran the command python -m src.pl_train -c t03b.json+anli-r3.json+ia3.json -k exp_name=anli-r3 load_weight="pretrained_checkpoints/t03b_ia3_finish.pt" eval_epoch_interval=20 but only see the dev_scores.json file in the output. How can I also obtain the model's prediction file? Thanks!
Hello,
I am interested in using the T-Few recipe for some experiments with Google Cloud TPUs. I am wondering whether the pl_train.py script already supports TPUs. The Acknowledgments section of the paper mentions that Cloud TPUs were used; however, in this script I can see that GPU is directly supported. Any pointers would be appreciated. In particular, I would like to use T0-11b.
Thank you for the amazing work on t-few! I've noticed strange behavior when running SuperGLUE's wsc. I've been logging the validation score every 40 epochs using self.eval_epoch_interval = 40, and when running the command:
python -m src.pl_train -c ia3.json+wsc.json -k save_model=False exp_name=first_exp
the output is as follows:
{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}
{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}
{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}
{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}
The last accuracy score is reported at 240 epochs out of a total 250 epochs.
Any ideas on what is going on here? Thanks!
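To see where accuracy peaks, the output can be read as one JSON object per line, which is the shape quoted above. A minimal sketch using the first three quoted lines; what `score_gt` and `score_cand` measure is not established here, so the sketch only tracks the gap between them.

```python
import json

# The first three evaluation lines, copied from the output quoted above.
raw = """\
{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
"""

# One JSON object per non-empty line.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]

# Index of the evaluation with the highest accuracy.
best_index = max(range(len(records)), key=lambda i: records[i]["accuracy"])

# Difference between the two logged scores at each evaluation.
gaps = [r["score_cand"] - r["score_gt"] for r in records]
```

In these lines the best accuracy is the first evaluation, and the gap between the two scores shrinks and then changes sign as training proceeds.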
Is there support for the t-few method for the case of decoder only models?
Hi!
Congratulations on this great work and thank you for putting up such an easy to use framework! It definitely facilitates research quite a bit :)
I was trying to interpret the scores logged during evaluation on the development set, and I realized that, for two-class datasets (like RTE), exponentiating the negative of the scores for GT and CAND sometimes gives values that sum to more than 1. Maybe I'm interpreting these scores wrongly: I was expecting the sum of the scores, after converting them to probability space (that is, np.exp(-1 * logprob)), to be less than or equal to 1 for two-class datasets.
Would you let me know if my rationale is flawed and, if so, why the sum of the probabilities may be above 1?
Thank you in advance!
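One possible explanation, offered as a sketch: if the logged scores are length-normalized (per-token average) negative log-likelihoods rather than full-sequence ones, their exponentials are not sequence probabilities and need not sum to at most 1. Whether the scores are in fact length-normalized is an assumption, and the token probabilities below are made up for illustration.

```python
import math
from typing import List


def sequence_prob(token_probs: List[float]) -> float:
    """Probability of the whole answer: product of its token probabilities."""
    return math.exp(sum(math.log(p) for p in token_probs))


def avg_nll(token_probs: List[float]) -> float:
    """Per-token average negative log-likelihood (length-normalized score)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Two hypothetical answer choices of different lengths.
choice_a = [0.6]       # one token
choice_b = [0.4, 0.5]  # two tokens, sequence probability 0.2

sequence_sum = sequence_prob(choice_a) + sequence_prob(choice_b)
normalized_sum = math.exp(-avg_nll(choice_a)) + math.exp(-avg_nll(choice_b))
```

Here the sequence probabilities sum to less than 1, while the exponentiated length-normalized scores sum to more than 1, because averaging turns the second choice's 0.2 into its geometric mean per token.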