A repo to explore different NLP tasks which can be solved using T5
patil-suraj / exploring-t5
Hi,
I am having an issue with setting hyperparameters, as shown below. I understand that, because of a version change, this is no longer the way to pass the parameters, but I couldn't figure out what to do instead. Any suggestions would be great.
class T5FineTuner(pl.LightningModule):
    def __init__(self, hparams):
        super(T5FineTuner, self).__init__()
        self.hparams = hparams
Cell In[163], line 5, in T5FineTuner.__init__(self, hparams)
2 def __init__(self, hparams):
3     super(T5FineTuner, self).__init__()
----> 5     self.hparams = hparams
7     self.adam_epsilon=1e-08
8     self.data_dir='aclImdb'
File ~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1313, in Module.__setattr__(self, name, value)
1311     buffers[name] = value
1312 else:
-> 1313     super().__setattr__(name, value)
AttributeError: can't set attribute
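In newer PyTorch Lightning releases, hparams on a LightningModule is a read-only property, so assigning to it raises AttributeError; the documented replacement is to call self.save_hyperparameters(hparams) inside __init__ and keep reading values through self.hparams. The stdlib-only sketch below reproduces the failure mechanism with a hypothetical FakeModule class (not Lightning's real implementation), so the cause can be seen without installing Lightning.

```python
from argparse import Namespace


class FakeModule:
    """Hypothetical stand-in for a base class whose `hparams` is read-only."""

    def __init__(self):
        self._hparams = Namespace()

    @property
    def hparams(self):
        # Getter only: no setter is defined, so `self.hparams = ...` fails.
        return self._hparams

    def save_hyperparameters(self, hparams):
        # Hypothetical analogue of LightningModule.save_hyperparameters:
        # copy the values into the internal store instead of rebinding the attribute.
        for key, value in vars(hparams).items():
            setattr(self._hparams, key, value)


class T5FineTunerSketch(FakeModule):
    def __init__(self, hparams):
        super().__init__()
        self.save_hyperparameters(hparams)  # instead of: self.hparams = hparams


def assign_directly(module, hparams):
    """Return True if `module.hparams = hparams` succeeds, False if it raises."""
    try:
        module.hparams = hparams
        return True
    except AttributeError:
        return False


args = Namespace(learning_rate=3e-4, adam_epsilon=1e-8)
model = T5FineTunerSketch(args)
print(model.hparams.learning_rate)          # 0.0003
print(assign_directly(FakeModule(), args))  # False
```

In the notebook itself, the equivalent change should be replacing self.hparams = hparams with self.save_hyperparameters(hparams); lookups such as self.hparams.learning_rate then work as before.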
I was able to use the t5-base model without any problems, but when I ran the same code with only the model changed to t5-large, the following message appeared after running trainer = pl.Trainer(**train_params):
MisconfigurationException Traceback (most recent call last)
in <module>
----> 1 trainer = pl.Trainer(**train_params)
~/anaconda3/envs/Transformers/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in __init__(self, logger, checkpoint_callback, early_stop_callback, callbacks, default_root_dir, gradient_clip_val, process_position, num_nodes, num_processes, gpus, auto_select_gpus, num_tpu_cores, log_gpu_memory, progress_bar_refresh_rate, overfit_pct, track_grad_norm, check_val_every_n_epoch, fast_dev_run, accumulate_grad_batches, max_epochs, min_epochs, max_steps, min_steps, train_percent_check, val_percent_check, test_percent_check, val_check_interval, log_save_interval, row_log_interval, add_row_log_interval, distributed_backend, precision, print_nan_grads, weights_summary, weights_save_path, num_sanity_val_steps, truncated_bptt_steps, resume_from_checkpoint, profiler, benchmark, reload_dataloaders_every_epoch, auto_lr_find, replace_sampler_ddp, progress_bar_callback, amp_level, default_save_path, gradient_clip, nb_gpu_nodes, max_nb_epochs, min_nb_epochs, use_amp, show_progress_bar, nb_sanity_val_steps, terminate_on_nan, **kwargs)
436 self.gpus = gpus
437
--> 438 self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
439 self.root_gpu = determine_root_gpu_device(self.data_parallel_device_ids)
440 self.root_device = torch.device("cpu")
~/anaconda3/envs/Transformers/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py in parse_gpu_ids(gpus)
710 gpus = normalize_parse_gpu_string_input(gpus)
711 gpus = normalize_parse_gpu_input_to_list(gpus)
--> 712 gpus = sanitize_gpu_ids(gpus)
713
714 if not gpus:
~/anaconda3/envs/Transformers/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py in sanitize_gpu_ids(gpus)
676 You requested GPUs: {gpus}
677 But your machine only has: {all_available_gpus}
--> 678 """)
679 return gpus
680
MisconfigurationException:
You requested GPUs: [0]
But your machine only has: []
If I run this code:
args = argparse.Namespace(**args_dict)
print(args_dict)
The result is:
{'output_dir': '/home/mydir', 'model_name_or_path': 't5-large', 'tokenizer_name_or_path': 't5-large', 'max_seq_length': 90, 'learning_rate': 0.0003, 'weight_decay': 0.0, 'adam_epsilon': 1e-08, 'warmup_steps': 0, 'train_batch_size': 8, 'eval_batch_size': 8, 'num_train_epochs': 2, 'gradient_accumulation_steps': 8, 'n_gpu': 1, 'early_stop_callback': False, 'fp_16': False, 'opt_level': 'O1', 'max_grad_norm': 1.0, 'seed': 42}
I don't understand why my GPU isn't being found now. I have an RTX 2080 Ti.
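The check that fails runs inside pl.Trainer's constructor and depends only on how many CUDA devices PyTorch reports, not on the model size. An empty list usually means torch.cuda.is_available() is False in that environment, for example because of a CPU-only PyTorch build, a driver/CUDA mismatch, or CUDA_VISIBLE_DEVICES hiding the card. A minimal sketch that requests GPUs only when CUDA is visible; the helper build_train_params is hypothetical, and the Trainer keys below are assumed to mirror the notebook's train_params (they may not match exactly). In a real script, cuda_available would come from torch.cuda.is_available().

```python
def build_train_params(args_dict, cuda_available):
    """Hypothetical helper: map a notebook-style args_dict to pl.Trainer kwargs,
    requesting GPUs only when CUDA is actually available."""
    n_gpu = args_dict["n_gpu"] if cuda_available else 0
    return dict(
        accumulate_grad_batches=args_dict["gradient_accumulation_steps"],
        gpus=n_gpu,
        max_epochs=args_dict["num_train_epochs"],
        # Assumption: treat 16-bit precision as GPU-only, as in older Lightning releases.
        precision=16 if (args_dict["fp_16"] and n_gpu > 0) else 32,
        gradient_clip_val=args_dict["max_grad_norm"],
    )


args_dict = {
    "n_gpu": 1, "gradient_accumulation_steps": 8, "num_train_epochs": 2,
    "fp_16": False, "max_grad_norm": 1.0,
}
# In a real script: cuda_available = torch.cuda.is_available()
print(build_train_params(args_dict, cuda_available=False)["gpus"])  # 0
print(build_train_params(args_dict, cuda_available=True)["gpus"])   # 1
```

Falling back silently to CPU hides the underlying driver problem, so printing torch.cuda.is_available() once at startup is still worthwhile.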
I fine-tuned t5-large for paraphrase generation for 2 epochs, and the generated paraphrases look good. When I trained for 11 epochs, the model seemed to overfit (the generated paraphrases are very similar to the original sentences).
1. I want to check the performance of the saved checkpoints, but I don't know how to do it.
I tried
PATH = './t5_paraphrase/checkpointepoch=10.ckpt'
model = T5ForConditionalGeneration.from_pretrained(PATH)
which gives the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
I also tried the approach from https://pytorch-lightning.readthedocs.io/en/latest/weights_loading.html:
model = T5ForConditionalGeneration.load_from_checkpoint(PATH)
which gives:
AttributeError: type object 'T5ForConditionalGeneration' has no attribute 'load_from_checkpoint'.
2. Do you have any recommendations for fine-tuning T5? I know this question is very broad. I have already explored all the datasets I have. For hyperparameters, I read the PyTorch Lightning docs and found that auto_lr_find, auto_scale_batch_size, and fast_dev_run might be worth trying. However, because of how t_total is defined in train_dataloader, these all report errors. So maybe there are no more tricks on this side.
3. For paraphrase generation with T5 as a text-to-text task, I don't know how to use the negative examples directly. Any recommendations? I plan to first fine-tune T5-large on paraphrase identification with my dataset (which has positive and negative examples) and then fine-tune that version further on paraphrase generation. I am still investigating how to do this, so any help would be appreciated.
4. In your example, you use T5ForConditionalGeneration (https://huggingface.co/transformers/model_doc/t5.html#t5forconditionalgeneration). It is not clear to me when I should use T5Model rather than T5ForConditionalGeneration. Are there any resources on this?
Thanks!!
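from_pretrained expects a directory written by save_pretrained (or a model name on the hub), not a Lightning .ckpt file, which is a binary file saved with torch.save; that is why it fails with a UnicodeDecodeError. load_from_checkpoint is a classmethod of LightningModule, so it exists on T5FineTuner, not on T5ForConditionalGeneration. One route is T5FineTuner.load_from_checkpoint(PATH) followed by model.model.save_pretrained(some_dir). Another, sketched below, assumes the checkpoint's "state_dict" entry stores the Hugging Face weights under keys prefixed with "model." (because the fine-tuner keeps the model as self.model) and strips that prefix. The helper strip_prefix is hypothetical.

```python
def strip_prefix(state_dict, prefix="model."):
    """Return a copy of `state_dict` keeping only keys that start with `prefix`,
    with the prefix removed (hypothetical helper)."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Possible usage with the real libraries (not executed here):
#   ckpt = torch.load(PATH, map_location="cpu")
#   weights = strip_prefix(ckpt["state_dict"])
#   t5 = T5ForConditionalGeneration.from_pretrained("t5-large")
#   t5.load_state_dict(weights)
#   t5.save_pretrained("./t5_paraphrase_epoch10")  # now loadable with from_pretrained

fake = {"model.shared.weight": 1, "model.lm_head.weight": 2, "other.buffer": 3}
print(strip_prefix(fake))  # {'shared.weight': 1, 'lm_head.weight': 2}
```

Once each checkpoint is exported this way, the epoch-2 and epoch-10 models can be compared with the same generation script.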
I have applied your code to an emotion recognition task and got really good results. Thank you for this resource. I have 8 emotions, and for the test data I would like to print each emotion with its probability, like {'happy': 0.061, 'surprise': 0.148, 'fear': 0.031}. The prediction only prints a single emotion. When I increased the num_return_sequences parameter to 8, I got some outputs that are not in the emotion list. I know my question is a bit strange :)
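generate with num_return_sequences produces free-form text, so nothing restricts the extra sequences to the 8 label strings. One alternative (a sketch, not the notebook's method) is to score every label: feed each label as the target for the same input, take the summed log-probability of its tokens, then normalize across labels with a softmax. With Transformers, that sum can be recovered as minus the returned mean loss times the number of target tokens, assuming the target has no padding. The sketch covers only the normalization step, with hypothetical scores.

```python
import math


def label_probabilities(log_likelihoods):
    """Softmax over per-label log-likelihoods -> {label: probability}."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_likelihoods.values())
    exp = {label: math.exp(score - m) for label, score in log_likelihoods.items()}
    total = sum(exp.values())
    return {label: value / total for label, value in exp.items()}


# Hypothetical summed log-probabilities of each label sequence for one input.
scores = {"happy": -2.3, "surprise": -1.4, "fear": -3.0}
probs = label_probabilities(scores)
print({k: round(v, 3) for k, v in probs.items()})
```

Because the probabilities are normalized over the label set only, every output is one of the 8 emotions by construction.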
I tried to track down the cause and found that training_step is never called.
I think it may be related to the ImdbDataSet used by train_dataloader, but I debugged it and it seems all right.
I am just getting started with deep learning, so maybe there is something obvious that I'm missing.
Do you have any idea what might cause this?
Thank you, and I look forward to any feedback.
Best regards
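One common cause (an assumption, not confirmed for this report) is that the training DataLoader yields zero batches, for example because data_dir points to the wrong place and no files are found, or because drop_last=True and the dataset is smaller than the batch size. Lightning then has nothing to iterate over and never calls training_step. A quick check of the arithmetic with a hypothetical helper:

```python
import math


def num_batches(dataset_len, batch_size, drop_last):
    """Number of batches a DataLoader-style iterator would yield."""
    if drop_last:
        return dataset_len // batch_size
    return math.ceil(dataset_len / batch_size)


print(num_batches(0, 8, drop_last=False))      # 0: empty dataset, e.g. wrong data_dir
print(num_batches(5, 8, drop_last=True))       # 0: fewer samples than batch_size
print(num_batches(25000, 8, drop_last=True))   # 3125
```

In practice, printing len(train_dataset) and len(model.train_dataloader()) before trainer.fit(model) should show whether this is the problem.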
Hello Suraj,
Thanks for sharing this fine tuning of t5 notebook.
I am trying to use it as is, but when I run it on Colab it throws the following error:
trainer = pl.Trainer(**train_params)
TypeError: __init__() got an unexpected keyword argument 'early_stop_callback'
I would appreciate your time in resolving this issue.
Thanks
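The early_stop_callback argument was removed from pl.Trainer in later PyTorch Lightning releases; early stopping is configured by passing an EarlyStopping callback in callbacks instead. Since the notebook sets it to False, deleting that key from train_params should be enough. More generally, a sketch that splits keyword arguments into those a callable accepts and those it does not, using inspect.signature; the helper name is hypothetical, and dropped keys are returned rather than discarded so real configuration mistakes stay visible.

```python
import inspect


def split_supported_kwargs(func, kwargs):
    """Split kwargs into (accepted, dropped) according to func's signature."""
    params = inspect.signature(func).parameters
    # If func takes **kwargs, every keyword is accepted.
    if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    dropped = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, dropped


# Stand-in for a newer Trainer.__init__ that no longer has early_stop_callback.
def new_trainer_init(max_epochs=1, gpus=None, callbacks=None):
    return max_epochs, gpus, callbacks


train_params = {"max_epochs": 2, "gpus": 1, "early_stop_callback": False}
accepted, dropped = split_supported_kwargs(new_trainer_init, train_params)
print(accepted)  # {'max_epochs': 2, 'gpus': 1}
print(dropped)   # {'early_stop_callback': False}
# Possible real usage: accepted, dropped = split_supported_kwargs(pl.Trainer.__init__, train_params)
```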
Hi, Suraj! Thanks for making this notebook.
I am new to this. I couldn't find a clear definition of 'lm_labels' in the Hugging Face docs. I noticed that in your notebook you put the target ids into 'lm_labels' rather than into decoder_input_ids. Could you let me know why? I am trying to use your code to fine-tune on paraphrase generation. Thanks!
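lm_labels is the older name of the argument that current Transformers versions call labels. When labels are passed, T5ForConditionalGeneration builds decoder_input_ids itself by shifting the labels one position to the right and starting with the decoder start token (the pad token, id 0, for T5), so the targets do not also need to be passed as decoder_input_ids. The notebook additionally replaces pad ids in the labels with -100 so they are ignored by the loss. The sketch below reproduces both steps on plain lists; it mirrors the logic of T5's internal shift but is not the library code, and the token ids are hypothetical.

```python
PAD_ID = 0           # T5 uses the pad token (id 0) as decoder_start_token_id
IGNORE_INDEX = -100  # label positions with this value are skipped by the loss


def make_labels(target_ids, pad_id=PAD_ID):
    """Replace padding in the targets with -100 so it does not count in the loss."""
    return [IGNORE_INDEX if t == pad_id else t for t in target_ids]


def shift_right(labels, start_id=PAD_ID, pad_id=PAD_ID):
    """Build decoder inputs: prepend the start token, drop the last label,
    and turn any -100 back into the pad id."""
    shifted = [start_id] + labels[:-1]
    return [pad_id if t == IGNORE_INDEX else t for t in shifted]


target_ids = [363, 19, 3, 1, 0, 0]  # hypothetical ids: tokens, </s> (id 1), padding
labels = make_labels(target_ids)
decoder_input_ids = shift_right(labels)
print(labels)             # [363, 19, 3, 1, -100, -100]
print(decoder_input_ids)  # [0, 363, 19, 3, 1, 0]
```

At each position the decoder sees the previous target token and is trained to predict the current one, which is why the same ids serve as labels without a separate decoder_input_ids argument.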
Hi! I used your notebook as a starting point for fine-tuning a T5-based model (ByT5) with the latest versions of PyTorch Lightning, Transformers, etc. I also use the Datasets library instead of downloading from Stanford, so it's a little more adaptable. Feel free to update the repo, or let me know if this can be added as a new example notebook.
https://colab.research.google.com/drive/1syXmhEQ5s7C59zU8RtHVru0wAvMXTSQ8
ValueError Traceback (most recent call last)
/tmp/ipykernel_3356/3905977959.py in <module>
----> 1 fill = pipeline('fill-mask', model='tamil_bert', tokenizer='tamil_bert')
~/.local/lib/python3.8/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
452 # Will load the correct model if possible
453 model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
--> 454 framework, model = infer_framework_load_model(
455 model,
456 model_classes=model_classes,
~/.local/lib/python3.8/site-packages/transformers/pipelines/base.py in infer_framework_load_model(model, config, model_classes, task, framework, **model_kwargs)
156
157 if isinstance(model, str):
--> 158 raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}.")
159
160 framework = "tf" if model.__class__.__name__.startswith("TF") else "pt"
ValueError: Could not load model tamil_bert with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForMaskedLM'>,).
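pipeline raises this ValueError when it cannot instantiate AutoModelForMaskedLM from the given name. If tamil_bert is meant to be a local directory, typical causes are that the path is not relative to the current working directory, that config.json is missing, that no weight file is present, or that the saved model has no masked-LM head. A stdlib-only sketch of a pre-flight check, assuming the usual file names written by save_pretrained (config.json plus pytorch_model.bin, model.safetensors, or tf_model.h5); the helper is hypothetical and does not handle sharded checkpoints or check the architecture.

```python
import os

WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors", "tf_model.h5")


def missing_model_files(model_dir):
    """Return a list of problems that would likely stop from_pretrained loading model_dir."""
    if not os.path.isdir(model_dir):
        return [f"{model_dir!r} is not a directory (cwd: {os.getcwd()})"]
    problems = []
    if not os.path.isfile(os.path.join(model_dir, "config.json")):
        problems.append("config.json is missing")
    if not any(os.path.isfile(os.path.join(model_dir, f)) for f in WEIGHT_FILES):
        problems.append("no weight file found: " + ", ".join(WEIGHT_FILES))
    return problems


print(missing_model_files("tamil_bert"))
```

An empty list only means the files are present; the config's architectures field still has to describe a model with a masked-LM head for the fill-mask pipeline.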
Hi, when I try to run the code I get this error:
AttributeError: 'Trainer' object has no attribute 'proc_rank'
in is_logger:
return self.trainer.proc_rank <= 0
Thanks for your help.
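proc_rank was renamed in later PyTorch Lightning releases; the process index is available as trainer.global_rank, so changing the helper to return self.trainer.global_rank <= 0 should fix the error. A sketch that works with either attribute name, exercised here with types.SimpleNamespace stand-ins rather than a real Trainer:

```python
from types import SimpleNamespace


def is_logger(trainer):
    """True on the main process; prefers global_rank, falls back to proc_rank."""
    rank = getattr(trainer, "global_rank", None)
    if rank is None:
        rank = getattr(trainer, "proc_rank", 0)
    return rank <= 0


new_style = SimpleNamespace(global_rank=0)
old_style = SimpleNamespace(proc_rank=1)
print(is_logger(new_style))  # True
print(is_logger(old_style))  # False
```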
Hello Suraj, I have really been enjoying reading your code! I am wondering, for a beginner like me, is there a tutorial I can follow to fine-tune almost any published model for a specific task? For example, fine-tuning T5 on SQuAD to answer open-book questions, i.e., using a context and a question to predict an answer. I think T5 has that task implemented, but I am having a very difficult time doing this with non-Transformer models.
Thank you!
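The general recipe in this repo is to turn every task into a (source text, target text) pair. For SQuAD, the T5 paper does roughly this by concatenating the question and the context into one input string with "question:" and "context:" prefixes and using the answer text as the target. A sketch of that preprocessing step, assuming a SQuAD-style record with question, context, and answers fields as in the Hugging Face datasets version (the exact prefix wording is a choice, not a requirement):

```python
def squad_to_text(record):
    """Turn a SQuAD-style record into a (source, target) pair of strings."""
    source = f"question: {record['question']} context: {record['context']}"
    # SQuAD lists one or more reference answers; train on the first one.
    target = record["answers"]["text"][0]
    return source, target


example = {
    "question": "Which model is fine-tuned?",
    "context": "This notebook fine-tunes T5 on several tasks.",
    "answers": {"text": ["T5"], "answer_start": [25]},
}
source, target = squad_to_text(example)
print(source)
print(target)  # T5
```

The resulting strings can then be tokenized into source_ids and target_ids the same way the notebook's dataset classes handle other tasks.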
Hi, I am using your code to fine-tune my language model, and I saved the fine-tuned model. After that, an error occurred while trying to evaluate it:
outs = model.model.generate(input_ids=batch['source_ids'].cuda(),
                            attention_mask=batch['source_mask'].cuda(),
                            max_length=2)  # <-- the error occurs here
dec = [tokenizer.decode(ids) for ids in outs]
texts = [tokenizer.decode(ids) for ids in batch['source_ids']]
targets = [tokenizer.decode(ids) for ids in batch['target_ids']]
Error:
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1914 # remove once script supports set_grad_enabled
1915 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1917
1918
RuntimeError: Input, output and indices must be on the current device
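This error means the embedding weights and the input ids are on different devices: the inputs are moved with .cuda(), but a model reloaded after training usually starts on the CPU. Either call model.to("cuda") before generating or move the batch to wherever the model's parameters are, e.g. device = next(model.model.parameters()).device. A sketch of a batch-moving helper; it relies only on the .to(device) method that torch tensors provide, and is exercised with a hypothetical FakeTensor class so it runs without PyTorch.

```python
def move_batch(batch, device):
    """Return a copy of `batch` with every tensor-like value moved to `device`.
    Values without a .to() method (e.g. lists of strings) are left unchanged."""
    return {k: v.to(device) if hasattr(v, "to") else v for k, v in batch.items()}


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only records its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


# With PyTorch this would be: device = next(model.model.parameters()).device
batch = {"source_ids": FakeTensor(), "source_mask": FakeTensor(), "text": ["a review"]}
moved = move_batch(batch, "cuda:0")
print(moved["source_ids"].device)  # cuda:0
print(moved["text"])               # ['a review']
# Then: model.model.generate(input_ids=moved["source_ids"],
#                            attention_mask=moved["source_mask"], max_length=2)
```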
Hi,
I am running fine-tuning on IMDB and getting the error below. Thanks for your help.
File "main.py", line 70, in train
trainer.fit(model)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 440, in fit
results = self.accelerator_backend.train()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 54, in train
results = self.train_or_test()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 68, in train_or_test
results = self.trainer.train()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 485, in train
self.train_loop.run_training_epoch()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 544, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 713, in run_training_batch
self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 453, in optimizer_step
optimizer, batch_idx, opt_idx, train_step_and_backward_closure
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 122, in optimizer_step
using_lbfgs=is_lbfgs
TypeError: optimizer_step() got an unexpected keyword argument 'using_native_amp'
Exception ignored in: <function tqdm.__del__ at 0x7fc3337da9e0>
Traceback (most recent call last):
File "/opt/conda/envs/test/lib/python3.7/site-packages/tqdm/std.py", line 1128, in __del__
File "/opt/conda/envs/test/lib/python3.7/site-packages/tqdm/std.py", line 1341, in close
File "/opt/conda/envs/test/lib/python3.7/site-packages/tqdm/std.py", line 1520, in display
File "/opt/conda/envs/test/lib/python3.7/site-packages/tqdm/std.py", line 1131, in __repr__
File "/opt/conda/envs/test/lib/python3.7/site-packages/tqdm/std.py", line 1481, in format_dict
TypeError: cannot unpack non-iterable NoneType object
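This TypeError typically appears when a LightningModule overrides optimizer_step with the signature from an older Lightning release: the newer trainer calls the hook with extra keyword arguments (using_native_amp, using_lbfgs, and so on) that the override does not accept. The tqdm message afterwards is most likely a side effect of the interrupted run rather than a separate bug. The sketch below uses a signature matching the keywords visible in this traceback (an assumption based on the 1.0-era hook; the signature changed again in later releases, so check the docs for the installed version), plus **kwargs as a safety net. It steps the optimizer, zeroes gradients, and steps an optional LR scheduler, and is attached to a plain class and driven by a hypothetical optimizer so it runs without Lightning.

```python
class FineTunerHooks:
    """Plain-Python stand-in for the optimizer_step override in T5FineTuner."""

    def __init__(self, scheduler=None):
        self.lr_scheduler = scheduler

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure=None, on_tpu=False,
                       using_native_amp=False, using_lbfgs=False, **kwargs):
        # The closure runs training_step and backward; pass it to step().
        optimizer.step(closure=optimizer_closure)
        optimizer.zero_grad()
        if self.lr_scheduler is not None:
            self.lr_scheduler.step()


class FakeOptimizer:
    """Hypothetical optimizer that just counts calls."""

    def __init__(self):
        self.steps = 0
        self.zeroed = 0

    def step(self, closure=None):
        if closure is not None:
            closure()
        self.steps += 1

    def zero_grad(self):
        self.zeroed += 1


opt = FakeOptimizer()
calls = []
hooks = FineTunerHooks()
# The same keywords the trainer passes in the traceback above.
hooks.optimizer_step(0, 0, opt, 0, optimizer_closure=lambda: calls.append("closure"),
                     on_tpu=False, using_native_amp=False, using_lbfgs=False)
print(opt.steps, opt.zeroed, calls)  # 1 1 ['closure']
```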