Comments (7)
Hi there,
Thanks for your interest in our work. Your description is clear, so I fully understand your current situation.
First of all, thanks for pointing out the problem with the validation output of the PretrainDataset. I revised the training and testing code after I pretrained the model, so there might be some incompatibilities with the pre-training process. Your understanding is correct: I did not compute ROUGE scores in the pre-training phase, so although tgt is required as part of the input, it is not used at all. I will revise the code to support it.
As for the problem when running the pre-training method, I have not pre-trained the model from the HF version of LED before, so I have not faced exactly the same problem. Based on the errors, it looks like an index-out-of-range problem. Could you double-check the length of the input and output of the batch causing this error, as well as the size of the embedding layer of the LED-large model? It might be related to the settings of max_input_len and max_output_len in this function.
Let me know if you still cannot find the problem.
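One way to run the check suggested above is to scan the failing batch for sequences that are longer than the position table, or that contain a token id with no row in the embedding table. The sketch below is only illustrative: find_out_of_range is a hypothetical helper, not part of the PRIMER code, and the numbers in the demo are placeholders. With a real batch you would pass something like input_ids.tolist() and model.get_input_embeddings().num_embeddings.

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range(
    batch_ids: Iterable[Sequence[int]],
    num_embeddings: int,
    max_positions: int,
) -> List[Tuple[int, str]]:
    """Return (row_index, reason) for each sequence that would index past a table.

    Hypothetical helper for debugging; not taken from the PRIMER repository.
    """
    problems: List[Tuple[int, str]] = []
    for row, ids in enumerate(batch_ids):
        if len(ids) == 0:
            continue
        # A sequence longer than the position table overflows positional embeddings.
        if len(ids) > max_positions:
            problems.append((row, f"length {len(ids)} > max positions {max_positions}"))
        # Valid token ids are 0 .. num_embeddings - 1.
        if max(ids) >= num_embeddings:
            problems.append((row, f"token id {max(ids)} >= embedding rows {num_embeddings}"))
    return problems


# Placeholder batch: the second sequence contains an id with no embedding row.
batch = [[0, 713, 2], [0, 50265, 2]]
print(find_out_of_range(batch, num_embeddings=50265, max_positions=4096))
# -> [(1, 'token id 50265 >= embedding rows 50265')]
```

An empty result means neither the lengths nor the token ids explain the error, and the problem is likely elsewhere.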
I managed to run the pre-training code. I only changed the validation_step function. My code is below. I don't know whether my change is good or bad. @Wendy-Xiao, @jaineshdoshi, can you comment on my change? Thank you so much!
@jaineshdoshi Furthermore, the paper says the config was changed, and I'm quite confused about how to change it. Did you change anything in your config file for the led-large model, or did you just initialize the model with the HF config file? Is there any problem if I just use led-large and don't change anything? Thank you so much, looking forward to your reply.
Hi,
@Wendy-Xiao
Thanks for your prompt reply and appreciate your help here!
I checked the batch max input/output lengths; they were set to 4096/1024 respectively, and the input samples were truncated correctly by the code. I tried different tokenizers (PRIMERA and led-large-16384), but both resulted in the same error. Would you have any other checks that I might be missing to handle this issue?
I double-checked the token embeddings and they don't seem to be the issue; I think it might be a positional embedding indexing issue. Would you know how to sanity check that?
Edit: I ran the code on a CPU to get the exact error due to the embeddings. Please see below for my update:
Maybe I can try the longformer/LED implementation from GitHub instead of the HuggingFace version, but I recall facing some other errors with that initially too.
@haisonle001
I had also tried that approach to make the code here work but without luck as I ended up with the same error.
If you don't mind, could you point me to what you did differently from the steps I mentioned above to get the code working for pretraining?
I used the LED-large config exactly as obtained from the Hugging Face model. I believe the authors can help you out better on this one.
Cheers!
Edit:
By running the pretrain code on the CPU I managed to pin down the issue with the embedding token.
This is the tokenizer that we want to have, with a vocab size of 50265 (including the token which is not present in the LED tokenizer):
model.tokenizer
PreTrainedTokenizerFast(name_or_path='allenai/PRIMERA', vocab_size=50265, model_max_len=4096, is_fast=True, padding_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False), 'additional_special_tokens': ['<doc-sep>']})
In pretraining mode, the tokenizer was updated with the required token, but I observed that the embedding lookup torch.nn.functional.embedding was not receiving a weight matrix of the right size: the shape of the weight tensor at the input was weight.shape = torch.Size([50265, 1024]). The weight argument is defined as:
weight (Tensor): The embedding matrix with number of rows equal to the maximum possible index + 1,
and number of columns equal to the embedding size
So in this case it should be (50266, 1024) after the addition of the <doc-sep> token.
(I am still figuring out how to solve this, but all help is appreciated!)
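The arithmetic behind the mismatch can be sketched as follows. This assumes that newly added tokens receive consecutive ids starting right after the base vocabulary, which is how Hugging Face tokenizers usually behave (there, vocab_size excludes added tokens while len(tokenizer) includes them). assign_added_token_ids is a hypothetical helper written for this illustration only.

```python
from typing import Dict, List, Tuple


def assign_added_token_ids(
    base_vocab_size: int, added_tokens: List[str]
) -> Tuple[Dict[str, int], int]:
    """Give each added token the next free id and report the rows the table needs.

    Assumption: added tokens are appended after the base vocabulary.
    """
    ids = {tok: base_vocab_size + i for i, tok in enumerate(added_tokens)}
    # Rows needed = maximum possible index + 1.
    rows_needed = base_vocab_size + len(added_tokens)
    return ids, rows_needed


current_rows = 50265  # weight.shape[0] reported in the thread
ids, rows_needed = assign_added_token_ids(50265, ["<doc-sep>"])

print(ids)                              # {'<doc-sep>': 50265}
print(rows_needed)                      # 50266
print(ids["<doc-sep>"] < current_rows)  # False: the lookup would be out of range
```

Under that assumption, a table with 50265 rows covers ids 0 to 50264 only, so the first lookup of <doc-sep> fails.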
Thanks!
@jaineshdoshi I used primer_main.py instead of primer_hf_main.py and only changed the code as described above. I don't see much difference between the two files except for how they load the model. You could try primer_main.py and do what I did; maybe it will work.
Oh yeah, that makes sense, @jaineshdoshi! In the original code, I resized the embedding when initializing the model, and I may have removed those lines when I cleaned up the code.
You can try adding these lines to the init function:
# The special token is added after each document in the pre-processing step.
self.tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<doc-sep>"]}
)
self.docsep_token_id = self.tokenizer.additional_special_tokens_ids[0]
self.model.resize_token_embeddings(len(self.tokenizer))
Hi,
@Wendy-Xiao
Thank you for the code updates!
I got it working yesterday by adding the lines you mentioned above, as soon as I realized that the embedding dimensions hadn't been updated for the new token.
I do have a couple of questions on the model configuration.
Looking at the PRIMERA config.json files, I see a few parameters that differ from those used for LED. For example, the attention window parameter in the PRIMERA model's config file is:
"attention_window": [512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512],
while the same parameter in the LED large model config file (that the paper says is the starting point for pretraining PRIMERA) on HF is
"attention_window": [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024],
I am a bit confused about whether this was the LED used as the starting point for pretraining PRIMERA.
Can you shed some light on how I can get the correct configuration for the PRIMERA model?
@haisonle001
Thanks for helping out!
I am hoping that pretraining with the HF-based code/models will still reproduce the same experiment. Will update you folks if I find any disparities!
Yes, as indicated in the paper (sec 4.1), we used a different window size (512) and different length limits for the input (4096) and output (1024) for PRIMERA. All the settings can be found in the paper (sec 4.1 and appendix A); where nothing is specified, we used the same setting as LED. You can refer to the PRIMERA config.
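To see which settings differ between two config.json files, a small diff over the parsed dictionaries is usually enough. The sketch below uses only the attention_window values quoted in this thread; diff_configs is a hypothetical helper, and in practice you would load each dictionary with json.load from the downloaded config files.

```python
from typing import Any, Dict, Tuple


def diff_configs(
    base: Dict[str, Any], other: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key to (value_in_base, value_in_other).

    A key missing from one side is reported with None on that side.
    """
    keys = sorted(set(base) | set(other))
    return {k: (base.get(k), other.get(k)) for k in keys if base.get(k) != other.get(k)}


# Values quoted in the thread; all other config keys are omitted here.
led_large = {"attention_window": [1024] * 12}
primera = {"attention_window": [512] * 12}

for key, (led_value, primera_value) in diff_configs(led_large, primera).items():
    print(f"{key}: LED={led_value[0]} per layer, PRIMERA={primera_value[0]} per layer")
# attention_window: LED=1024 per layer, PRIMERA=512 per layer
```

Any key that shows up in the diff but is not mentioned in the paper is worth checking against the PRIMERA config before pretraining.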