allenai / primer

The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

License: Apache License 2.0


primer's Introduction

PRIMERA


PRIMERA is a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data. In extensive experiments on 6 multi-document summarization datasets from 3 different domains in the zero-shot, few-shot, and fully supervised settings, PRIMERA outperforms current state-of-the-art models in most of these settings by large margins.

Updates (2022-MAR-09)

To make the model easier to use, we converted our trained models to the Huggingface format; they will be uploaded to the Huggingface Model Hub soon. (The code for model conversion can be found in Convert_to_hf_LED.ipynb, where the input is the state_dict() of our model.)

We updated the scripts and (example) bash files to run the Huggingface version of PRIMERA in ./script/primer_hf_main.py and ./run_bash/, respectively. We also created a notebook showing how to evaluate our fine-tuned model on the Multi-News dataset (Evaluation_Example.ipynb).

Note: due to differences between the implementations of the original Longformer and the Huggingface LED model, the results of the converted models are slightly different. We ran a sanity check on both fine-tuned and non-fine-tuned models on the Multi-News dataset; the results are shown below:

Model	Rouge-1	Rouge-2	Rouge-L
PRIMERA	42.0	13.6	20.8
PRIMERA-hf	41.7	13.6	20.5
PRIMERA(finetuned)	49.9	21.1	25.9
PRIMERA-hf(finetuned)	49.9	20.9	25.8

Set up

Create a new virtual environment by

conda create --name primer python=3.7
conda activate primer
conda install cudatoolkit=10.0

Install Longformer by

pip install git+https://github.com/allenai/longformer.git

Install requirements to run the summarization scripts and data generation scripts by

pip install -r requirements.txt

Usage of PRIMERA

Download the pre-trained PRIMERA model here to ./PRIMERA_model
Load the tokenizer and model by

from transformers import AutoTokenizer
from longformer import LongformerEncoderDecoderForConditionalGeneration
from longformer import LongformerEncoderDecoderConfig

tokenizer = AutoTokenizer.from_pretrained('./PRIMERA_model/')
config = LongformerEncoderDecoderConfig.from_pretrained('./PRIMERA_model/')
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
            './PRIMERA_model/', config=config)

Make sure the documents are separated with <doc-sep> in the input.
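
As a minimal sketch of that input format, the helper below joins the documents of one cluster into a single string. The name `join_documents` is hypothetical and not part of this repository, and putting a space on each side of the separator is an assumption; only the `<doc-sep>` token itself comes from the note above.

```python
DOC_SEP = "<doc-sep>"  # separator token named in the README


def join_documents(documents, sep=DOC_SEP):
    """Join the documents of one cluster into a single input string.

    Empty or whitespace-only documents are dropped so that the result
    never contains two separators in a row.
    """
    cleaned = [doc.strip() for doc in documents if doc and doc.strip()]
    # Assumption: a single space on each side of the separator.
    return f" {sep} ".join(cleaned)


cluster = ["First article text.", "  Second article text.  ", ""]
model_input = join_documents(cluster)
print(model_input)
```

The resulting string can then be passed to the tokenizer loaded above in place of a single document.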

Summarization Scripts

You can use script/primer_main.py to pre-train, train, and test PRIMERA, and script/compared_model_main.py to train and test BART/PEGASUS/LED.

Sample usages of both scripts can be found in run_bash/.

Datasets

Multi-News and Multi-XScience: downloaded automatically from Huggingface.
WCEP-10: the preprocessed version can be found here
Wikisum: we only use a small subset for few-shot training (10/100) and testing (3200). The subset we used can be found here. Note that train.pt and valid.pt contain significantly more examples than we actually used, because we sample 10/100 examples multiple times in the few-shot setting and need a large pool to sample from.
DUC2003/2004: you need to apply for access following the instructions
arXiv: you can find the data we used in this repo

Fully Supervised Models

We provide all the fully supervised models below.

Pre-training Data Generation

Newshead: we crawled the newshead dataset using the original code and cleaned up the crawled data. The final newshead dataset can be found here.

You can use utils/pretrain_preprocess.py to generate pre-training data.

Generate data with scores and entities with --mode compute_all_scores (The processed data with scores and entities can be found here)
Generate pre-training data with --mode pretraining_data_with_score:
- Pegasus: --strategy greedy --metric pegasus_score
- Entity_Pyramid: --strategy greedy_entity_pyramid --metric pyramid_rouge (The processed data that could directly be used for pre-training can be found here)

primer's Issues

Using the (pretrained) model on new data

Hi,

First of all many thanks to the whole team for the amazing work. I'm trying to use the pretrained model (on MultiNews) to make inference on new data. At the moment I'm just trying with the test set of Multinews itself.

I instantiate the model as suggested:

tokenizer = AutoTokenizer.from_pretrained(model_path)
config = LongformerEncoderDecoderConfig.from_pretrained(model_path)
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(model_path, config=config).to(device)

Then I prepare the input similarly to any other HF model (I set max_input_length=4096):

inputs_dict = tokenizer(input_docs, padding="max_length", max_length=max_input_length, return_tensors="pt", truncation=True)
input_ids = inputs_dict.input_ids.to(device)
attention_mask = inputs_dict.attention_mask.to(device)

At the end I use the following to generate the summary:

predicted_ids = model.generate(input_ids, attention_mask=attention_mask)
text = tokenizer.batch_decode(predicted_ids, skip_special_tokens=True)

However, the summaries are very short compared with what I expected (at least for Multi-News). Here is an example of the output:

– Voters in 11 states will pick their governors tonight, and Republicans appear on track to increase their

It even seems to be truncated. Is there something I'm doing wrong?
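
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: when `generate` is called without length arguments it falls back to the defaults stored in the model configuration, and the default `max_length` can be far shorter than a Multi-News summary. The sketch below only assembles keyword arguments for `generate`. `build_generation_kwargs` is a hypothetical helper, and the default values are illustrative placeholders, not the settings used by the authors.

```python
def build_generation_kwargs(max_length_tgt=1024, min_length_tgt=0,
                            beam_size=5, length_penalty=1.0):
    """Assemble explicit length and beam settings for model.generate().

    All default values here are placeholders chosen for illustration.
    """
    if max_length_tgt <= 0:
        raise ValueError("max_length_tgt must be positive")
    if not 0 <= min_length_tgt <= max_length_tgt:
        raise ValueError("min_length_tgt must lie in [0, max_length_tgt]")
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return {
        "max_length": max_length_tgt,   # upper bound on generated tokens
        "min_length": min_length_tgt,   # lower bound on generated tokens
        "num_beams": beam_size,         # 1 means greedy decoding
        "length_penalty": length_penalty,
    }


gen_kwargs = build_generation_kwargs()
# With the variables from the snippet above, the call would become:
# predicted_ids = model.generate(input_ids, attention_mask=attention_mask, **gen_kwargs)
```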

Trying to get results from the paper

Hi, thanks a lot for sharing your work.

I want to reproduce your zero-shot results for the DUC2004 dataset locally, but so far I get the wrong results (about 22, 3 and 15 points for R-1, R-2 and R-L respectively).

Could you tell me whether the PRIMERA model published on HuggingFace includes a pretrained Longformer or not?

If so, how can I repeat the above experiment locally with the HF version of the model?

The state dictionary of the model you are training to load is corrupted.

Hello, I'm interested in testing out the PRIMER model, but when I followed the steps I ended up getting

"ValueError: The state dictionary of the model you are training to load is corrupted. Are you sure it was properly saved?"

I've tried reinstalling the PRIMER-large packages too, but to no avail. Has anyone encountered the same issue?

OSError: Unable to load weights from pytorch checkpoint file

Firstly, thanks for the excellent work. I'm trying the PRIMER model, and I get the following error when loading the model:

from transformers import AutoTokenizer
from longformer import LongformerEncoderDecoderForConditionalGeneration
from longformer import LongformerEncoderDecoderConfig

tokenizer = AutoTokenizer.from_pretrained('/content/PRIMER/PRIMER/')
config = LongformerEncoderDecoderConfig.from_pretrained('/content/PRIMER/PRIMER/')
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
            '/content/PRIMER/PRIMER/', config=config)

/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    857             except Exception:
    858                 raise OSError(
--> 859                     "Unable to load weights from pytorch checkpoint file. "
    860                     "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
    861                 )

OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I was running on Google Colaboratory and skipped the Conda install part for simplicity. Is this why I got the above error?

Results for PRIMERA-arxiv

Hi,

Thanks for sharing this nice work. After running your code, I only get 28.5 Rouge-L F-measure on the arXiv dataset, whereas Table 3 of your paper reports 42.6; my Rouge-1 and Rouge-2 match yours. Moreover, I only get 46.6/19.1/27.5 for Rouge-1/2/L with led-large-16384-arxiv (i.e., the SOTA for arXiv), whereas your Table 3 reports 41.8 for Rouge-L. Could you please explain how you obtained such high Rouge-L values on the arXiv dataset?

Mismatch between pre-training and fine-tuning phase

As far as I'm aware, PRIMERA replaces sentences with <SENT-MASK> tokens during pre-training. However, these <SENT-MASK> tokens do not appear in the fine-tuning phase or at inference time. Still, the model was shown to achieve impressive results in zero-shot and few-shot evaluation. I was wondering whether PRIMERA has any strategy to reduce the mismatch between the pre-training and fine-tuning phases? (I did not see related information in the paper.)

Can PRIMERA accept 16k input?

Could you please tell me whether the models on HF (https://huggingface.co/allenai/PRIMERA, https://huggingface.co/allenai/PRIMERA-arxiv) can accept 16k input? Can I just set max_length to 16384 to let the model accept a document of that length? Thanks.

RuntimeError: CUDA error: device-side assert triggered

When running the code on the multi_news dataset, it raises an error:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [217,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [217,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [217,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [217,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [217,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "primer_main.py", line 829, in <module>
    test(args)
  File "primer_main.py", line 621, in test
    trainer.test(model, test_dataloader)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in test
    results = self.__test_given_model(model, test_dataloaders)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in __test_given_model
    results = self.fit(model)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 513, in fit
    self.dispatch()
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 547, in dispatch
    self.accelerator.start_testing(self)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 77, in start_testing
    self.training_type_plugin.start_testing(trainer)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 115, in start_testing
    self._results = trainer.run_test()
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 793, in run_test
    eval_loop_results, _ = self.run_evaluation()
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 732, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 160, in evaluation_step
    output = self.trainer.accelerator.test_step(args)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 196, in test_step
    return self.training_type_plugin.test_step(*args)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 293, in test_step
    return self.model(*args, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 59, in forward
    output = self.module.test_step(*inputs, **kwargs)
  File "primer_main.py", line 353, in test_step
    return self.validation_step(batch, batch_idx)
  File "primer_main.py", line 274, in validation_step
    loss = self.shared_step(input_ids, output_ids)
  File "primer_main.py", line 150, in shared_step
    lm_logits = self.forward(input_ids, output_ids)
  File "primer_main.py", line 119, in forward
    use_cache=False,
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 1113, in forward
    return_dict=return_dict,
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 956, in forward
    return_dict=return_dict,
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 335, in forward
    embed_pos = self.embed_positions(input_ids)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 859, in forward
    return super().forward(positions + self.offset)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/wangyiting/anaconda3/envs/primer/lib/python3.7/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered

I found that there is only one dataset named "multi_news". Has anyone run primer_main.py on multi_news without the above problem? It's quite strange that the error only occurs on the multi_news dataset in my experiments /(ㄒoㄒ)/~~
Thank you very much!
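
The assertion `srcIndex < srcSelectDimSize` is what an embedding lookup reports on GPU when an index is at least as large as the number of rows in the embedding table, and the traceback above ends inside `embed_positions`. The following is a hedged sketch of a check that can be run before the forward pass. `check_position_range` is a hypothetical helper, and the table size and offset in the usage lines are made-up numbers that would have to be read from the actual model.

```python
def check_position_range(seq_len, num_position_rows, offset=0):
    """Check whether every position index fits in the positional table.

    Positions run from 0 to seq_len - 1 and are shifted by `offset`
    before the lookup, so the largest index is seq_len - 1 + offset.
    Returns (fits, largest_index).
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    largest_index = seq_len - 1 + offset
    return largest_index < num_position_rows, largest_index


# Made-up numbers for illustration only.
fits, largest = check_position_range(seq_len=4096, num_position_rows=1026, offset=2)
if not fits:
    print(f"position index {largest} does not fit in a table with 1026 rows")
```

Running the same batch on CPU, or setting the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually yields a more readable error than the asynchronous device-side assert.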

How to convert "bart-large-cnn" to LED version

Hi,

Thanks for sharing this great work.
I see that you have published "allenai/led-large-16384" on the Hugging Face model hub. I am curious how you converted the "bart-large" model to the "led-large-16384" model. Could you provide the conversion scripts? I want to try converting "bart-large-cnn" to "led-large-cnn-16384"!

Thanks!

Pretraining-Mask sentences

Hello,
In the PRIMERA pre-training process, the model chooses 30% of the sentences with the pyramid method, and then 50% of those candidates (15% of the sentences) are masked while all 30% are kept as the target. May I know why the 15% masked sentences are not included in the target?

for i_d in range(len(truncated_doc)):
    for i_s in range(len(truncated_doc[i_d])):
        if cur_idx in mask_indices:
            tgt.append(truncated_doc[i_d][i_s])
            # this is the line that chooses 50% of the candidates
            # (the candidates being 30% of the sentences) for masking
            if cur_idx not in non_mask_indices:
                truncated_doc[i_d][i_s] = ''  # tokenizer.mask_token
        cur_idx += 1
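
To make the question concrete, here is a self-contained sketch of the loop quoted above, under the assumption that `non_mask_indices` is a subset of `mask_indices` and that the second check is nested inside the first. The function name `mask_sentences` and the placeholder mask string are hypothetical; this illustrates the selection logic and is not the repository's implementation.

```python
def mask_sentences(docs, mask_indices, non_mask_indices, mask_token="<sent-mask>"):
    """Build (masked_docs, target) from a list of documents.

    docs is a list of documents, each a list of sentences. Sentences are
    numbered globally across documents. Every selected sentence goes into
    the target; only selected sentences that are NOT in non_mask_indices
    are replaced by mask_token in the input.
    """
    masked_docs = [list(doc) for doc in docs]  # copy, leave input untouched
    target = []
    cur_idx = 0
    for i_d, doc in enumerate(docs):
        for i_s, sentence in enumerate(doc):
            if cur_idx in mask_indices:
                target.append(sentence)
                if cur_idx not in non_mask_indices:
                    masked_docs[i_d][i_s] = mask_token
            cur_idx += 1
    return masked_docs, target


docs = [["a0", "a1", "a2"], ["b0", "b1"]]
# Sentences 1 and 3 are selected; sentence 3 is kept visible in the input.
masked, tgt = mask_sentences(docs, mask_indices={1, 3}, non_mask_indices={3})
print(masked)  # [['a0', '<sent-mask>', 'a2'], ['b0', 'b1']]
print(tgt)     # ['a1', 'b0']
```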

Version for torchmetrics

Hi,

I am getting the below error while using the code to train the model.

Traceback (most recent call last):
  File "script/primer_main.py", line 1, in <module>
    from pytorch_lightning.accelerators import accelerator
  File "/home2/dhaval.taunk/miniconda3/envs/joint_gt/lib/python3.7/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home2/dhaval.taunk/miniconda3/envs/joint_gt/lib/python3.7/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home2/dhaval.taunk/miniconda3/envs/joint_gt/lib/python3.7/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home2/dhaval.taunk/miniconda3/envs/joint_gt/lib/python3.7/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics
  File "/home2/dhaval.taunk/miniconda3/envs/joint_gt/lib/python3.7/site-packages/pytorch_lightning/metrics/utils.py", line 22, in <module>
    from torchmetrics.utilities.data import get_num_classes as _get_num_classes
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/home2/dhaval.taunk/miniconda3/envs/joint_gt/lib/python3.7/site-packages/torchmetrics/utilities/data.py)

Which version of torchmetrics is supposed to be installed? requirements.txt does not specify a torchmetrics version.

Training PRIMER from Scratch

Hello, thanks for your hard work.

I'm trying to train PRIMER from Scratch with a customized dataset using the zero-shot method. Is this feature available now? Any tutorial I can follow?

Much appreciated.

Config specifies max_position_embeddings as 1024

Hi!

I noticed that the PRIMERA config specifies max_position_embeddings: 1024. Is this intentional? AFAICT the HuggingFace library treats this as the maximum position embedding size of the encoder, or max_encoder_position_embeddings, which for PRIMERA is 4096.

E.g. in their run_summarization.py script, they appear to treat max_position_embeddings as max_encoder_position_embeddings as they compare it to the max_source_length.

So I am wondering whether max_position_embeddings should be set to 4096 in the PRIMERA configs; otherwise it causes problems when using the model with existing HF example scripts.
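
Until the config question is settled, one possible workaround on the caller's side is to read the encoder limit explicitly and fall back to the generic field only when it is absent. This is a sketch under that assumption: `max_source_positions` is a hypothetical helper, and the two config objects below are stand-ins built with `SimpleNamespace` using the values quoted in this issue, not real model configs.

```python
from types import SimpleNamespace


def max_source_positions(config):
    """Return the encoder-side position limit of a config object.

    Prefers max_encoder_position_embeddings when the config defines it
    and falls back to max_position_embeddings otherwise.
    """
    encoder_limit = getattr(config, "max_encoder_position_embeddings", None)
    if encoder_limit is not None:
        return encoder_limit
    return config.max_position_embeddings


# Stand-in configs using the numbers mentioned above.
primera_like = SimpleNamespace(max_position_embeddings=1024,
                               max_encoder_position_embeddings=4096)
bart_like = SimpleNamespace(max_position_embeddings=1024)

print(max_source_positions(primera_like))  # 4096
print(max_source_positions(bart_like))     # 1024
```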

Question about preprocessing MultiNews dataset

In the Dataset class as implemented, each document is split by '|||||', but the last part is ignored. I would really like to know the reason.
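
For reference, a small sketch of what splitting on '|||||' produces. Whether the dataset's source strings actually end with a trailing separator is an assumption made here for illustration and is not confirmed by this issue; if they do, the last piece after the split is empty, which would be one possible reason to drop it. The helper name `split_cluster` is hypothetical.

```python
def split_cluster(source, sep="|||||", drop_empty=False):
    """Split one source string into its documents.

    Each piece is stripped of surrounding whitespace. With drop_empty=True,
    pieces that are empty after stripping are removed.
    """
    parts = [part.strip() for part in source.split(sep)]
    if drop_empty:
        parts = [part for part in parts if part]
    return parts


with_trailing = "doc one ||||| doc two |||||"
print(split_cluster(with_trailing))                   # ['doc one', 'doc two', '']
print(split_cluster(with_trailing, drop_empty=True))  # ['doc one', 'doc two']
```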

Issue in using given code for pretraining the model

Hi,
I am trying to reproduce the pre-training experiment with the codebase here and ran into the following issue.

Setup / Steps undertaken:

Use (LED-large) as the base model to begin with.
I used the preprocessed data given in the README file here.
I modified the primer_hf_main.py a bit to add the pretraining function from primer_main.py
I used the Pretrain Dataset class and dataloader functions defined here in the dataloader.py file to load the preprocessed dataset, essentially replicating the exact pretrain function from the non HuggingFace primer file as is in the primer_hf_main.py file.
I provided other relevant args for the pretrain mode to my modified file and passed them to the pl.Trainer as shown in the other file with the same values

Observations:

The PretrainDataset yields 2 values per sample in the batch even in the validation phase here.
Unlike the SummarizationIterDataset that yields 3 values in the validation mode here
Upon debugging, I realised that the collate_fn when used with the pretraining mode with the PretrainDataset receives 2 variables per batch and always defaults to this line raising an error in the validation sanity check test of the torch trainer.

Firstly, I wanted to ask the authors @Wendy-Xiao whether this is the expected behaviour or whether I am making some wrong assumptions here.

My solution:
To handle this, I added the tokenizer and yielded the decoded output string as the tgt string variable (replicating the behaviour of the SummarizationIterDataset iterator output used in train mode)

tgt = self.tokenizer.decode(data['tgt'], skip_special_tokens=True)
yield torch.tensor(data["src"]), torch.tensor(data["tgt"]), tgt

But this yields errors like these:

../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [574,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [574,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/PRIMER_train/script/primer_hf_main.py", line 982, in <module>
    pretrain(args)
  File "/PRIMER_train/script/primer_hf_main.py", line 456, in pretrain
    trainer.fit(model, train_dataloader, valid_dataloader)
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 460, in fit
    self._run(model)
  File "/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
    self.dispatch()
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 799, in dispatch
    self.accelerator.start_training(self)
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in run_stage
    return self.run_train()
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 844, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in run_sanity_check
    self.run_evaluation()
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 967, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/Miniconda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/PRIMER_train/script/primer_hf_main.py", line 291, in validation_step
    loss = self.shared_step(input_ids, output_ids)
  File "PRIMER_train/script/primer_hf_main.py", line 123, in shared_step
    lm_logits = self.forward(input_ids, output_ids)
  File "PRIMER_train/script/primer_hf_main.py", line 89, in forward
    outputs = self.model(
  File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/lib/python3.8/site-packages/transformers/models/led/modeling_led.py", line 2338, in forward
    outputs = self.led(
  File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/lib/python3.8/site-packages/transformers/models/led/modeling_led.py", line 2189, in forward
    encoder_outputs = self.encoder(
  File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Miniconda/lib/python3.8/site-packages/transformers/models/led/modeling_led.py", line 1733, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Miniconda/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/Miniconda/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered

I think these are out-of-range index errors in CUDA for the embeddings. I checked that the new token was included in the tokenizer correctly.
(PS: ignore the file paths, as they have been truncated here; the code itself is configured correctly, since I have been able to fine-tune the base model on newer datasets.)
Can anyone point me to how to address this issue?
Or how did you manage to pretrain the model from scratch using the authors' code given here?

Thanks a ton! :)
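
Since this traceback ends in `embed_tokens`, one check that may help is whether any token id in a batch is at least as large as the number of rows in the model's token embedding. Adding a token to the tokenizer does not by itself enlarge the embedding matrix; in the transformers library that is done separately with `resize_token_embeddings`. Whether that is the cause here is not confirmed. The helper below is a hypothetical, framework-free sketch, and the numbers in the usage lines are invented.

```python
def find_out_of_range_ids(token_ids, num_embedding_rows):
    """Return the sorted distinct token ids that an embedding lookup would reject.

    An id is rejected when it is negative or not smaller than the number
    of rows in the embedding table.
    """
    return sorted({t for t in token_ids if t < 0 or t >= num_embedding_rows})


# Invented example: a table with 50265 rows and a batch that contains
# two ids produced by newly added tokens.
batch_ids = [0, 5, 50264, 50265, 50266, 5]
bad_ids = find_out_of_range_ids(batch_ids, num_embedding_rows=50265)
print(bad_ids)  # [50265, 50266]
```

If such ids show up, comparing the tokenizer length with the first dimension of the model's input embedding is a reasonable next step.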

Trying to get results on multi_news dataset from the paper

Hi, thank you for sharing. I ran into some trouble when using script/primer_main.py to test PRIMERA. When I follow the settings in run_bash/test_primer.sh, I find that the generated summaries are just truncated from the source text according to max_length_tgt and are not generated by beam search, no matter what beam_size I set. I have no idea how to fix this problem; could you please tell me how to solve it? Thank you.

Training ended but utilization of GPU remains 100%

When using mode 'train' , (i.e. few-shot or fully supervised), I found that the training process ends but the utilization of GPU remains 100%. Did anyone meet the same problem? Thank you very much

How to finetune with a new dataset?

Hi, I am trying to finetune PRIMERA from huggingface using trainer, with a new dataset. However, i keep getting rouge scores of 0. May I know which part of the code is wrong?

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
import nltk
import numpy as np
TOKENIZER = AutoTokenizer.from_pretrained("allenai/PRIMERA")
MODEL = AutoModelForSeq2SeqLM.from_pretrained("allenai/PRIMERA")
import torch
MODEL.gradient_checkpointing_enable()
PAD_TOKEN_ID = TOKENIZER.pad_token_id
DOCSEP_TOKEN_ID = TOKENIZER.convert_tokens_to_ids("<doc-sep>")

from huggingface_hub import notebook_login
notebook_login()

here i load my own reformatted version of the multi_news dataset from huggingface - format is a (src,tgt) pair, where src is the related documents, tgt is the summary. its almost the same as the original multi_news dataset, just that i added a few more words at the front along with |||||.

from datasets import load_dataset

train = load_dataset('cammy/multi_news_formatted_small', split='train[:100]', use_auth_token=True, cache_dir="D:")
valid = load_dataset('cammy/multi_news_formatted_small', split='valid[:10]', use_auth_token=True, cache_dir="D:")
test = load_dataset('cammy/multi_news_formatted_small', split='test[:10]', use_auth_token=True, cache_dir="D:")

then i do the preprocessing of data

then lastly:
trainer.train()

but these are the results:

Questions of using multiple GPUs for training

Hi, when I set --gpus to 2 or more, it raises an error:
AttributeError: Can't pickle local object 'get_linear_schedule_with_warmup.<locals>.lr_lambda'
Can the model be trained with multiple GPUs? Thank you
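
This message is what Python's pickle module reports for a function defined inside another function, which can happen when a multi-process launch strategy needs to pickle the scheduler's callable. The sketch below reproduces that behaviour with the standard library only and shows a module-level class computing the same linear warmup and decay. `make_local_schedule`, `LinearWarmupSchedule`, and `can_pickle` are hypothetical names, and this is not the fix used in the repository.

```python
import pickle


def make_local_schedule(warmup_steps, total_steps):
    """Linear warmup then linear decay, written as a nested function."""
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return lr_lambda


class LinearWarmupSchedule:
    """The same schedule as a module-level class with plain attributes."""

    def __init__(self, warmup_steps, total_steps):
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps

    def __call__(self, step):
        if step < self.warmup_steps:
            return step / max(1, self.warmup_steps)
        remaining = self.total_steps - step
        return max(0.0, remaining / max(1, self.total_steps - self.warmup_steps))


def can_pickle(obj):
    """Return True if pickle.dumps succeeds for obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


local_fn = make_local_schedule(warmup_steps=1000, total_steps=50000)
print(can_pickle(local_fn))  # False: nested functions cannot be pickled
```

An instance of the class can be passed wherever a callable learning-rate multiplier is accepted, and it is picklable as long as the class is defined at the top level of an importable module.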

pre-training PRIMERA

I'm trying to pretrain PRIMERA on the processed NewsHead dataset. Could you provide a little more detail on how to do it?

What is led_summ?

Perhaps this is a very basic question, as this code has been on GitHub since 2021 and no one else has asked it.

I installed it according to the procedure described but, in my installation, I cannot import the led_summ that is on line 11 of the pretrain_preprocess.py file.

I think it must be a requirement not described in requirements.txt. I've looked on Pypi, conda, github, google...

bash script of fine-tuning on multinews dataset on multiple gpus using ddp

Hi,

I wonder if there is a script to fine-tune the pre-trained PRIMERA model on multiple GPUs using distributed data parallel (From the run_bash I can only find test scripts). I tried using the following command:

python primer_main.py --primer_path "../PRIMERA_model" --gpus 8 --batch_size 1 --accelerator ddp

but it produces DDP errors as follows:

Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1091.32it/s]
Namespace(acc_batch=16, accelerator='ddp', accum_data_per_step=16, adafactor=False, applyTriblck=False, attention_dropout=0.1, attention_mode='sliding_chunks', attention_window=512, batch_size=1, beam_size=1, ckpt_path=None, compute_rouge=False, data_path='../dataset/multi_news', dataset_name='multi_news', debug_mode=False, eval_steps=2500, fewshot=False, fix_lr=False, fp32=False, gpus=8, grad_ckpt=False, join_method='concat_start_wdoc_global', label_smoothing=0.0, length_penalty=1.0, limit_test_batches=None, limit_valid_batches=None, lr=3e-05, mask_num=0, max_length_input=4096, max_length_tgt=1024, min_length_tgt=0, mode='train', model_path='./longformer_summ_multinews/', num_train_data=-1, num_workers=1, primer_path='../PRIMERA_model', progress_bar_refresh_rate=1, rand_seed=0, remove_masks=False, report_steps=50, resume_ckpt=None, saveRouge=False, saveTopK=3, test_batch_size=-1, test_imediate=False, tokenizer='facebook/bart-base', total_steps=50000, val_check_interval=1.0, warmup_steps=1000)
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1022.00it/s]
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/8
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1063.02it/s]
initializing ddp: GLOBAL_RANK: 2, MEMBER: 3/8
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1064.45it/s]
initializing ddp: GLOBAL_RANK: 3, MEMBER: 4/8
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1028.02it/s]
initializing ddp: GLOBAL_RANK: 4, MEMBER: 5/8
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/8
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1046.74it/s]
initializing ddp: GLOBAL_RANK: 5, MEMBER: 6/8
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1075.46it/s]
initializing ddp: GLOBAL_RANK: 6, MEMBER: 7/8
Using native 16bit precision.
Using custom data configuration default
Reusing dataset multi_news (../dataset/multi_news/multi_news/default/1.0.0/9df9096a1eef569784b4859cc8009c53f31c66b9ccb4f9033feee1f875003adf)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1096.74it/s]
initializing ddp: GLOBAL_RANK: 7, MEMBER: 8/8
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 6 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 5 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 4 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 7 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]

  | Name  | Type                                             | Params
---------------------------------------------------------------------------
0 | model | LongformerEncoderDecoderForConditionalGeneration | 447 M 
---------------------------------------------------------------------------
447 M     Trainable params
0         Non-trainable params
447 M     Total params
1,788.895 Total estimated model params size (MB)
Validation sanity check: 0it [00:00, ?it/s]
/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:69: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Validation sanity check:   0%|                                                                                                                          | 0/2 [00:00<?, ?it/s
Validation Result at Step 0
Rouge-1 r score: 0.633570, Rouge-1 p score: 0.386207, Rouge-1 f-score: 0.473641
Rouge-2 r score: 0.225127, Rouge-2 p score: 0.137367, Rouge-2 f-score: 0.168398
Rouge-L r score: 0.258078, Rouge-L p score: 0.161508, Rouge-L f-score: 0.196181
Rouge-Lsum r score: 0.258078, Rouge-Lsum p score: 0.161508,             Rouge-Lsum f-score: 0.196181
Validation Result at Step 0
Rouge-1 r score: 0.542197, Rouge-1 p score: 0.253998, Rouge-1 f-score: 0.345720
Rouge-2 r score: 0.089577, Rouge-2 p score: 0.041425, Rouge-2 f-score: 0.056617
Rouge-L r score: 0.224821, Rouge-L p score: 0.104866, Rouge-L f-score: 0.142931
Rouge-Lsum r score: 0.224821, Rouge-Lsum p score: 0.104866,             Rouge-Lsum f-score: 0.142931
Validation Result at Step 0
Rouge-1 r score: 0.488497, Rouge-1 p score: 0.294507, Rouge-1 f-score: 0.365395
Rouge-2 r score: 0.149167, Rouge-2 p score: 0.091010, Rouge-2 f-score: 0.112431
Rouge-L r score: 0.266496, Rouge-L p score: 0.155666, Rouge-L f-score: 0.195401
Rouge-Lsum r score: 0.266496, Rouge-Lsum p score: 0.155666,             Rouge-Lsum f-score: 0.195401
Validation Result at Step 0
Rouge-1 r score: 0.418725, Rouge-1 p score: 0.389816, Rouge-1 f-score: 0.384908
Rouge-2 r score: 0.100215, Rouge-2 p score: 0.105262, Rouge-2 f-score: 0.098602
Rouge-L r score: 0.176946, Rouge-L p score: 0.153756, Rouge-L f-score: 0.156682
Rouge-Lsum r score: 0.176946, Rouge-Lsum p score: 0.153756,             Rouge-Lsum f-score: 0.156682
Validation Result at Step 0
Rouge-1 r score: 0.424317, Rouge-1 p score: 0.271739, Rouge-1 f-score: 0.325188
Rouge-2 r score: 0.133041, Rouge-2 p score: 0.074561, Rouge-2 f-score: 0.094382
Rouge-L r score: 0.236625, Rouge-L p score: 0.151552, Rouge-L f-score: 0.181355
Rouge-Lsum r score: 0.236625, Rouge-Lsum p score: 0.151552,             Rouge-Lsum f-score: 0.181355
Validation Result at Step 0
Rouge-1 r score: 0.511936, Rouge-1 p score: 0.385712, Rouge-1 f-score: 0.438370
Rouge-2 r score: 0.161077, Rouge-2 p score: 0.119126, Rouge-2 f-score: 0.136489
Rouge-L r score: 0.232773, Rouge-L p score: 0.176263, Rouge-L f-score: 0.199890
Rouge-Lsum r score: 0.232773, Rouge-Lsum p score: 0.176263,             Rouge-Lsum f-score: 0.199890
Validation Result at Step 0
Rouge-1 r score: 0.306800, Rouge-1 p score: 0.350738, Rouge-1 f-score: 0.235585
Rouge-2 r score: 0.063588, Rouge-2 p score: 0.072438, Rouge-2 f-score: 0.048596
Rouge-L r score: 0.130816, Rouge-L p score: 0.224060, Rouge-L f-score: 0.112750
Rouge-Lsum r score: 0.130816, Rouge-Lsum p score: 0.224060,             Rouge-Lsum f-score: 0.112750
Validation Result at Step 0
Rouge-1 r score: 0.442865, Rouge-1 p score: 0.526116, Rouge-1 f-score: 0.440968
Rouge-2 r score: 0.133483, Rouge-2 p score: 0.160556, Rouge-2 f-score: 0.133482
Rouge-L r score: 0.234281, Rouge-L p score: 0.297648, Rouge-L f-score: 0.240863
Rouge-Lsum r score: 0.234281, Rouge-Lsum p score: 0.297648,             Rouge-Lsum f-score: 0.240863
/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py:69: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Epoch 0:   0%|                                                                                                                                       | 0/6325 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [217,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [122,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [165,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/ec2-user/research/primer/script/primer_main.py", line 788, in <module>
    train(args)
  File "/home/ec2-user/research/primer/script/primer_main.py", line 524, in train
    trainer.fit(model, train_dataloader, valid_dataloader)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 715, in run_training_batch
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 319, in training_step
    return self.model(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/ec2-user/research/primer/script/primer_main.py", line 162, in training_step
    loss = self.shared_step(input_ids, output_ids)
  File "/home/ec2-user/research/primer/script/primer_main.py", line 142, in shared_step
    lm_logits = self.forward(input_ids, output_ids)
  File "/home/ec2-user/research/primer/script/primer_main.py", line 111, in forward
    use_cache=False,
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 1113, in forward
    return_dict=return_dict,
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 956, in forward
    return_dict=return_dict,
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 367, in forward
    x, attn = encoder_layer(x, attention_mask, output_attentions=output_attentions)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/transformers/modeling_bart.py", line 254, in forward
    query=x, key=x, key_padding_mask=encoder_padding_mask, output_attentions=output_attentions
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/longformer/longformer_encoder_decoder.py", line 71, in forward
    output_attentions=output_attentions,
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/longformer/longformer.py", line 114, in forward
    if max_num_extra_indices_per_batch <= 0:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1387 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2f4335e612 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x22c1e (0x7f2f435cdc1e in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x22d (0x7f2f435d0c4d in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x33a968 (0x7f2f365b4968 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f2f43343295 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x2147ad (0x7f2f3648e7ad in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x54b518 (0x7f2f367c5518 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2b9 (0x7f2f367c5819 in /home/ec2-user/miniconda3/envs/primer/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0xfc359 (0x55e3a505c359 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #9: <unknown function> + 0xfc547 (0x55e3a505c547 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #10: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #11: <unknown function> + 0xfc50a (0x55e3a505c50a in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #12: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #13: <unknown function> + 0xfc523 (0x55e3a505c523 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #14: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #15: <unknown function> + 0xfc547 (0x55e3a505c547 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #16: <unknown function> + 0x181016 (0x55e3a50e1016 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #17: <unknown function> + 0xfc516 (0x55e3a505c516 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #18: <unknown function> + 0x163815 (0x55e3a50c3815 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #19: _PyGC_CollectNoFail + 0x2a (0x55e3a516175a in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #20: PyImport_Cleanup + 0x328 (0x55e3a510ce08 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #21: Py_FinalizeEx + 0x64 (0x55e3a5181714 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #22: <unknown function> + 0x232e20 (0x55e3a5192e20 in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #23: _Py_UnixMain + 0x3c (0x55e3a519318c in /home/ec2-user/miniconda3/envs/primer/bin/python)
frame #24: __libc_start_main + 0xea (0x7f2f5466b13a in /lib64/libc.so.6)
frame #25: <unknown function> + 0x1d803a (0x55e3a513803a in /home/ec2-user/miniconda3/envs/primer/bin/python)

Do you have any insight into this error? Also, could you provide your bash scripts for fine-tuning the model on Multi-News? Thanks very much!
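A note on reading the log above: the `srcIndex < srcSelectDimSize` assertion comes from an index-select kernel and fires when an index is not smaller than the size of the dimension being indexed. In sequence models a frequent cause is a token id that is greater than or equal to the number of rows in the embedding table, for example after adding a special token without resizing the embeddings. Whether that is the cause here is an assumption; the traceback alone does not confirm it. A minimal pure-Python sketch of the check, with hypothetical sizes and a hypothetical helper name:

```python
def find_out_of_range_ids(batch, num_embeddings):
    """Return (row, column, token_id) for every id that cannot index the table."""
    return [
        (row, col, token_id)
        for row, ids in enumerate(batch)
        for col, token_id in enumerate(ids)
        if token_id < 0 or token_id >= num_embeddings
    ]


# Hypothetical example: an embedding table with 50265 rows (valid ids 0..50264)
# and a batch in which one sequence contains the id 50265.
batch = [[0, 713, 16, 2], [0, 50265, 9, 2]]
bad = find_out_of_range_ids(batch, num_embeddings=50265)
print(bad)
```

Running the same batch on CPU, or with `CUDA_LAUNCH_BLOCKING=1` as the log suggests, usually gives a traceback that points at the failing lookup rather than at a later, unrelated line.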

Question about inferencing multi-news datasets

Hi, thank you for sharing. I am having trouble running inference on the Multi-News dataset. I followed the code in Evaluation_Example.ipynb with use_stemmer=True on the Multi-News test set, but got ROUGE scores of mid rouge1 fmeasure=49.87, mid rouge2 fmeasure=20.61, and mid rougeL fmeasure=25.59, which are lower than your results. Could you please tell me how to solve this problem? Thank you.

Here is my code.

# %%
from transformers import (
    AutoTokenizer,
    LEDForConditionalGeneration,
)
from datasets import load_dataset
import torch

# %%
dataset=load_dataset('multi_news')

# %%
PRIMER_path='allenai/PRIMERA-multinews'
TOKENIZER = AutoTokenizer.from_pretrained(PRIMER_path)
MODEL = LEDForConditionalGeneration.from_pretrained(PRIMER_path)
MODEL.cuda()
PAD_TOKEN_ID = TOKENIZER.pad_token_id
DOCSEP_TOKEN_ID = TOKENIZER.convert_tokens_to_ids("<doc-sep>")

# %%
def process_document(documents):
    input_ids_all=[]
    for data in documents:
        all_docs = data.split("|||||")
        for i, doc in enumerate(all_docs):
            doc = doc.replace("\n", " ")
            doc = " ".join(doc.split())
            all_docs[i] = doc

        #### concat with global attention on doc-sep
        input_ids = []
        for i, doc in enumerate(all_docs):
            input_ids.extend(
                TOKENIZER.encode(
                    doc,
                    truncation=True,
                    max_length=4096 // len(all_docs),
                )[1:-1]
            )
            if i != len(all_docs) - 1:
                input_ids.append(DOCSEP_TOKEN_ID)
        input_ids = (
            [TOKENIZER.bos_token_id]
            + input_ids
            + [TOKENIZER.eos_token_id]
        )
        input_ids_all.append(torch.tensor(input_ids))
    input_ids = torch.nn.utils.rnn.pad_sequence(
        input_ids_all, batch_first=True, padding_value=PAD_TOKEN_ID
    )
    return input_ids


def batch_process(batch):
    input_ids=process_document(batch['document']).cuda()
    # get the input ids and attention masks together
    global_attention_mask = torch.zeros_like(input_ids).to(input_ids.device).cuda()
    # put global attention on <s> token

    global_attention_mask[:, 0] = 1
    global_attention_mask[input_ids == DOCSEP_TOKEN_ID] = 1
    generated_ids = MODEL.generate(
        input_ids=input_ids,
        global_attention_mask=global_attention_mask,
        use_cache=True,
        max_length=1024,
        num_beams=5,
    )
    generated_str = TOKENIZER.batch_decode(
            generated_ids.tolist(), skip_special_tokens=True
        )
    result={}
    result['generated_summaries'] = generated_str
    result['gt_summaries']=batch['summary']
    return result

result_all = dataset['test'].map(batch_process, batched=True, batch_size=8)
with open("generated_summaries.txt", 'w') as wf1, open("gt_summaries.txt", 'w') as wf2: 
    for generated_summary in result_all['generated_summaries']:
        wf1.write(generated_summary + '\n')
    for gt_summary in result_all['gt_summaries']:
        wf2.write(gt_summary+ '\n')

from datasets import load_metric

rouge = load_metric("rouge")
with open("generated_summaries.txt") as f:
    generated_summaries = []
    for line in f:
        generated_summaries.append(line.strip())
with open("gt_summaries.txt") as f:
    gt_summaries = []
    for line in f:
        gt_summaries.append(line.strip())
result = rouge.compute(predictions=generated_summaries, references=gt_summaries, rouge_types=["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=True)
print("ROUGE scores:")
print(result)

And the result is

ROUGE scores:
{'rouge1': AggregateScore(low=Score(precision=0.5241600177789812, recall=0.49415454039406814, fmeasure=0.49612791043579474), mid=Score(precision=0.5276629476118126, recall=0.4973478489591434, fmeasure=0.4987075089885308), high=Score(precision=0.5310748692379784, recall=0.5002904986992729, fmeasure=0.5012372252869416)), 'rouge2': AggregateScore(low=Score(precision=0.2153434014630018, recall=0.20143954489550434, fmeasure=0.2030364177857921), mid=Score(precision=0.21863796353503595, recall=0.2046911123876728, fmeasure=0.20606699168573342), high=Score(precision=0.22209823319305308, recall=0.20779096705177613, fmeasure=0.2091556653585579)), 'rougeL': AggregateScore(low=Score(precision=0.2681362240477783, recall=0.25188841094800846, fmeasure=0.25310320754179366), mid=Score(precision=0.2713920792728549, recall=0.25498739732693815, fmeasure=0.2559033080441543), high=Score(precision=0.27440427911451565, recall=0.25785737610997284, fmeasure=0.2585799444804941)), 'rougeLsum': AggregateScore(low=Score(precision=0.2680113119171556, recall=0.2518150046006951, fmeasure=0.25266660590838774), mid=Score(precision=0.2713474291715047, recall=0.25480955392265, fmeasure=0.2558016718700355), high=Score(precision=0.27479809392676074, recall=0.25794410881227786, fmeasure=0.25880025528901673))}
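For scale, the scores above can be set against the PRIMERA-hf(finetuned) row of the README's sanity-check table (49.9 / 20.9 / 25.8). The sketch below assumes that row is the intended comparison and rounds the bootstrap low/mid/high F-measures from the output above to two decimals, in percent. It only does the arithmetic and does not explain where the gap comes from.

```python
# (low, mid, high) F-measure in percent, rounded from the ROUGE output above.
observed = {
    "rouge1": (49.61, 49.87, 50.12),
    "rouge2": (20.30, 20.61, 20.92),
    "rougeL": (25.31, 25.59, 25.86),
}
# PRIMERA-hf(finetuned) row of the README table (assumed reference result).
reported = {"rouge1": 49.9, "rouge2": 20.9, "rougeL": 25.8}


def compare(observed, reported):
    """Return {metric: (reported - mid, reported lies within [low, high])}."""
    rows = {}
    for name, (low, mid, high) in observed.items():
        gap = round(reported[name] - mid, 2)
        rows[name] = (gap, low <= reported[name] <= high)
    return rows


summary = compare(observed, reported)
for name, (gap, inside) in summary.items():
    print(f"{name}: gap={gap:+.2f}, reported value inside interval: {inside}")
```

Under these assumptions each README value lies between the low and high bounds of the run above, so the differences are within the bootstrap interval that the metric itself reports.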

Questions about inferencing

Hi, thank you for sharing. I am having trouble using PRIMERA to generate a summary. Could you please help me use the pretrained PRIMERA model to generate summaries correctly? My code is as follows:

import torch
from transformers import AutoTokenizer
from longformer import LongformerEncoderDecoderForConditionalGeneration
from longformer import LongformerEncoderDecoderConfig
import time
tokenizer = AutoTokenizer.from_pretrained('/data/users/wangyiting/primer/PRIMER-main/models/PRIMER_multinews')
config = LongformerEncoderDecoderConfig.from_pretrained('/data/users/wangyiting/primer/PRIMER-main/models/PRIMER_multinews')
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
            './PRIMERA_model/', config=config)


# import torch
# from longformer.longformer import Longformer, LongformerConfig
from longformer.sliding_chunks import pad_to_window_size
# from transformers import RobertaTokenizer

# SAMPLE_TEXT
start_time = time.time()
SAMPLE_TEXT = """An 11-year-old boy who survived being sucked into a flooded stormwater drain has been reunited with his rescuers in Melbourne and gifted a new bike a week after the tumultuous ordeal. Jake Gilbert was cycling with a friend in Altona Meadows last week when he rode across a submerged drain and was sucked 10 metres underneath a road. Stormwater drain ‘I love you all!’: boy sucked into stormwater drain in Melbourne praises rescuers after amazing escape. Gilbert managed to grab on to the underside of a metal grate on the other side and keep his head above water before passerby Damon Trewhella and off-duty SES member Justin Costello came to his aid. Kyle, who was also washed off his bike at the same time, had managed to avoid being sucked into the flooded stormwater drain. The SES member removed the bolts from the drain’s grate before the police officer prised the grate open – with Gilbert still desperately clinging to the underside by his fingernails. His head was just above the water before he was pulled to safety. he's getting her energy back and she's back to being a 'two-step launcher' when she goes to walk – takes two steps and launches off and takes your shoulders off – but prior to that, she'd lost all energy and she couldn't hold her own back legs up."""
input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0)  # batch of size 1

# # TVM code doesn't work on CPU. Uncomment this if `config.attention_mode = 'tvm'`
# model = model.cuda(); input_ids = input_ids.cuda()

# Attention mask values -- 0: no attention, 1: local attention, 2: global attention
attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device) # initialize to local attention
attention_mask[:, [1, 4, 21,]] =  2  # Set global attention based on the task. For example,
                                     # classification: the <s> token
                                     # QA: question tokens

# # padding seqlen to the nearest multiple of 512. Needed for the 'sliding_chunks' attention
input_ids, attention_mask = pad_to_window_size(
        input_ids, attention_mask, config.attention_window[0], tokenizer.pad_token_id)

max_output_len = 100
generated_ids = model.generate(input_ids=input_ids, attention_mask=attention_mask,
                                            use_cache=True, max_length=max_output_len,
                                            num_beams=1)
generated_str = tokenizer.batch_decode(generated_ids.tolist(), skip_special_tokens=True)
end_time = time.time()
print("spending: ", end_time-start_time)
print(generated_str[0])
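One detail in the snippet above: global attention is hard-coded at positions 1, 4, and 21, whereas the Multi-News evaluation code earlier on this page places it on the `<s>` token and on every `<doc-sep>` token. The sketch below builds a mask in the 0/1/2 convention described in the comments above (0: no attention, 1: local, 2: global) from token ids alone. The helper name, the token ids, and the padding multiple are illustrative assumptions, not values taken from the PRIMERA code.

```python
def build_attention_mask(input_ids, global_token_ids, pad_token_id, multiple):
    """Pad ids to a multiple of `multiple` and return (padded_ids, mask).

    Mask values: 0 = padding (no attention), 1 = local, 2 = global.
    The first token and any token in `global_token_ids` get global attention.
    """
    padded = list(input_ids)
    remainder = len(padded) % multiple
    if remainder:
        padded.extend([pad_token_id] * (multiple - remainder))
    mask = []
    for position, token in enumerate(padded):
        if position >= len(input_ids):
            mask.append(0)
        elif position == 0 or token in global_token_ids:
            mask.append(2)
        else:
            mask.append(1)
    return padded, mask


# Hypothetical ids: 0 = <s>, 2 = </s>, 1 = <pad>, 50265 = <doc-sep>.
ids = [0, 11, 12, 50265, 13, 2]
padded_ids, mask = build_attention_mask(
    ids, global_token_ids={50265}, pad_token_id=1, multiple=4
)
print(padded_ids)
print(mask)
```

In practice the separator id would come from `tokenizer.convert_tokens_to_ids("<doc-sep>")`, as in the evaluation code above, and the padding would still be handled by `pad_to_window_size`.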

Release outputs of MDS systems

Thanks for your great work!

Could you release the outputs of your models on each MDS dataset?

About the WCEP dataset

Hi there, thanks for releasing the code of your work!

Could you also release the WCEP-10 dataset used in your paper? The original WCEP (https://drive.google.com/drive/folders/1T5wDxu4ajFwEq77dG88oE95e8ppREamg?usp=sharing) has more than 10 docs in each cluster, and I would like to confirm exactly how you obtained the WCEP-10 version. Did you select the first 10 docs using [0:10], or did you do something like random sampling?

Thanks and looking forward to your reply.
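For concreteness, the two options the question contrasts can be written as follows. This is only an illustration of the question, with hypothetical function names and a made-up cluster; it does not state which method, if either, the authors used.

```python
import random


def first_k(cluster, k=10):
    """Keep the first k documents of a cluster, i.e. cluster[0:k]."""
    return cluster[:k]


def random_k(cluster, k=10, seed=0):
    """Keep k documents sampled without replacement, reproducibly via a seed."""
    if len(cluster) <= k:
        return list(cluster)
    return random.Random(seed).sample(cluster, k)


# Made-up cluster of 25 document ids.
cluster = [f"doc_{i}" for i in range(25)]
head = first_k(cluster)
sampled = random_k(cluster)
print(head)
print(sampled)
```

The two variants generally keep different documents, which is why the exact procedure matters when comparing against the paper's WCEP-10 numbers.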

Question about the local window size

Hi, thanks for sharing your great work. I have a question about the local window size of the LED model. As far as I know, it is 1024 in the Longformer paper as well as in the Huggingface version. However, your paper says it is 512. Did you resize it, or am I misunderstanding something?

Inference with PRIMERA much slower than inference with PRIMERA-multinews

Hi!

I have noticed something strange. Inference with allenai/PRIMERA is much slower (as much as 5X!) than inference with allenai/PRIMERA-multinews. I have a notebook benchmarking this effect here. I checked that their model configs are identical.

The problem is most likely related to the decoder/generation, because it only occurs when the max_length argument to model.generate() is large (say 1024) and not when it is small (say 64). Here are some benchmarks using some random examples from MultiNews:

With a batch size of 4 and max length of inputs 1024:

Max length of outputs: 64

PRIMERA CUDA time total: 758.402ms
PRIMERA-multinews CUDA time total: 753.125ms
Slowdown: ~1X (no measurable slowdown)

Max length of outputs: 512

PRIMERA CUDA time total: 3.682s
PRIMERA-multinews CUDA time total: 1.572s
Slowdown: ~2X

Max length of outputs: 1024

PRIMERA CUDA time total: 7.676s
PRIMERA-multinews CUDA time total: 1.542s
Slowdown: ~5X

Do you have any idea what might be causing this?

EDIT: While I don't know why, this is partially explained by the fact that allenai/PRIMERA does not appear to use the global_attention_mask. Running the model with and without global_attention_mask gives the same inference times, whereas with allenai/PRIMERA-multinews, providing global_attention_mask reduces inference time by 30%. However, allenai/PRIMERA-multinews without a global_attention_mask is still almost 3X faster than allenai/PRIMERA, so this cannot entirely explain the difference.
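The slowdown figures follow directly from the timings in the benchmarks above. The sketch below only redoes that arithmetic, using the reported CUDA times converted to seconds.

```python
# max output length -> (PRIMERA seconds, PRIMERA-multinews seconds),
# copied from the benchmark figures above.
timings = {
    64: (0.758402, 0.753125),
    512: (3.682, 1.572),
    1024: (7.676, 1.542),
}


def slowdown(timings):
    """Ratio of PRIMERA time to PRIMERA-multinews time per output length."""
    return {n: round(a / b, 2) for n, (a, b) in timings.items()}


ratios = slowdown(timings)
for n, ratio in ratios.items():
    print(f"max_length={n}: PRIMERA is {ratio}x the PRIMERA-multinews time")
```

PRIMERA's time grows roughly in proportion to max_length, while PRIMERA-multinews levels off near 1.5 s. One possible reading (an assumption, not confirmed in this thread) is that the fine-tuned model emits an end-of-sequence token early, while the pre-trained checkpoint keeps generating until it reaches max_length. Comparing the lengths of the generated sequences from the two models would test this.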
