
cliff_summ's Introduction

CLIFF

Code for EMNLP 2021 paper "CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization"


News

  • Code for unlikelihood training and in-batch negatives has been added. Please check train_xsum_batch_neg.sh and train_xsum_single_neg_ull.sh. The related Fairseq code is in unlikelihood_translation.py and contrastive_translation_batch_neg.py.
  • A cleaner implementation is available. It uses less system RAM and is compatible with the current version of Fairseq. Check here.
  • We find that newer versions of QuestEval produce much lower scores than the version (commit 0e94a74) we used in our paper. Please do not directly take the QuestEval results from the paper if you are using a newer version (a pinning sketch is given after this list).
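
If you need to reproduce the paper's QuestEval numbers, one way to pin that revision is sketched below. This is a minimal sketch assuming a standard editable install; substitute the QuestEval repository URL you actually use.

# Hypothetical sketch: check out the QuestEval commit used in the paper.
git clone <QuestEval repository URL> questeval
cd questeval
git checkout 0e94a74   # revision referenced above
pip install -e .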

Data Construction

For data construction, please refer to data_construction. Constructed datasets are also available in Google Drive.


Training

The following scripts require that your $DATA folder is organized in the same way as the data folder in Google Drive; a rough sketch of the expected layout is shown below.
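
The sketch below is reconstructed from the paths used in the training and decoding commands in this README and in the issues; entries marked as assumed may be named differently in the Google Drive folder.

$DATA/
  cnndm_binarized/           binarized CNN/DM source/target data
  xsum_binarized/            (assumed name, mirroring cnndm_binarized)
  xsum_raw/                  raw text, e.g. test.source used for Pegasus decoding
  cnndm_raw/
  xsum_synthetic/
    negative_syslowcon/      SysLowCon negative samples
    negative_swapent/        SwapEnt negative samples
    positive_bt_filter/      positive samples (assumed; the CNN/DM path appears in the training scripts)
  cnndm_synthetic/
    negative_syslowcon/
    negative_swapent/
    positive_bt_filter/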

BART

Our experiments with BART use Fairseq at commit 0db28cd. Newer versions might also work. Please download the pre-trained BART model here and set BART_PATH to the downloaded model:

export BART_PATH=/path/to/bart/model.pt
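
If you do not already have the checkpoint, a minimal sketch of fetching it is shown below, assuming the standard fairseq bart.large release archive; verify the URL against the link above.

# Download and unpack the pre-trained BART-large checkpoint, then point BART_PATH at it.
wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz
tar -xzvf bart.large.tar.gz
export BART_PATH=$(pwd)/bart.large/model.pt
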
Single Negative Strategy

The following commands train models with negative samples constructed by SysLowCon and save them in $TRAINED_MODELS/bart_xsum/syslowcon and $TRAINED_MODELS/bart_cnndm/syslowcon. To train with other negative samples, replace $DATA/xsum_synthetic/negative_syslowcon (or its CNN/DM counterpart) with the corresponding directory.

# XSum
cd scripts/bart
CUDA_VISIBLE_DEVICES=0,1 ./train_xsum_single_neg.sh \
  $DATA/xsum_synthetic/negative_syslowcon $TRAINED_MODELS/bart_xsum/syslowcon

# CNN/DM
cd scripts/bart
CUDA_VISIBLE_DEVICES=0,1 ./train_cnndm_single_neg.sh \
  $DATA/cnndm_synthetic/negative_syslowcon $TRAINED_MODELS/bart_cnndm/syslowcon
Multiple Negative Strategies

The following commands train models with negative samples constructed by SysLowCon and SwapEnt. They save the trained models in $TRAINED_MODELS/bart_xsum/syslowcon_swapent and $TRAINED_MODELS/bart_cnndm/syslowcon_swapent.

# XSum
cd scripts/bart
CUDA_VISIBLE_DEVICES=0,1 ./train_xsum_multi_neg.sh \
  "$DATA/xsum_synthetic/negative_syslowcon $DATA/xsum_synthetic/negative_swapent" \
  $TRAINED_MODELS/bart_xsum/syslowcon_swapent

# CNN/DM
cd scripts/bart
CUDA_VISIBLE_DEVICES=0,1 ./train_cnndm_multi_neg.sh \
  "$DATA/cnndm_synthetic/negative_syslowcon $DATA/cnndm_synthetic/negative_swapent" \
  $TRAINED_MODELS/bart_cnndm/syslowcon_swapent

Pegasus

Our experiments with Pegasus use Huggingface Transformers 4.5.1. Newer versions might also work.
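
A minimal environment sketch matching the version above; the extra packages are assumptions based on the tracebacks in the issues below (fairscale for sharded DDP) and should be adjusted to your CUDA setup.

pip install transformers==4.5.1
pip install torch fairscale  # versions not pinned here; pick ones compatible with your GPU/CUDA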

Single Negative Strategy
# XSum
cd scripts/pegasus
CUDA_VISIBLE_DEVICES=0,1 ./train_xsum_single_neg.sh \
  $DATA/xsum_synthetic/negative_syslowcon $TRAINED_MODELS/pegasus_xsum/syslowcon
  
# CNN/DM
cd scripts/pegasus
CUDA_VISIBLE_DEVICES=0,1 ./train_cnndm_single_neg.sh \
  $DATA/cnndm_synthetic/negative_syslowcon $TRAINED_MODELS/pegasus_cnndm/syslowcon

Decoding

The following examples show how to decode trained models. Model checkpoints are available in Google Drive.

BART

# XSum
cd scripts/bart
./decode_xsum.sh $TRAINED_MODELS/bart_xsum/syslowcon/checkpoint_last.pt /path/to/save/dir

# CNN/DM
cd scripts/bart
./decode_cnndm.sh $TRAINED_MODELS/bart_cnndm/syslowcon/checkpoint_last.pt /path/to/save/dir

Pegasus

# XSum
cd scripts/pegasus
python run_generation.py $DATA/xsum_raw/test.source $TRAINED_MODELS/pegasus_xsum/syslowcon /path/to/save/dir

# CNN/DM
cd scripts/pegasus
python run_generation.py $DATA/cnndm_raw/test.source $TRAINED_MODELS/pegasus_cnndm/syslowcon /path/to/save/dir


cliff_summ's Issues

Question about paper

First of all, thanks for your amazing work. I have some questions about the proposed method:

  • The paper shows that CLIFF's improvements over the cross-entropy baseline are more consistent than those of the unlikelihood method, yet ROUGE-L still drops in some settings when the contrastive loss is used. Why is that?
  • The paper also says that the key advantage of CLIFF resides in its measure of representation similarities between positive and negative samples in the same batch. Is this referring to the contrastive loss formulation (1)? (A generic formulation is sketched after this list for reference.)
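
For intuition only, a generic in-batch contrastive (InfoNCE-style) objective of the kind the question refers to is

\mathcal{L}_{\mathrm{ctr}} = -\frac{1}{|\mathcal{P}|} \sum_{(i,j)\in\mathcal{P}} \log \frac{\exp\!\left(\mathrm{sim}(\mathbf{h}_i,\mathbf{h}_j)/\tau\right)}{\sum_{k \in \mathcal{B},\, k \neq i} \exp\!\left(\mathrm{sim}(\mathbf{h}_i,\mathbf{h}_k)/\tau\right)}

where \mathcal{P} is the set of positive summary pairs in the batch, \mathcal{B} the full batch including the negative samples, \mathbf{h} a summary representation, \mathrm{sim} a similarity (e.g. cosine), and \tau a temperature. This is not copied from the paper; consult Eq. (1) there for the exact weighting and similarity used by CLIFF.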

NameError: name 'old' is not defined in contrastive_translation_multi_neg.py

I ran train_xsum_multi_neg.sh and got the following error. It happens with both the old and the new implementation, using the BART backbone.

Traceback (most recent call last):
  File "/home/chshen/dev/pyvenv/lib64/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/chshen/dev/pyvenv/lib64/python3.8/site-packages/fairseq/distributed_utils.py", line 300, in distributed_main
    main(cfg, **kwargs)
  File "/home/chshen/dev/pyvenv/lib64/python3.8/site-packages/fairseq_cli/train.py", line 69, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=1)
  File "/home/chshen/dev/projects/src/ml/cliff_summ/models/bart/contrastive_translation_multi_neg.py", line 353, in load_dataset
    self.datasets[split] = load_langpair_dataset(
  File "/home/chshen/dev/projects/src/ml/cliff_summ/models/bart/contrastive_translation_multi_neg.py", line 127, in load_langpair_dataset
    max_source_positions - (2 if old else 1),
NameError: name 'old' is not defined

Empty files when decoding

When using the decode_cnn.sh script (on my own dataset), the script finishes running, but the output files (bpe-test.txt and formatted-test.txt) are both empty. When I check the log file, I get this error:

Traceback (most recent call last):
  File "/home/lewis.2799/miniconda3/envs/QP2/bin/fairseq-generate", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-generate')())
  File "/home/lewis.2799/fairseq/fairseq_cli/generate.py", line 392, in cli_main
    main(args)
  File "/home/lewis.2799/fairseq/fairseq_cli/generate.py", line 48, in main
    return _main(cfg, h)
  File "/home/lewis.2799/fairseq/fairseq_cli/generate.py", line 97, in _main
    models, saved_cfg = checkpoint_utils.load_model_ensemble(
  File "/home/lewis.2799/fairseq/fairseq/checkpoint_utils.py", line 262, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/home/lewis.2799/fairseq/fairseq/checkpoint_utils.py", line 320, in load_model_ensemble_and_task
    model.load_state_dict(state["model"], strict=strict, model_cfg=cfg.model)
  File "/home/lewis.2799/fairseq/fairseq/models/fairseq_model.py", line 113, in load_state_dict
    self.upgrade_state_dict(state_dict)
  File "/home/lewis.2799/fairseq/fairseq/models/fairseq_model.py", line 119, in upgrade_state_dict
    self.upgrade_state_dict_named(state_dict, "")
  File "/home/lewis.2799/fairseq/fairseq/models/bart/model.py", line 185, in upgrade_state_dict_named
    num_classes = state_dict[
KeyError: 'classification_heads.contrast.out_proj.weight'

This appears to be related to the error described in this issue (not the initial post but a few comments down), but the author said that this error shouldn't be a problem.

I also printed the keys of the state_dict dictionary and found several that are similar to the key in the error, though not identical:

classification_heads.contrast.dense.weight
classification_heads.contrast.dense.bias
classification_heads.contrast.out_proj.bias
classification_heads.contrast.out_proj.weight_orig
classification_heads.contrast.out_proj.weight_u
classification_heads.contrast.out_proj.weight_v

Are there any insights into why the generated files are empty? Thanks!

Attribute Error

I keep running into this error and can't figure out what is going wrong. I'm attempting to run the train_cnndm_single_neg.sh script on the given data. I have not made any changes to the batch script except to add my directories, but I'll paste it here just in case:

TOTAL_NUM_UPDATES=20000
WARMUP_UPDATES=500
LR=3e-05
MAX_TOKENS=1024
UPDATE_FREQ=32
NEG_DIR=$DATA/cnndm_synthetic/negative_syslowcon
SAVE_PATH=$DATA/2
POS_DIR=$DATA/cnndm_synthetic/positive_bt_filter
DATA_DIR=$DATA/cnndm_binarized
USER_DIR=../../models/bart

fairseq-train $DATA_DIR --pos-data $POS_DIR --neg-data $NEG_DIR --max-neg-samples 4 \
    --restore-file $BART_PATH --save-dir $SAVE_PATH \
    --max-tokens $MAX_TOKENS \
    --task contrastive_translation --mlp 1024 \
    --source-lang source --target-lang target \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --arch contrastive_bart_large \
    --criterion contrastive_loss \
    --label-smoothing 0.1 \
    --fixed-validation-seed 7 \
    --spectral-norm-classification-head \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --fp16 --update-freq $UPDATE_FREQ \
    --skip-invalid-size-inputs-valid-test --max-epoch 5 \
    --no-save-optimizer-state --no-epoch-checkpoints \
    --find-unused-parameters \
    --user-dir $USER_DIR;

Here is the error I keep getting:

  File "/home/lewis.2/fairseq/cliff_summ/models/bart/constrative_bart.py", line 49, in forward
    eos: int = self.eos
  File "/home/lewis.2799/miniconda3/envs/QP2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ContrastiveBARTModel' object has no attribute 'eos'

Something appears to be going wrong with "eos" in constrative_bart.py, but I'm struggling to trace where it comes from and what it does. Any information would be really appreciated!

CUDA out of memory when training Pegasus with constructed data, even with batch size 1

  1. My GPUs are two NVIDIA GeForce RTX 3080s with 10 GB of memory each. When training the Pegasus model, the dataset loads, but CUDA runs out of memory even with the batch size set to 1. Is this because the GPU memory is not large enough?
  2. By the way, there is a warning after the datasets are loaded:

Some weights of PegasusForContrastive were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['classification_head.dense.weight', 'classification_head.dense.bias', 'classification_head.out_proj.weight', 'classification_head.out_proj.bias']
Is that normal or abnormal?

How can I solve the problems?

The main error message is shown below:

Dataset Loaded.
Dataset Loaded.
Some weights of PegasusForContrastive were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['classification_head.dense.weight', 'classification_head.dense.bias', 'classification_head.out_proj.weight', 'classification_head.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of PegasusForContrastive were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['classification_head.dense.weight', 'classification_head.dense.bias', 'classification_head.out_proj.weight', 'classification_head.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "contrastive_train.py", line 55, in <module>
    main()
  File "contrastive_train.py", line 51, in main
    trainer.train()
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/trainer.py", line 1120, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/trainer.py", line 1524, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/hqh/Desktop/cliff_summ-main/models/pegasus/contrastive_trainer.py", line 84, in compute_loss
    loss, _ = self._compute_loss(model, inputs)
  File "/home/hqh/Desktop/cliff_summ-main/models/pegasus/contrastive_trainer.py", line 51, in _compute_loss
    model_output = model(**inputs, use_cache=False)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/fairscale/nn/data_parallel/sharded_ddp.py", line 230, in forward
    return self.module(*inputs, **kwargs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hqh/Desktop/cliff_summ-main/models/pegasus/contrastive_model.py", line 174, in forward
    return_dict=return_dict,
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 1163, in forward
    return_dict=return_dict,
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 1024, in forward
    use_cache=use_cache,
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 443, in forward
    output_attentions=output_attentions,
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 199, in forward
    value_states = self._shape(self.v_proj(key_value_states), -1, bsz)
  File "/usr/local/anaconda3/envs/cliff/lib/python3.7/site-packages/transformers/models/pegasus/modeling_pegasus.py", line 171, in _shape
    return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 1; 9.78 GiB total capacity; 7.07 GiB already allocated; 24.81 MiB free; 7.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Trying to use my own positive and negative examples

I'm sorry this is a pretty vague question, but I was hoping for some pointers in the right direction. I'm trying to run your code with my own data and my own positive and negative examples, which I obtained using methods that made more sense for my particular dataset; I have 1 positive example and 3 negative examples per data point. I want to skip the example-creation steps and simply get my existing data into the correct format. However, after binarizing the data and attempting to run the train_cnndm_multi_neg.sh script, I find that I'm missing at least the valid.neg_target file. I took a look at that file and wasn't sure exactly what it was, and I'm also struggling to figure out how it gets created.

Any pointers for how to get my data in the correct format would be so appreciated! But of course I understand if it's too vague of a question.

Issues with Regeneration transformation

Hello, while running the code on a dataset different from those described in the paper, I encountered errors when filtering regenerated examples.

python filter_generated.py --generated-docbins $DATA/processed_data/xsum_regeneration_output/train_generated.doc \
 --source-docbins $DATA/processed_data/xsum_stanza_docbin/train.source \
 --target-docbins $DATA/processed_data/xsum_stanza_docbin/train.target \
 --other $DATA/processed_data/xsum_regeneration_output/train_generated.other \
 $DATA/processed_data/xsum_regeneration_output/train_filtered.jsonl
python filter_generated.py --generated-docbins $DATA/processed_data/xsum_regeneration_output/valid_generated.doc \
 --source-docbins $DATA/processed_data/xsum_stanza_docbin/valid.source \
 --target-docbins $DATA/processed_data/xsum_stanza_docbin/valid.target \
 --other $DATA/processed_data/xsum_regeneration_output/valid_generated.other \
 $DATA/processed_data/xsum_regeneration_output/valid_filtered.jsonl

The following code throws an error:

File "filter_generated.py", line 358, in main
    id, _, gen_type, num = line.strip().split(' ')
ValueError: not enough values to unpack

Testing with Cleaner Version

Hi, thank you for this interesting and inspiring work!
I tried to replicate your results by using the cleaner version of the implementation. It trains the model successfully and produces a checkpoint.pt file. However, the documentation for the cleaner version does not describe the preprocessing step for the test data, so I'm not sure how to run fairseq-generate after obtaining the checkpoint.pt file. May I get your advice on that? Thank you very much! :)
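
As a starting point, a minimal, assumption-heavy sketch of a generic fairseq-generate call is shown below. It is not taken from the repo's decode scripts: the binarized test directory name, the --user-dir and --task values (borrowed from the training script posted in the Attribute Error issue above), and the generation hyperparameters are all placeholders to adapt; prefer decode_xsum.sh / decode_cnndm.sh where they apply.

# Hypothetical sketch only: decode a trained checkpoint with fairseq-generate.
fairseq-generate $DATA/xsum_binarized \
    --path /path/to/checkpoint.pt \
    --user-dir ../../models/bart \
    --task contrastive_translation \
    --source-lang source --target-lang target \
    --truncate-source \
    --gen-subset test \
    --beam 6 --lenpen 1.0 --max-len-b 60 --no-repeat-ngram-size 3 \
    --batch-size 32 --fp16 > /path/to/save/dir/generate-test.txt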

Unlikelihood training code

I'm looking to use Fairseq to train an unlikelihood baseline. Is the unlikelihood part trained with Fairseq?

Question about UPDATE_FREQ

TOTAL_NUM_UPDATES=20000
WARMUP_UPDATES=500
LR=3e-05
MAX_TOKENS=1024
UPDATE_FREQ=32

The README from fairseq's BART summarization example says to use UPDATE_FREQ=4 with 8 GPUs.
However, the cliff_summ scripts use UPDATE_FREQ=32 while training with 2 GPUs.
Could you tell me why the parameter is not UPDATE_FREQ=16 with 2 GPUs? Is it related to the contrastive learning? (A rough effective-batch-size calculation is sketched below.)
Thanks.
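
For context only (not the authors' stated reasoning): in fairseq, --update-freq is gradient accumulation, so the effective batch size per update scales roughly as sketched below.

# effective batch per update ≈ n_GPUs x per-GPU batch (bounded by MAX_TOKENS) x UPDATE_FREQ
# fairseq BART example : 8 GPUs x UPDATE_FREQ=4  -> 32 GPU-batches per update
# 2 GPUs, UPDATE_FREQ=16: 2 GPUs x 16            -> 32 GPU-batches per update (same as above)
# 2 GPUs, UPDATE_FREQ=32: 2 GPUs x 32            -> 64 GPU-batches per update (twice as large)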

Multi-card training crashes on custom dataset

I gathered negative and positive data on Gigaword.
Training crashes when I use 2 cards. Here is the log:

Traceback (most recent call last):
  File "/ext3/miniconda3/envs/syslowcon/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/scratch/tw2112/codes/fairseq/fairseq_cli/train.py", line 392, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/scratch/tw2112/codes/fairseq/fairseq/distributed_utils.py", line 313, in call_main
    torch.multiprocessing.spawn(
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 130, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGKILL

I can't get any useful information from this. I know that using spawn to start DDP can swallow some error messages, so I switched to single-card training. That run works fine and only prints the warning I mentioned in #4.

The same problem also happens on the XSum datasets downloaded from Google Drive.

Attribute error in training

Hi, when trying to run training I keep receiving the attribute error 'PegasusForContrastive' object has no attribute 'model'. Would you mind looking into the problem?

argument conflict when training

Recently I have been working on applying this work to Gigaword.

When I gathered the data and started training, Fairseq printed this error log:

Traceback (most recent call last):
  File "/ext3/miniconda3/envs/syslowcon/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/home/tw2112/codes/cliff/fairseq/fairseq_cli/train.py", line 513, in cli_main
    args = options.parse_args_and_arch(parser, modify_parser=modify_parser)
  File "/home/tw2112/codes/cliff/fairseq/fairseq/options.py", line 158, in parse_args_and_arch
    TASK_REGISTRY[args.task].add_args(parser)
  File "/home/tw2112/codes/cliffnew/models/bart/contrastive_translation.py", line 249, in add_args
    parser.add_argument('--max-source-positions', default=1024, type=int, metavar='N',
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/argparse.py", line 1386, in add_argument
    return self._add_action(action)
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/argparse.py", line 1749, in _add_action
    self._optionals._add_action(action)
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/argparse.py", line 1590, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/argparse.py", line 1400, in _add_action
    self._check_conflict(action)
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/argparse.py", line 1539, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/ext3/miniconda3/envs/syslowcon/lib/python3.8/argparse.py", line 1548, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --max-source-positions: conflicting option string: --max-source-positions

I checked contrastive_translation.py and it also defines an argument named "--max-source-positions".
Could anyone tell me the correct way to run the training process?

Could I gather negative data on a subset of the training set and then concatenate the results?

Here is the situation:
Running the SysLowCon negative-sample generation on Gigaword takes about 30 hours on an RTX 8000 card. My job ran out of memory and was killed; the last log shows it had finished 2201 out of 2719 batches. I don't really know the output format of fairseq-generate; the current generate-train.txt contains 62272000 lines, and I don't know whether the output contains an index or just relies on line numbers for alignment. I'd rather not spend another day regenerating the SysLowCon data when it is already 73% finished. What should I do now that the script has been killed? How can I count the number of finished samples already in generate-train.txt? Could I run the code for the remaining part and then merge the two files?
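
For what it's worth, fairseq-generate writes tab-separated lines prefixed with S- (source), T- (target), H-/D- (hypotheses), and P- (token scores), and the number after each prefix is the index of the sample in the original dataset, so the output carries an index rather than relying on line order. A hedged sketch for counting how many samples already have output (assuming the default output format):

# Count distinct sample indices that already have at least one hypothesis.
grep '^H-' generate-train.txt | cut -f1 | sort -u | wc -l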

Issues with Mask Filling

I encountered the following errors when trying to run the lm_mask_fill.py script.

  File "../../data_construction/neg_mask_regen/lm_mask_fill.py", line 175, in <module>
    main()
  File "../../data_construction/neg_mask_regen/lm_mask_fill.py", line 164, in main
    new_summary = bart.fill_mask(inputs, topk=args.topk, beam=5, match_source_len=False)
  File "/export/home/cliff_summ/data/fairseqs/fairseq/models/bart/hub_interface.py", line 196, in fill_mask
    batch_hypos = self.generate(batch_tokens, **generate_kwargs)
  File "/export/home/cliff_summ/data/fairseqs/fairseq/models/bart/hub_interface.py", line 105, in generate
    return super().generate(
  File "/export/home/cliff_summ/data/fairseqs/fairseq/hub_utils.py", line 171, in generate
    translations = self.task.inference_step(
  File "/export/home/cliff_summ/data/fairseqs/fairseq/tasks/fairseq_task.py", line 450, in inference_step
    return generator.generate(
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/export/home/cliff_summ/data/fairseqs/fairseq/sequence_generator.py", line 176, in generate
    return self._generate(sample, **kwargs)
  File "/export/home/cliff_summ/data/fairseqs/fairseq/sequence_generator.py", line 341, in _generate
    lprobs, tokens, scores = self._prefix_tokens(
  File "/export/home/cliff_summ/data/fairseqs/fairseq/sequence_generator.py", line 547, in _prefix_tokens
    prefix_lprobs = lprobs.gather(-1, prefix_toks.unsqueeze(-1))
RuntimeError: Size does not match at dimension 0 expected index [120, 1] to be smaller than src [100, 50265] apart from dimension 1
