
salesforce / ctrl-sum

Resources for the "CTRLsum: Towards Generic Controllable Text Summarization" paper

Home Page: https://arxiv.org/abs/2012.04281

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 91.93%, Shell 8.07%
Topics: text-summarization, text-generation

ctrl-sum's Introduction

CTRLsum

This is the PyTorch implementation of the paper:

CTRLsum: Towards Generic Controllable Text Summarization
Junxian He, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong
arXiv 2020

This repo includes instructions for using pretrained CTRLsum models as well as training new models.

CTRLsum is a generic controllable summarization system that manipulates text summaries given control tokens in the form of keywords or prefixes. CTRLsum also achieves strong summarization performance in the uncontrolled setting (e.g. state-of-the-art on CNN/DailyMail).

🎥 Demo 1: Hugging Face Spaces (interactively generate summaries with the pretrained model)

🎥 Demo 2 (navigate the CTRLsum outputs used in our experiments)

Model checkpoints

Dataset        Download
CNN/DailyMail  download (.tar.gz)
arXiv          download (.tar.gz)
BIGPATENT      download (.tar.gz)

These checkpoints are also available in Hugging Face Transformers; see details below.

Updates

April 09, 2022

@aliencaocao made a repo here that converts our pretrained taggers to ONNX, making them much faster to load and run for inference.

October 07, 2021

Integrated into Hugging Face Spaces via Gradio. See the demo: Hugging Face Spaces

June 18, 2021

We released another Web UI demo (here) to navigate most of the CTRLsum outputs generated in the experiments of the paper.

Mar 22, 2021

Hyunwoong Ko made a Python package, summarizers, based on CTRLsum. CTRLsum is now also supported in Hugging Face Transformers, thanks to Hyunwoong Ko. With these packages, CTRLsum can be used with just a few lines of code. See an example using Hugging Face Transformers.

Dependencies

The code requires Python 3, PyTorch (>=1.4.0), and fairseq (the code is tested against this commit).

Install dependencies:

# manually install fairseq
git clone https://github.com/pytorch/fairseq

# this repo is tested on a commit of fairseq from May 2020:
# fad3cf0769843e767155f4d0af18a61b9a804f59
cd fairseq
git reset --hard fad3cf07

# the BART interface in fairseq did not support prefix-constrained decoding
# as of writing this README, so we need to apply several modifications to
# fairseq before installing it
cp ../ctrlsum/fairseq_task.py fairseq/tasks/fairseq_task.py
cp ../ctrlsum/sequence_generator.py fairseq/
cp ../ctrlsum/hub_interface.py fairseq/models/bart/

# install fairseq
pip install --editable ./

cd ..

# install other requirements
pip install -r requirements.txt
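
After installing, a quick sanity check along the following lines (a minimal sketch; the printed versions depend on your environment) confirms that PyTorch and the patched fairseq import correctly:

# minimal sanity check after installation (sketch; versions depend on your environment)
import torch
import fairseq

print("PyTorch:", torch.__version__)    # expected >= 1.4.0
print("fairseq:", fairseq.__version__)  # installed from the pinned commit above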

Example Usage of Pretrained Models

Option 1. Generate summaries interactively; users can specify the control tokens (keywords, prompts, or both):

CUDA_VISIBLE_DEVICES=xx python scripts/generate_bart_interactive.py --exp [checkpoint directory] \
	--dataset example_dataset \
	--src test.oraclewordnssource

The command above reads source articles from datasets/example_dataset/test.oraclewordnssource; users can then interact with the system on the command line by entering the ID of the example to be shown, as well as the control tokens:

(screenshot of the interactive command-line session)

Option 2. Generate summaries from a file which includes keywords:

# the following command generates summaries from `datasets/example_dataset/test.oraclewordnssource`
# the input format is keywords and source concatenated with a separator token; please refer to the
# given example data files for concrete examples (a sketch of building such lines yourself follows
# the command below)
# the predicted summaries are saved into the checkpoint directory
CUDA_VISIBLE_DEVICES=xx python scripts/generate_bart.py --exp [checkpoint directory] \
	--dataset example_dataset \
	--src test.oraclewordnssource 
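
If you want to run Option 2 on your own articles, you need to write one keywords-plus-source line per example. Below is a minimal sketch of building such a line; the " => " separator is an assumption based on the Hugging Face snippets later in this README, and the file name test.mykeywordssource is made up for illustration, so please compare against the files in datasets/example_dataset before relying on it.

# hypothetical helper to build one keywords-and-source line for Option 2
# NOTE: the " => " separator is assumed from the Hugging Face examples below;
# verify the exact format against datasets/example_dataset/test.oraclewordnssource
def build_input_line(keywords, source):
    """Concatenate control keywords with a single-line source article."""
    return " ".join(keywords) + " => " + source.replace("\n", " ")

with open("datasets/example_dataset/test.mykeywordssource", "w") as f:
    f.write(build_input_line(["Nepal", "earthquake"],
                             "Relief efforts in Nepal are intensifying ...") + "\n")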

Option 3. Through Huggingface Transformers

Our pretrained model checkpoints are available in Hugging Face Transformers under the names hyunwoongko/ctrlsum-cnndm, hyunwoongko/ctrlsum-arxiv, and hyunwoongko/ctrlsum-bigpatent. An example code snippet (quoted from here):

1. Create models and tokenizers

>>> from transformers import AutoModelForSeq2SeqLM, PreTrainedTokenizerFast

>>> model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-cnndm")
>>> # model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-arxiv")
>>> # model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-bigpatent")

>>> tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-cnndm")
>>> # tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-arxiv")
>>> # tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-bigpatent")

2. Unconditioned summarization

>>> data = tokenizer("My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs", return_tensors="pt")
>>> input_ids, attention_mask = data["input_ids"], data["attention_mask"]
>>> tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5))[0]
'</s>My name is Kevin. I loved dogs from 1996.</s>'

3. Conditioned summarization

  • You can input condition tokens using the TOKEN => CONTENTS structure
>>> data = tokenizer("today plan => My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs", return_tensors="pt")
>>> input_ids, attention_mask = data["input_ids"], data["attention_mask"]
>>> tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5))[0]
"</s> Today, I'm going to walk on street with my dogs. I loved dogs from 1996</s>"

4. Prompt summarization

  • You can also pass decoder_input_ids to provide an input prompt.
>>> data = tokenizer("Q:What is my name? A: => My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs", return_tensors="pt")
>>> input_ids, attention_mask = data["input_ids"], data["attention_mask"]
>>> tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5, decoder_input_ids=tokenizer("Q:What is My name? A:", return_tensors="pt")["input_ids"][:, :-1]))[0]
'<s>Q:What is My name? A: Kevin.</s>'
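
5. Controlling generation (optional)

The same interface accepts the standard Hugging Face generation arguments if you want to nudge the summary length. A small sketch (the length values below are arbitrary; model and tokenizer come from step 1):

>>> # sketch: steer summary length with standard generation arguments (values are arbitrary)
>>> data = tokenizer("My name is Kevin. I love dogs. I loved dogs from 1996. Today, I'm going to walk on street with my dogs", return_tensors="pt")
>>> summary_ids = model.generate(data["input_ids"], attention_mask=data["attention_mask"],
...                              num_beams=5, min_length=10, max_length=60)
>>> tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]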

Option 4. Through the Summarizers Python Package

The Python package summarizers allows you to use pretrained CTRLsum models with just a few lines of code.

Train CTRLsum

Data Processing

Prepare your data files in datasets/[dataset name], which should consist of six data files named [train/val/test].[source/target]. These files are raw text with one example per row. We take the cnndm dataset as an example of preprocessing (see here for instructions on obtaining the cnndm dataset):

# this command runs the preprocessing pipeline, including tokenization, truncation, and
# keyword extraction. It generates all data files required to train CTRLsum into
# `datasets/cnndm`. Example output files can be found in `datasets/example_dataset`.
# Some optional arguments can be found in preprocess.py
python scripts/preprocess.py cnndm --mode pipeline

# gpt2 encoding
bash scripts/gpt2_encode.sh cnndm

# binarize dataset for fairseq
bash scripts/binarize_dataset.sh cnndm

Among the generated files in datasets/cnndm, the suffix oracleword denotes the keywords file (after keyword dropout), oraclewordsource denotes the concatenated keywords and source, and oraclewordns denotes the original keywords without keyword dropout. The .jsonl files are used later to (optionally) train the tagger.

Train the summarization model on multiple GPUs:

bash scripts/train_bart.sh -g [GPUs] -d [dataset name] -b [bart checkpoint path (.pt file)]

GPUs are GPU ids separated by ,. All our experiments use 8 GPUs with 8 gradient-accumulation steps, resulting in an effective batch size of 1024x8x8 tokens in total. You probably need to increase the update_freq variable in train_bart.sh if you use fewer GPUs, so that the effective batch size is matched (as illustrated below). Models are saved to the checkpoint directory. The training arguments can be found in train_bart.sh.
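
As a concrete sketch of that arithmetic (illustration only, not part of the training script):

# sketch: choose update_freq so the effective batch size matches the reference setup
MAX_TOKENS = 1024              # tokens per GPU per step, as in train_bart.sh
TARGET = 1024 * 8 * 8          # reference setup: 8 GPUs x update_freq 8

def update_freq_for(num_gpus, max_tokens=MAX_TOKENS, target=TARGET):
    """Gradient-accumulation steps needed to keep the effective batch size constant."""
    return target // (max_tokens * num_gpus)

print(update_freq_for(8))   # 8  (the reference configuration)
print(update_freq_for(4))   # 16 (e.g. when training on 4 GPUs)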

Train the keyword tagger (optional):

Note that the keyword tagger is required only in the uncontrolled summarization setting and in certain control settings that require automatic keywords (such as length control in the paper).

# this uses 4 GPUs for training by default;
# you need to change the --nproc_per_node value if you
# train with a different number of GPUs
bash scripts/train_seqlabel.sh -g [GPUs] -d [dataset name]

The effective batch size we used for each dataset can be read off the training script as number of GPUs x batch size x update_freq.

Evaluate CTRLsum

Here we describe evaluation in the uncontrolled summarization setting.

Obtain automatic keywords from a trained tagger:

# run prediction from the tagger which outputs confidence values for every token
# `checkpoint directory` is the directory that contains the `pytorch_model.bin` checkpoint.
# the results are saved in the checkpoint directory as test_predictions.txt
bash scripts/train_seqlabel.sh -g [GPUs] -d [dataset name] -p [checkpoint directory]


# obtain keywords by selecting confident words; `threshold`, `maximum-word`, and `summary-size`
# are the three hyperparameters in this step. Please check Appendix A of the paper for the
# specific values we used for different datasets; performance is relatively robust to them.
# a rough sketch of this selection logic appears after the command below.
# this command will yield a file `.predwordsource` in `datasets/[dataset name]`, which can be
# used as input to the summarization model to obtain uncontrolled summaries
python scripts/preprocess.py [dataset name] \
		--split test \
		--mode process_tagger_prediction \
		--tag-pred [the tagger prediction file path, named as test_predictions.txt] \
		--threshold [confidence threshold] \
		--maximum-word [maximum number of keywords] \
		--summary-size [number of sentences from which to identify keywords]
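
For intuition, the selection performed in this step works roughly as sketched below. This is a hedged reading of Appendix A (rank sentences by average token confidence, keep the top summary-size sentences, then pick tokens above threshold, up to maximum-word keywords); the authoritative implementation is in scripts/preprocess.py and may differ in tokenization, ordering, and tie-breaking details.

# hedged sketch of the keyword-selection logic; the real code lives in scripts/preprocess.py
def select_keywords(sentences, threshold, maximum_word, summary_size):
    """sentences: list of sentences, each a list of (token, confidence) pairs."""
    # rank sentences by average token confidence and keep the top `summary_size`
    ranked = sorted(sentences,
                    key=lambda sent: sum(c for _, c in sent) / max(len(sent), 1),
                    reverse=True)[:summary_size]
    # within those sentences, keep tokens whose confidence exceeds `threshold`
    keywords = [tok for sent in ranked for tok, conf in sent if conf > threshold]
    # cap the total number of keywords at `maximum_word`
    return keywords[:maximum_word]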

Metrics:

We report ROUGE scores and BERTScore in the paper. The ROUGE scores in the paper are computed using files2rouge, which is a wrapper (of a wrapper) of the original ROUGE Perl scripts. Please refer to scripts/test_bart.sh for our evaluation script:

# you will need the Stanford CoreNLP Java toolkit to run this; we use it for tokenization
# this script computes ROUGE and (optionally) BERTScore.
bash scripts/test_bart.sh -g [GPUs] -s [source file name, NOT full path] -d [dataset] -p [ctrlsum checkpoint directory]

Citation

@article{he2020ctrlsum,
  title={{CTRL}sum: Towards Generic Controllable Text Summarization},
  author={He, Junxian and Kry{\'s}ci{\'n}ski, Wojciech and McCann, Bryan and Rajani, Nazneen and Xiong, Caiming},
  journal={arXiv preprint arXiv:2012.04281},
  year={2020}
}


ctrl-sum's Issues

fairseq-train: error: argument --restore-file: expected one argument

❓ Questions and Help

What is your question?

Hello, I tried to train by running "bash scripts/train_bart.sh -g 0,1,2,3 -d cnndm",
but I get an error: error: argument --restore-file: expected one argument
I think I need to load the original BART model.
Where can I download the model, and how can I load it?
Thanks~

What's your environment?

  • fairseq Version: 0.9.0
  • PyTorch Version: 1.7.1
  • Python version: 3.7

package for CTRLsum (easy to use)

Thanks for the cool research. The results were quite appealing to me, so I made a package to make the model easier to use. The package is called summarizers. I ported the model you provided to Hugging Face Transformers and made it usable with 2-3 lines of code. If anyone wants to try this model more easily, it may be worth using.

https://github.com/hyunwoongko/summarizers

Thanks for great research again.

Hello! Could you provide the contributions test data used in your paper? Thank you very much!


Oracle entity in Table 2 VS. Oracle keywords in Table 7

I am trying to reproduce the ROUGE scores on CNN/DM for the 'oracle keyword' setting in Table 7. The 'oracle entity' setting in Table 2 sounds similar to the 'oracle keyword' setting in Table 7, yet the ROUGE scores are very different. Could you explain how these settings differ?


Summary length truncated even when min_length is set explicitly

❓ Questions and Help

What is your question?

I tried to use the CTRLsum model to generate summaries. However, the generated summaries are always truncated, even though I explicitly set min_length in the generate function.

Code

from transformers import AutoModelForSeq2SeqLM, PreTrainedTokenizerFast
model = AutoModelForSeq2SeqLM.from_pretrained("hyunwoongko/ctrlsum-cnndm")
tokenizer = PreTrainedTokenizerFast.from_pretrained("hyunwoongko/ctrlsum-cnndm")
inp_doc = r"Relief efforts in Nepal are intensifying after more than 2,300 people were killed in the worst earthquake there in more than 80 years. Rescue missions and aid material have started arriving in the country."

data = tokenizer(inp_doc, return_tensors="pt")
input_ids, attention_mask = data["input_ids"], data["attention_mask"]
decoded = model.generate(input_ids, attention_mask=attention_mask, num_beams=5,min_length=100)
print(decoded)

output: tensor([[ 2, 901, 87, 132, 6, 2965, 82, 58, 848, 11, 5, 2373,
8969, 11, 55, 87, 1812, 107, 4, 5]])

What have you tried?

I took a look at the config.json on the model card; there was no hard-coded limit as far as I can tell.
I also looked at the generate function; early_stopping is False by default.

>>> import transformers
>>> transformers.__version__
'4.3.3'

What is the input sequence size ?

First of all, thanks for this nice summarization model. I would like to know what the input sequence size is for this model, and what the best procedure is if I want to summarize a long document or multiple documents with more than 6k tokens. Thanks in advance.

Use only selected sentences in source document

Thanks for sharing interesting works & source code.

In section 2.2, sentences greedily selected from a document are highly correlated with the reference summary, while the other sentences are expected to have a low correlation with it. The selected sentences exist for both training and inference.

I wonder what the expected pros/cons are of using 'keywords + selected sentences' as input to the BART encoder instead of 'keywords + all sentences'.
Do you have any ablation study results on this?

Error: Could not find or load main class edu.stanford.nlp.process.PTBTokenizer

I am trying to run the test_bart.sh script to get the ROUGE score for my own dataset, and I get the following:

Traceback (most recent call last):
  File "/home/mdabed/anaconda3/envs/myPython3.7/bin/files2rouge", line 33, in <module>
    sys.exit(load_entry_point('files2rouge==2.1.0', 'console_scripts', 'files2rouge')())
  File "/home/mdabed/anaconda3/envs/myPython3.7/lib/python3.7/site-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/files2rouge.py", line 141, in main
    stemming=not args.no_stemming)
  File "/home/mdabed/anaconda3/envs/myPython3.7/lib/python3.7/site-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/files2rouge.py", line 65, in run
    ignore_empty_summary=ignore_empty_summary)
  File "/home/mdabed/anaconda3/envs/myPython3.7/lib/python3.7/site-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/utils.py", line 40, in split_files
    model_count = line_count(model_path)
  File "/home/mdabed/anaconda3/envs/myPython3.7/lib/python3.7/site-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/utils.py", line 19, in line_count
    n = i + 1
UnboundLocalError: local variable 'i' referenced before assignment

What have you tried?

I tried adding the classpath variables for Stanford CoreNLP, as most Stack Overflow answers suggest. I have also tried checking the tokenization command manually.

Preprocess stuck

🐛 Bug

On executing python scripts/preprocess.py cnndm --mode pipeline, preprocessing gets stuck partway through (screenshot omitted).

Some of the oracleword files are not generated either (screenshot omitted).

Environment

  • fairseq Version (e.g., 1.0 or master): recommended commit
  • PyTorch Version (e.g., 1.0) : 1.8
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): source
  • Python version: 3.6.8
  • CUDA/cuDNN version: 10.2

'Confidence score' for a generated summary?

❓ Questions and Help

Hi, is there something like a confidence score or a metric that I can use to evaluate how confident the model is in generating the summary? Specifically, I used it to perform question answering, and some questions may not have an answer in the given text. Is there a way to 'detect' when there is no obvious enough answer in the source text, e.g. when the model makes a less confident prediction? Thanks.

Question about the baselines used in the paper

❓ Questions and Help

What is your question?

Hi, thanks for sharing your great work! I have some questions regarding the baseline results in the paper. Specifically, in Table 6, is the BART baseline finetuned on the arXiv dataset like your CTRLsum? Also, are all the results in this table zero-shot (i.e. you did not train on these two specific tasks)? Thanks.


Qs on finetuning by myself on CNN dataset

❓ Questions on finetuning

I finetuned the model myself (from the fairseq BART.large checkpoint) using train_bart.sh from the repo with src=oraclewordsource, and got quite a strange ROUGE score compared to that of the released checkpoint.

Code

I use 4 V100 GPUs, so I changed update_freq to 16 to match the original effective batch size (1024 x 8 GPUs x update_freq 8). The other finetuning parameters are unchanged. The exact train_bart.sh I used is as follows:

DATE=`date +%Y%m%d`
data_bin="cnndm"
dropout=0.1
label_smoothing=0.1
GPU=2,3,4,7
train_steps=30000
warmup_updates=500
lr=3e-05
src='oraclewordsource'
cstring=''
tgt='target'
update_freq=16  # 8 for 8 GPUs
max_tokens=1024
save_interval_updates=2000
keep_interval_updates=1
log_interval=200

criterion='label_smoothed_cross_entropy'
checkpoint="checkpoint_best.pt"

...

export CUDA_VISIBLE_DEVICES=${GPU} 
fairseq-train data-bin/${data_bin} \
    --restore-file ${restore_file} \
    --max-tokens ${max_tokens} \
    --task translation \
    --source-lang ${src} \
    --target-lang ${tgt} \
    --truncate-source \
    --layernorm-embedding \
    --share-all-embeddings \
    --share-decoder-input-output-embed \
    --required-batch-size-multiple 1 \
    --arch bart_large \
    --criterion label_smoothed_cross_entropy \
    --label-smoothing 0.1 \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.01 --optimizer adam --adam-betas "(0.9, 0.999)" --adam-eps 1e-08 \
    --clip-norm 0.1 \
    --lr-scheduler polynomial_decay --lr ${lr} --total-num-update ${train_steps} --warmup-updates ${warmup_updates} \
    --max-update ${train_steps} \
    --update-freq ${update_freq} \
    --skip-invalid-size-inputs-valid-test \
    --find-unused-parameters \
    --log-format simple --log-interval ${log_interval} \
    --best-checkpoint-metric ppl \
    --save-dir ${SAVE} \
    --save-interval-updates ${save_interval_updates} --tensorboard-logdir ${TENSORBOARD}\
    --validate-interval 1000 --keep-interval-updates ${keep_interval_updates} --save-interval 1000 --no-epoch-checkpoints \
    ${add_load_string} \
    | tee -a ${SAVE}/stdout.log

The score on valid set

After 40k steps of finetuning on CNN (actually not necessary, thanks for the comment), I got the following scores on the valid set:

---------------------------------------------
1 ROUGE-1 Average_R: 0.46306 (95%-conf.int. 0.46130 - 0.46489)
1 ROUGE-1 Average_P: 0.46201 (95%-conf.int. 0.46011 - 0.46389)
1 ROUGE-1 Average_F: 0.45487 (95%-conf.int. 0.45328 - 0.45639)
---------------------------------------------
1 ROUGE-2 Average_R: 0.16004 (95%-conf.int. 0.15858 - 0.16146)
1 ROUGE-2 Average_P: 0.15922 (95%-conf.int. 0.15780 - 0.16062)
1 ROUGE-2 Average_F: 0.15693 (95%-conf.int. 0.15560 - 0.15829)
---------------------------------------------
1 ROUGE-L Average_R: 0.42004 (95%-conf.int. 0.41827 - 0.42180)
1 ROUGE-L Average_P: 0.41830 (95%-conf.int. 0.41649 - 0.42013)
1 ROUGE-L Average_F: 0.41230 (95%-conf.int. 0.41076 - 0.41381)

The scores I obtained with the released checkpoint from the repo are:

---------------------------------------------
1 ROUGE-1 Average_R: 0.65854 (95%-conf.int. 0.65601 - 0.66100)
1 ROUGE-1 Average_P: 0.58624 (95%-conf.int. 0.58347 - 0.58886)
1 ROUGE-1 Average_F: 0.60919 (95%-conf.int. 0.60702 - 0.61135)
---------------------------------------------
1 ROUGE-2 Average_R: 0.39357 (95%-conf.int. 0.39065 - 0.39653)
1 ROUGE-2 Average_P: 0.35346 (95%-conf.int. 0.35042 - 0.35649)
1 ROUGE-2 Average_F: 0.36590 (95%-conf.int. 0.36306 - 0.36879)
---------------------------------------------
1 ROUGE-L Average_R: 0.62027 (95%-conf.int. 0.61776 - 0.62274)
1 ROUGE-L Average_P: 0.55265 (95%-conf.int. 0.54987 - 0.55533)
1 ROUGE-L Average_F: 0.57412 (95%-conf.int. 0.57185 - 0.57647)

The difference is very confusing; I must be missing some important details.

I notice that the released tar.gz contains some extra files, such as dict.extwordssourcetrunclead.txt and dict.targettrunclead.txt, which seem to be used for evaluating the released checkpoint but not for my own finetuning. Is this one of the reasons for my problem? What are these two txt files?

It would be very kind of you to help me. Thanks!

Unconditional Summarization Evaluation

I have the interactive summarization setup working, but could you guide me on how to replicate the unconditional summarization results in the paper for CNN/DailyMail, especially the evaluation part?

Error when using long text summaries

Hello, I am trying to summarize longer documents. I am using the Hugging Face version, where I can control the output length with the min_length and max_length parameters, but I get an error when using longer text inputs.

Error

IndexError                                Traceback (most recent call last)
<ipython-input-61-fd46d00fe30a> in <module>()
----> 1 sum_ctrlsum(text_all[3])

9 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2042         # remove once script supports set_grad_enabled
   2043         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2044     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2045 
   2046 

IndexError: index out of range in self

Code

CLEANR = re.compile('<.*?>') 
def sum_ctrlsum(text):
  data = tokenizer(text, return_tensors="pt")
  input_ids, attention_mask = data["input_ids"], data["attention_mask"]
  res = tokenizer.batch_decode(model.generate(input_ids, attention_mask=attention_mask, num_beams=5, min_length = 100, max_length = 200))[0]
  res = re.sub(CLEANR, '', res)
  return res

I am running the code on a colab instance. I am also removing the tags using regex.

Appreciate any help with this.

Example input: "This month, Hong Kong saw its Covid-19 death rate become the highest in the world, topping 37 deaths per million people. The recent outbreak was a brutal shock to the 7.4 million residents of the bustling metropolis, which had until recently kept Covid-19 cases to admirably low levels. Hong Kong was once applauded for its response to Covid-19. Then it became the global epicenter of the pandemic. Other cities in China like Shenzhen and Shanghai have also seen huge surges in infections, and countries in the Eastern Pacific like South Korea, Vietnam, Singapore, and Australia have seen a surge in cases this month as well. That’s largely due to the rise of BA.2, a highly infectious, hard to identify subvariant of omicron, itself a more transmissible version of the virus that causes Covid-19. Some of these countries also started to relax restrictions on travel and public gatherings just as the new subvariant took root. But a towering spike in cases doesn’t necessarily mean that hospitalizations and deaths will see a similar jump. In South Korea, even while reaching a new daily record of 470 Covid-19 deaths in March, the death rate so far has been 6 per million residents. So Hong Kong stands out because its latest Covid-19 wave was especially deadly. Fortunately, cases and deaths are on the decline. But it’s worth asking: Why did the latest Covid-19 wave hit Hong Kong so hard? To find out, I spoke with Dr. Kelvin To, a clinical associate professor of microbiology at the University of Hong Kong. He’s both a researcher studying Covid-19 and a physician at Queen Mary Hospital who treats patients. To explained that residents took some of Hong Kong’s earlier successes for granted, making them complacent in critical public health measures, like vaccinating people at high risk of severe disease. The conversation has been edited for length and clarity. Umair Irfan What are the factors right now that seem to be the most significant for Hong Kong’s Covid-19 outbreak? Why are we seeing it now? What variables should we be paying attention to? Kelvin To I think the most important is the vaccination problem in Hong Kong, the extremely low vaccination rates among the elderly, especially those older than 80 years old. The vaccination rate for them was only about 20 percent by the end of 2021. That’s the most vulnerable population, and they’re not protected at all. The data from our pandemic on this wave is very clear: Those elderly who were not vaccinated actually had a much, much higher, death rate than those with the vaccination. Another reason is the incidence of infection in Hong Kong was so low in the past. By the end of 2021, we had about 12,000 cases out of 7.3 million people in Hong Kong, which is less than 0.2 percent. So basically in Hong Kong, very, very few people have natural immunity against the virus. Third, in the past waves, you got about a hundred or so cases in a day, and that’s already a lot. But in those days with only a hundred cases, you can actually put everybody in the hospital, in isolation, in quarantine camps. But when they are not hundreds but thousands of cases per day, then people can only be quarantined at home. And, you know, Hong Kong is very crowded. Basically most people live in apartments and many of them live in very, very small apartments. Unfortunately, there are many poor people who actually share a flat with many other people. So this space is kind of impossible for you to do any preventive measures in those settings. 
And of course, the virus this time is very different. In the past, we in Hong Kong see the virus, we see infections, and then we isolate people. Usually, the spread is very limited once you do that. But this time, especially at the beginning of the omicron wave, when we still had very, very few cases, we did a lot of investigations into each of the clusters. You can see that in a restaurant, an infected patient sitting in one corner of the restaurant and another customer sitting at the other end of the restaurant got infected. It’s not just spreading to people around you, but can actually spread over long distances. For example, there are cases in apartment buildings. And what people have found is that spread is not because of direct contact between neighbors but because infected air that got removed from a flat from an exhaust fan can go up through the air to the other apartments. Umair Irfan Given all those factors, was this preventable? Kelvin To If Hong Kong was much, much better vaccinated, then I believe this wave could’ve been prevented. In Hong Kong, the vaccination campaign was very aggressive. Almost every day, you heard on the news that you should get vaccinated. And the government is doing as much as they can to get people vaccines. They tried to have vaccine passes before you can go to certain restaurants, and things like that. But the problem is some people just don’t want to get vaccinated, or worse, they don’t want their elderly parents to get vaccinated. The reason is because they thought — and many are still thinking — that the Covid-19 vaccine is very dangerous. Another reason that people in Hong Kong don’t want to get vaccinated is they believe they won’t get the infection because the incidence of Covid-19 was so low in the past. So they thought, okay, if I stay indoors, I don’t go out, I always wear protection, then I won’t get infected. This is something that happens when you’re doing too well, in some ways, so people think it’s not important to get vaccinated. Umair Irfan Hong Kong is not the only place seeing an outbreak in China. There are outbreaks in Shenzhen and Shanghai. I’m wondering how distinct is Hong Kong’s outbreak, or what does it have in common with these other major cities? Kelvin To The major difference in Shenzhen or Shanghai is that they still have sufficient facilities to isolate or quarantine people. They also locked down the cities very early. They had PCR tests for the whole population, we’re talking about millions of people. Basically they could do a massive screening for the whole city in a short period of time. They isolated everyone who was infected very quickly. That’s why they were able to really stop it within a week. Of course, Shenzhen is much bigger than Hong Kong and not as crowded. Umair Irfan What does this mean for China’s zero-Covid strategy? Kelvin To It’s not really an absolute zero policy. The Chinese government calls it dynamic zero. What they mean is you try to catch it early and then try to stop it from spreading. It sort of worked well in that the Chinese economy and people are living quite normally. But of course, you cannot shut down from the rest of the world forever. It’s a question of timing of when to start to relax. If you suddenly open up now, then a lot of cities in China can just become like Hong Kong and suddenly the health care system can collapse in a matter of weeks. So a full opening-up is very dangerous right now. When to open up depends on a lot of things. 
And I think the most important is the vaccination rate and the availability of effective Covid-19 drugs. Umair Irfan What do you make of how the leadership in Hong Kong is handling this? How could they improve? Kelvin To I think there are obviously things that can be improved. For example, I think the government should review what has happened in this wave. Was there any moment that more aggressive measures could’ve been taken in the beginning, especially in the middle of February? Secondly, there should be better coordination in terms of quarantining or isolating people. You can never have enough place to isolate everyone in Hong Kong. It’s impossible. But at least the government should have plans for how to convert existing places into isolation facilities more quickly. Umair Irfan For those of us watching from the outside, what lessons do you think we should take away? Kelvin To In Hong Kong, people didn’t believe it would occur. People did not believe the health care system would collapse. The government had some plans, but they probably didn’t anticipate that it would collapse the way that it did in Hong Kong. For the rest of the world, no matter how good it is right now in terms of the pandemic, I can only say that Covid can just catch you by surprise. Omicron this time is fortunate in a way that it is milder than previous variants. But you never know, you may have a variant that is as transmissible or more transmissible than omicron, or even more severe than the previous strains. I think people around the world should not expect that you can just think it’s gone. I’m not saying that people should be scared, but at least there should be some kind of preparedness plan. The window of opportunity to stop an outbreak is very, very narrow. Once the health care system collapsed, everything is just like a domino effect. Umair Irfan How are you, your family, and your colleagues holding up through all this? Kelvin To We are okay. Of course, we lost our social life. I haven’t eaten out for a long, long time and I haven’t seen a lot of friends. One good thing about being a doctor is that I still have to go to work every day, just like normal life, but even more busy. For me, there’s still some social interactions at work, which I prefer. I myself do not like work from home because I prefer real face-to-face social interaction. I would say I’m very lucky. I’m a doctor and I didn’t lose my job. Many people in Hong Kong have, because of all the restrictions, basically lost their income. This wave has taken a toll on them, definitely."

how to do test-time keyword extraction?

In the research paper, it was mentioned that CTRLsum can extract keywords at test time. May I know how to do this? The provided tutorials are just for summaries, not keywords.

Tagger: bert or roberta?

❓ Questions and Help

Hi there,

just wanted to ask if you used bert-large-cased or roberta-large to initialize the weights of the tagger (both options are in the training script).

Thanks
