This repository contains the code for our paper "A General Contextualized Rewriting Framework for Text Summarization".
Python Version: Python3.7.10
Package Requirements: torch==1.9.0 tensorboardX numpy==1.21.6
Framework: Our model and experiments are built upon fairseq v0.10.2.
Before running the scripts, please install the dependencies by running:
bash setup.sh
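After setup, a quick sanity check can confirm that the pinned versions above are satisfied. This helper is a minimal sketch, not part of the repository; it compares dotted version strings numerically:

```python
import sys

def meets_minimum(actual, required):
    """Compare dotted version strings numerically, e.g. '3.7.10' >= '3.7'."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    a, r = parse(actual), parse(required)
    # Compare only as many components as the requirement specifies.
    return a[:len(r)] >= r

# The interpreter itself should satisfy the pinned Python version:
interpreter_ok = meets_minimum("%d.%d.%d" % sys.version_info[:3], "3.7")
```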
Before evaluating BART-Rewriter, please follow the README under the folder bert_extractors to download the BERT extractor models from previous work.
(Note: we train our models on 2 Tesla V100 GPUs.)
- Download the prepared data and trained models, and unzip the files into the folder exp_test.
- Evaluate the models:
# BART-Rewriter (Rewriter with external sentence extractor)
CUDA_VISIBLE_DEVICES=0 bash exp_rewriter/test-rewriter.sh exp_test rewriter bertext
# BART-JointSR (Rewriter with joint internal sentence selector)
CUDA_VISIBLE_DEVICES=0 bash exp_rewriter/test-rewriter.sh exp_test jointsr none
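The evaluation scripts report ROUGE scores. As background, ROUGE-1 F1 is simply unigram overlap between the generated and reference summaries; the sketch below is an illustration only, not the scorer used for the reported results:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```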
- Preprocess CNN/Daily Mail. Follow the instructions to convert the data into tokenized stories:
cnn-dailymail/cnn_stories_tokenized/
cnn-dailymail/dm_stories_tokenized/
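For reference, each tokenized .story file is plain text in which the article sentences come first and each summary sentence is preceded by an `@highlight` marker line. A minimal parser sketch (illustration only, not the repository's preprocessing code):

```python
def split_story(text):
    """Split a CNN/DM .story file into (article_lines, highlight_lines)."""
    article, highlights = [], []
    target = article
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line == "@highlight":
            target = highlights  # everything after the first marker is a summary sentence
            continue
        target.append(line)
    return article, highlights
```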
- Preprocess and binarize the data for our model:
# BART-Rewriter (Rewriter with external sentence extractor)
bash exp_rewriter/prepare-data.sh exp_test large rewriter
# BART-JointSR (Rewriter with joint internal sentence selector)
bash exp_rewriter/prepare-data.sh exp_test large jointsr
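Conceptually, binarization maps each token to an integer id in the model's vocabulary, which fairseq then stores in its binary dataset format. A toy sketch with a hypothetical three-word vocabulary (illustration only, not fairseq's implementation):

```python
def binarize(tokens, vocab, unk="<unk>"):
    """Map tokens to integer ids; out-of-vocabulary tokens get the unk id."""
    unk_id = vocab[unk]
    return [vocab.get(t, unk_id) for t in tokens]

# Hypothetical vocabulary for illustration:
vocab = {"<unk>": 0, "the": 1, "cat": 2}
ids = binarize(["the", "cat", "flew"], vocab)  # -> [1, 2, 0]
```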
- Train the models:
# BART-Rewriter (Rewriter with external sentence extractor)
CUDA_VISIBLE_DEVICES=0,1 bash exp_rewriter/run-bart-large.sh exp_test rewriter
# BART-JointSR (Rewriter with joint internal sentence selector)
CUDA_VISIBLE_DEVICES=0,1 bash exp_rewriter/run-bart-large.sh exp_test jointsr
- Evaluate the trained models:
# BART-Rewriter (Rewriter with external sentence extractor)
CUDA_VISIBLE_DEVICES=0 bash exp_rewriter/test-rewriter.sh exp_test rewriter bertext
# BART-JointSR (Rewriter with joint internal sentence selector)
CUDA_VISIBLE_DEVICES=0 bash exp_rewriter/test-rewriter.sh exp_test jointsr none