pptt168 / lancosum

This project forked from lancopku/lancosum

A toolkit for abstractive summarization that makes it easy to implement the baseline and our proposed models, which can achieve SOTA performance.

LancoPKU Summarization

This repository provides a toolkit for abstractive summarization. It helps researchers implement the common baseline, the attention-based sequence-to-sequence model, as well as three models recently proposed by our group, LancoPKU. These models achieve improved performance and generate summaries of higher quality. By modifying the '.yaml' configuration file or the command options, one can easily apply the models to one's own work. The names of these models and their corresponding papers are listed below:

  1. Global Encoding for Abstractive Summarization [pdf]
  2. Word Embedding Attention Network (WEAN) [pdf]
  3. SuperAE [pdf]


1 How to Use

--- 1.1 Requirements

  • Ubuntu 16.04
  • Python 3.5
  • PyTorch 0.3.1
  • pyrouge
  • matplotlib (for the visualization of attention heatmaps)
  • TensorFlow (>=1.5.0) and tensorboardX (for data visualization on TensorBoard)

--- 1.2 Configuration

Install PyTorch

Clone the LancoSum repository:

git clone https://github.com/lancopku/LancoSum.git
cd LancoSum

To use pyrouge, install it and set the ROUGE path with the lines below:

pip install pyrouge
pyrouge_set_rouge_path script/RELEASE-1.5.5

--- 1.3 Preprocessing

python3 preprocess.py -load_data path_to_data -save_data path_to_store_data

Remember to put the data (plain text files) into a folder, name them train.src, train.tgt, valid.src, valid.tgt, test.src and test.tgt, and create a new folder inside it called data.
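Before running the preprocessing script, it can help to confirm that the folder follows the layout described above. The following is a minimal sketch, not part of the toolkit: the helper name `prepare_data_folder` is hypothetical, and it only checks file names and creates the inner `data` folder.

```python
import os
import tempfile

# File names required by the preprocessing step, as listed above.
EXPECTED_FILES = [
    "train.src", "train.tgt",
    "valid.src", "valid.tgt",
    "test.src", "test.tgt",
]


def prepare_data_folder(folder):
    """Create the inner `data` folder and return the expected files that are missing.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    missing = [name for name in EXPECTED_FILES
               if not os.path.isfile(os.path.join(folder, name))]
    os.makedirs(os.path.join(folder, "data"), exist_ok=True)
    return missing


if __name__ == "__main__":
    # Demo on a throwaway folder that contains only the train and valid files.
    with tempfile.TemporaryDirectory() as folder:
        for name in EXPECTED_FILES[:4]:
            open(os.path.join(folder, name), "w").close()
        print(prepare_data_folder(folder))  # prints ['test.src', 'test.tgt']
```

An empty list means all six files are in place and the folder can be passed to `-load_data`.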


--- 1.4 Training

python3 train.py -log log_name -config config_yaml -gpus id

--- 1.5 Evaluation

python3 train.py -log log_name -config config_yaml -gpus id -restore checkpoint -mode eval

2 Introduction to Models

--- 2.1 Global Encoding

Motivation & Idea

The conventional attention-based seq2seq model for abstractive summarization suffers from repetition and semantic irrelevance. Therefore, we propose a model in which a convolutional neural network (CNN) filters the encoder outputs so that they contain information about the global context. A self-attention mechanism is also applied to find the correlations among these new representations of the encoder outputs.

Model

Options
python3 train.py -log log_name -config config_yaml -gpus id -swish -selfatt
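The idea described under Motivation & Idea can be illustrated with a small, dependency-free sketch. It is one possible reading of the description, not the repository's implementation: the function names are hypothetical, the convolution uses a single kernel shared across feature dimensions, and the self-attention has no learned projections. The encoder outputs are passed through a 1-D convolution and self-attention, and the result gates the original outputs through a sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(states, kernel):
    """Zero-padded 1-D convolution over time; one kernel shared by all features."""
    T, d, half = len(states), len(states[0]), len(kernel) // 2
    out = []
    for t in range(T):
        row = [0.0] * d
        for i, w in enumerate(kernel):
            s = t + i - half  # source position covered by kernel tap i
            if 0 <= s < T:
                for j in range(d):
                    row[j] += w * states[s][j]
        out.append(row)
    return out


def self_attention(states):
    """Scaled dot-product self-attention over positions (no learned projections)."""
    d = len(states[0])
    out = []
    for query in states:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in states]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, states))
                    for j in range(d)])
    return out


def global_encode(states, kernel):
    """Gate each encoder output with sigmoid(self_attention(conv(states)))."""
    gates = self_attention(conv1d(states, kernel))
    return [[h * sigmoid(g) for h, g in zip(h_row, g_row)]
            for h_row, g_row in zip(states, gates)]


if __name__ == "__main__":
    encoder_outputs = [[1.0, -2.0], [0.5, 0.0], [3.0, 1.0]]
    print(global_encode(encoder_outputs, kernel=[0.25, 0.5, 0.25]))
```

Because the gate lies strictly between 0 and 1, each output keeps its sign and can only shrink in magnitude, which is the filtering effect the description refers to.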

--- 2.2 WEAN

Motivation & Idea

In the decoding process, conventional seq2seq models typically use a dense vector at each time step to generate a distribution over the vocabulary and choose the output word. However, this method ignores the relationships between words in the vocabulary and requires a large number of parameters (hidden_size * vocab_size). In this model, we therefore use a query system: the output of the decoder is the query, the candidate words are the values, and the corresponding word representations are the keys. By referring to the word embeddings, our model is able to capture the semantic meaning of the words.

Model

Options
python3 train.py -log log_name -config config_yaml -gpus id -score_fn function_name('general', 'dot', 'concat')
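The query system described above can be sketched in plain Python. This is an illustration under assumptions, not the toolkit's code: the three score functions are assumed to follow the usual attention scoring forms that the names 'dot', 'general' and 'concat' suggest, and the function names and the parameters `W` and `v` are hypothetical. The decoder output (the query) is scored against every word embedding (the keys), and a softmax over the scores gives the distribution over candidate words.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(query, key, score_fn="dot", W=None, v=None):
    """Score one word embedding (key) against the decoder output (query)."""
    if score_fn == "dot":
        return dot(query, key)  # q . k
    if score_fn == "general":
        return dot(query, [dot(row, key) for row in W])  # q^T W k
    if score_fn == "concat":
        joined = query + key  # list concatenation gives [q; k]
        return dot(v, [math.tanh(dot(row, joined)) for row in W])  # v^T tanh(W [q; k])
    raise ValueError("unknown score_fn: %r" % score_fn)


def word_distribution(query, embeddings, score_fn="dot", W=None, v=None):
    """Probability of each candidate word given the decoder query."""
    return softmax([score(query, e, score_fn, W, v) for e in embeddings])


if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy vocabulary of 3 words
    query = [2.0, 0.0]
    print(word_distribution(query, embeddings, "dot"))
```

In this sketch, 'dot' needs no extra parameters and 'general' needs only a matrix whose size does not depend on the vocabulary, which is how querying the embeddings avoids a hidden_size * vocab_size output layer.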

--- 2.3 SuperAE

Motivation & Idea

Text from social media is generally long and contains many errors. A conventional seq2seq model fails to compress a long sentence into an accurate representation. We therefore use the representation of the summary (which is shorter and easier to encode) to help supervise the encoder so that it generates better semantic representations of the source content during training. Moreover, ideas from adversarial networks are used to dynamically determine the strength of this supervision.

Model

Options
python3 train.py -log log_name -config config_yaml -gpus id -sae -loss_reg ('l2', 'l1', 'cos')
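The supervision described above can be pictured as an extra loss term that pulls the source representation toward the summary representation. The sketch below assumes that the three `-loss_reg` choices correspond to squared L2 distance, L1 distance and cosine distance; that mapping, the function names, and the fixed `strength` weight are assumptions made for illustration (the description says the strength is determined dynamically, which is not modeled here).

```python
import math


def representation_distance(source_repr, summary_repr, loss_reg="l2"):
    """Distance between the source and summary representations."""
    if loss_reg == "l2":
        return sum((a - b) ** 2 for a, b in zip(source_repr, summary_repr))
    if loss_reg == "l1":
        return sum(abs(a - b) for a, b in zip(source_repr, summary_repr))
    if loss_reg == "cos":
        dot = sum(a * b for a, b in zip(source_repr, summary_repr))
        norm = (math.sqrt(sum(a * a for a in source_repr))
                * math.sqrt(sum(b * b for b in summary_repr)))
        return 1.0 - dot / norm  # 0 when the two vectors point the same way
    raise ValueError("unknown loss_reg: %r" % loss_reg)


def supervised_loss(seq2seq_loss, source_repr, summary_repr,
                    loss_reg="l2", strength=1.0):
    """Summarization loss plus the weighted representation distance.

    `strength` is a fixed weight in this sketch only.
    """
    return seq2seq_loss + strength * representation_distance(
        source_repr, summary_repr, loss_reg)


if __name__ == "__main__":
    source, summary = [1.0, 2.0], [1.0, 4.0]
    for name in ("l2", "l1", "cos"):
        print(name, representation_distance(source, summary, name))
```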

3 Citation

Please cite these papers when using the relevant models in your research.

Global Encoding:

@inproceedings{globalencoding,
  title     = {Global Encoding for Abstractive Summarization},
  author    = {Junyang Lin and Xu Sun and Shuming Ma and Qi Su},
  booktitle = {{ACL} 2018},
  year      = {2018}
}

WEAN:

@inproceedings{wean,
  author    = {Shuming Ma and Xu Sun and Wei Li and Sujian Li and Wenjie Li and Xuancheng Ren},
  title     = {Query and Output: Generating Words by Querying Distributed Word
	       Representations for Paraphrase Generation},
  booktitle = {{NAACL} {HLT} 2018, The 2018 Conference of the North American Chapter
	       of the Association for Computational Linguistics: Human Language Technologies},
  year      = {2018}
}

SuperAE:

@inproceedings{Ma2016superAE,
  title   = {Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization},
  author  = {Shuming Ma and Xu Sun and Junyang Lin and Houfeng Wang},
  booktitle = {{ACL} 2018},
  year      = {2018}
}
