nlp-project's Introduction

Natural Language Processing Assignment 2&3

Group members (UCL): Youning, Wan Jing, Zoey, YanSong

Due date: Saturday, 21 March 2020, 12:05 AM

Report and code link : LINK

The report is written in ACL format. The code is mostly implemented in PyTorch. It also requires the evaluation metric packages:

Coursework instruction: Link

Abstract:

This paper sets out to assess the performance of Deep Reinforcement Learning (DRL) based abstractive summarization models. Four model variants are applied to three datasets (CNN/Daily Mail, Gigaword and WikiHow) and evaluated with ROUGE and BERTScore. Working on the novel WikiHow dataset, which is slightly more complex to train on, magnifies the characteristics of the models. It exposes the instability of training on ROUGE-L scores in some cases and suggests BERTScore as an alternative.

Model implemented:

The model is an encoder-decoder LSTM network with an attention mechanism applied to both the encoder and the decoder; a pointer network (https://arxiv.org/abs/1602.06023) is also included. The schematic model plot is shown as:

Model training objective:

What distinguishes the model variants is the training objective. We have tried four types:

  • Maximum likelihood (ML): maximise the log-probability of the correct (reference) outputs.

  • Reinforcement learning (RL) with ROUGE reward: this is analogous to the REINFORCE algorithm in policy gradient methods, where the objective is to maximise the expected ROUGE score of the generated summaries. Unlike ML, the model is optimised directly w.r.t. the evaluation metric, so it is expected to achieve a higher test score.

  • RL with BERTScore reward: the same as the previous one, but with BERTScore as the reward.

  • Hybrid (ML+RL) objective: a combination of the ML and RL objective functions.
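The objectives listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's PyTorch code: the function names, the simplified LCS-based ROUGE-L F1 used as the reward, the optional `baseline` term and the mixing weight `gamma` are all hypothetical choices made for this example.

```python
import math

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 (plain LCS precision/recall, no stemming)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)

def ml_loss(reference_log_probs):
    """ML objective: negative log-likelihood of the reference tokens."""
    return -sum(reference_log_probs)

def rl_loss(sample_log_probs, reward, baseline=0.0):
    """REINFORCE-style loss: -(reward - baseline) * log p(sampled summary)."""
    return -(reward - baseline) * sum(sample_log_probs)

def hybrid_loss(ml, rl, gamma):
    """Weighted mix of both objectives; gamma is an assumed mixing weight."""
    return gamma * rl + (1.0 - gamma) * ml

# Toy example: per-token log-probabilities are made up for illustration.
reference = "the cat sat on the mat".split()
sample = "the cat lay on a mat".split()
reward = rouge_l_f1(sample, reference)

ml = ml_loss([math.log(0.5)] * len(reference))
rl = rl_loss([math.log(0.4)] * len(sample), reward)
total = hybrid_loss(ml, rl, gamma=0.5)
```

Swapping `rouge_l_f1` for a BERTScore-based function would give the third variant; the loss functions themselves would not need to change.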

Saved models and data:

Below are the saved (pre-trained) models for the different training objectives, along with the datasets.

-----------------------------------WikiHow-----------------------------------------------------

WikiHow pre-trained model (ML for 10,000 iterations): LINK

WikiHow data files: LINK

WikiHow RL training models: LINK

BERTScore: LINK

-----------------------------------CNN/DM-----------------------------------------------------

CNN train/valid/test/vocab files (Mike): LINK

CNN/DailyMail train/valid/test/vocab files (Youning)(including pre-trained models) : LINK

Pre-trained Model Colab(Youning): LINK

CNN ML trained models(Zoey): folder link

CNN RL(r) trained models(Zoey): folder link

CNN RL+ML trained models(YS): folder link

CNN RL(b) trained models(Zoey): folder link

-----------------------------------Gigaword-----------------------------------------------------

Gigaword pre-trained model (ML): LINK

Gigaword data: LINK

Gigaword ML+RL trained models: LINK

Gigaword ML trained models: LINK

Gigaword RL trained models: on DeepNotes.

Relevant publications

Less relevant but fundamental and useful publications

Relevant datasets

Some useful websites

Advice from Jiang, Minqi

Useful Resources for Experiments

Useful Resources for Writing the Paper

nlp-project's People

Contributors

yansong97, zoey7407, younei, swanjing
