jinhuasu / pens-personalized-news-headline-generation

This project forked from llluoling/pens-personalized-news-headline-generation

Code for PENS: A Dataset and Generic Framework for Personalized News Headline Generation

Python 69.41% Jupyter Notebook 30.59%

pens-personalized-news-headline-generation's Introduction

PENS - ACL2021

PENS: A Dataset and Generic Framework for Personalized News Headline Generation

This is a PyTorch implementation of PENS.

I. Guidance

0. Environment

  • Install PyTorch, version 1.4.0 or later.
  • Install the pensmodule package from the PENS-Personalized-News-Headline-Generation directory by running pip install -e .
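
A quick way to confirm the version requirement above is a small check like the following. This is only a sketch: the helper names parse_version and meets_minimum are made up for illustration and are not part of pensmodule or PyTorch.

```python
import re


def parse_version(version):
    """Return the leading numeric release of a version string as a tuple of ints."""
    # Drop local build suffixes such as "+cu117", then keep the leading digits and dots.
    release = version.split("+")[0]
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum="1.4.0"):
    """True when `version` is at least `minimum`, compared component by component."""
    have, need = parse_version(version), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that "1.4" and "1.4.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Usage, assuming PyTorch is already installed:
#   import torch
#   assert meets_minimum(torch.__version__), "PENS expects pytorch >= 1.4.0"
```

Comparing integer tuples rather than raw strings avoids the pitfall where "1.10.0" sorts before "1.4.0" alphabetically.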

1. Data Preparation

  • Download the PENS dataset and put it under data/.
  • (Optional) Download glove.840B.300d.txt and put it under data/ if you choose to use pretrained GloVe word embeddings.
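
If you use the GloVe file, it has to be parsed into word vectors at some point. The sketch below shows one generic way to read the GloVe text format; it is not the preprocessing code from this repository, and the function name load_glove and its vocab parameter are assumptions made for illustration. Some entries in glove.840B.300d.txt are known to contain spaces inside the token, so the parser takes the last dim fields as the vector instead of splitting on the first space.

```python
def load_glove(lines, dim=300, vocab=None):
    """Parse GloVe text lines into a {word: [float, ...]} dictionary.

    lines: any iterable of strings (an open text file works).
    dim:   embedding size, 300 for glove.840B.300d.txt.
    vocab: optional set of words to keep, to save memory.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # blank or malformed line: no room for a token plus a vector
        # The last `dim` fields are the vector; everything before them is the token.
        word = " ".join(parts[:-dim])
        if vocab is not None and word not in vocab:
            continue
        embeddings[word] = [float(x) for x in parts[-dim:]]
    return embeddings


# Usage (my_vocab is a hypothetical set of words built during preprocessing):
#   with open("data/glove.840B.300d.txt", encoding="utf-8") as f:
#       vectors = load_glove(f, dim=300, vocab=my_vocab)
```

Restricting the load to the dataset vocabulary keeps memory use manageable, since the full file holds far more words than a news corpus needs.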

2. Running Code

  • cd pensmodule
  • Follow the order Preprocess --> UserEncoder --> Generator, running the pipeline**.ipynb notebook at each stage to preprocess the data, train the user encoder, and train the generator, respectively.

For more information, please refer to the homepage of the PENS dataset.

II. Training Tips

Here we take NRMS as the user encoder. The following are some experimental details that are not covered in the paper.

0. TIPS

  • In the paper, we used Monte Carlo search for RL training, which is very slow and sometimes hard to converge. We therefore provide AC (actor-critic) training in this code.
  • If you pretrain the generator for a couple of epochs, you should set a very small learning rate during RL training.
  • Large improvements over the baselines we provide are possible; the key always lies in the design of the reward functions.
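
To make the first tip concrete, here is a toy sketch of the losses in a generic actor-critic setup, reading "ac" as actor-critic. It is not the training code from this repository: the function name, its arguments, and the plain-float inputs are all assumptions for illustration. Where Monte Carlo search would estimate the reward of each partial headline by sampling many completions, a critic predicts that reward directly, which is what makes training cheaper.

```python
def actor_critic_losses(log_probs, values, reward):
    """Toy per-sequence actor-critic losses on plain Python floats.

    log_probs: log-probability of each generated token under the generator (actor).
    values:    the critic's estimate of the final reward at each decoding step.
    reward:    scalar reward for the finished headline.
    """
    if not values or len(log_probs) != len(values):
        raise ValueError("log_probs and values must be non-empty and equally long")
    # Advantage: how much better the outcome was than the critic expected.
    advantages = [reward - v for v in values]
    # Actor loss: minimizing it raises the probability of tokens with positive advantage.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    # Critic loss: mean squared error between value estimates and the observed reward.
    critic_loss = sum(adv ** 2 for adv in advantages) / len(values)
    return actor_loss, critic_loss


actor_loss, critic_loss = actor_critic_losses(
    log_probs=[-1.0, -2.0], values=[0.5, 0.5], reward=1.0
)
```

In a real PyTorch training loop the advantages would normally be detached from the computation graph before they enter the actor loss, so that the actor's gradient does not flow into the critic.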

1. Training Reward

[Figure: training reward]

2. Test performance on different training steps

[Figures: test performance at different training steps (three plots)]

3. Cases

Case  Epoch  Generated headline
1     1000   top stockton news arrests 2 impaired drivers
1     5000   top stockton news arrests 2 impaired drivers who had unrestrained children in their cars
2     1000   trump says tens of thousands of people couldn t get in 2020 rally
2     5000   trump says tens of thousands of people outside his 2020 campaign rally at orlando

Notes:

  • As training proceeds, the generated sentences become more fluent and contain richer information.
  • ROUGE is not the best evaluation metric, but a compromise. The best evaluation would of course be to check real user clicks to see whether users are more interested. As a result, a more fluent and human-like generated sentence sometimes gets a lower ROUGE score.
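
The last point can be illustrated with a simplified unigram-overlap score. This is a minimal sketch of ROUGE-1 F1 on whitespace tokens, not the official ROUGE toolkit and not the evaluation code used here. The reference and candidate headlines are made up for illustration and do not come from the dataset.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and candidates, for illustration only.
reference = "police arrest two impaired drivers"
short = "police arrest two drivers"
fluent = "police arrest two drivers who had unrestrained children in their cars"

short_score = rouge1_f1(short, reference)
fluent_score = rouge1_f1(fluent, reference)
```

Both candidates match the same four reference words, but the longer one has lower precision, so fluent_score comes out below short_score even though the longer headline carries more information.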
