colinsongf / rl-chatbot

This project forked from pochih/rl-chatbot

🤖 A movie-styled chatbot trained by deep reinforcement learning

License: GNU General Public License v3.0

Python 99.01% Shell 0.99%

rl-chatbot's Introduction

Intro

This is a chatbot trained with seq2seq and reinforcement learning.

  • seq2seq

It's an encoder-decoder architecture, similar to the one described in the paper Sequence to Sequence -- Video to Text.

  • RL

After training the chatbot with seq2seq for enough epochs, I use an RL technique called policy gradient to train it further.

By doing this, the chatbot can generate more interesting responses, as measured by the reward function.

My reward function is similar to the one described in the paper Deep Reinforcement Learning for Dialogue Generation.
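
As a rough illustration of the policy gradient idea above, here is a minimal REINFORCE-style sketch in plain Python. It is not the code used in this repo: the function names, the baseline argument, and the numbers are all assumptions made for the example, and the reward is just a given scalar rather than the reward function mentioned above.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of log-probabilities of the sampled tokens in one response."""
    return sum(math.log(p) for p in token_probs)


def policy_gradient_loss(token_probs, reward, baseline=0.0):
    """REINFORCE-style surrogate loss for one sampled response.

    Minimizing this value raises the probability of responses whose
    reward exceeds the baseline and lowers it for the others.
    """
    return -(reward - baseline) * sequence_log_prob(token_probs)


# Hypothetical numbers: a three-token response and a scalar reward.
probs = [0.5, 0.25, 0.5]
loss = policy_gradient_loss(probs, reward=2.0, baseline=1.0)
print(loss)
```

In a real training loop the token probabilities would come from the decoder, and the gradient of this loss with respect to the model weights would be applied by the optimizer.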

chatbot's results

Here are some of the chatbot's results:

  • Result 1 (A and B are both trained chatbots)

A: Can you do me a favor?

B: Anything.

A: I want you to bend a father.

B: Of course.

A: I'm so sorry I didn't mean to be rude but.

B: Don't get it at least it's about as much as I can.

  • Result 2 (A and B are both trained chatbots)

A: I didn't do it! I haven't got a gun!

B: You liar. You still want to get your money back.

A: Something wrong with the truth.

B: I don't want any money just a little I can't take just out of this.

A: What's wrong.

B: Listen to me. I'm scared now.

A: Ah you'll need all the money I need your help.

  • Result 3 (A and B are both trained chatbots)

A: I forgot to get the Coca-Cola.

B: I got something stuck in the head.

A: It's all right I guess it's pretty common in the lot of shit.

B: I mean we've been all this together since the kid.

A: All the more reason.

B: It's not him it's his fault and he's blind because of god.

A: And is a strange story.

You can find more results from the chatbot here.

The neural network configs used for those results are described in the filenames.

simulate a dialog by pre-trained model

Are you amazed at the results of the chatbot? :D

You can generate those amazing dialogs on your own!

Just follow the instructions below:

./script/download.sh
(Note that script/download.sh uses the -nc parameter, which skips the download if the file already exists.
So make sure the download is not interrupted, otherwise an incomplete file will be kept.)
./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <PATH TO MODEL>

To generate a seq2seq dialog, type "model/Seq2Seq/model-77".

To generate an RL dialog, type "model/RL/model-56-3000".

  • <SIMULATE TYPE>

Can be 1 or 2.

The number is how many previous sentences the chatbot considers.

If you choose 1, the chatbot only considers the last sentence.

If you choose 2, the chatbot considers the last two sentences (one from the user, and one from the chatbot itself).

  • <INPUT FILE>

Take a look at the example file.

It shows the input format of the chatbot: each line is the opening sentence of a dialog.

You can just use the example file for convenience.

  • <OUTPUT FILE>

The output file; type any filename you want.
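
To make the two <SIMULATE TYPE> values concrete, here is a small Python sketch of how the context could be picked from a dialog history. It only illustrates the description above; build_context and its arguments are hypothetical names, not functions from this repo.

```python
def build_context(history, simulate_type):
    """Return the sentences the chatbot would condition on.

    history: list of utterances so far, oldest first.
    simulate_type: 1 -> last sentence only, 2 -> last two sentences.
    """
    if simulate_type not in (1, 2):
        raise ValueError("simulate_type must be 1 or 2")
    return history[-simulate_type:]


# Utterances borrowed from Result 1 above.
history = [
    "Can you do me a favor?",
    "Anything.",
    "I want you to bend a father.",
]
print(build_context(history, 1))
print(build_context(history, 2))
```

With only one sentence in the history, both types return that single sentence, which matches the start of a simulated dialog.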

generate response by pre-trained model

If you want the chatbot to generate only a single response, follow the instructions below:

./script/download.sh
(Note that script/download.sh uses the -nc parameter, which skips the download if the file already exists. So make sure the download is not interrupted, otherwise an incomplete file will be kept.)
./script/run.sh <TYPE> <INPUT FILE> <OUTPUT FILE>
  • <TYPE>

To generate a seq2seq response, type "S2S".

To generate a reinforcement learning response, type "RL".

  • <INPUT FILE>

Take a look at the example file.

It shows the input format of the chatbot: each line is the opening sentence of a dialog.

You can just use the example file for convenience.

  • <OUTPUT FILE>

The output file; type any filename you want.
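
If you would rather write your own <INPUT FILE> than reuse the example file, a sketch like the following produces the one-sentence-per-line format described above. It is written for Python 3; write_input_file and the filename my_input.txt are made up for this example, and UTF-8 encoding is an assumption.

```python
from pathlib import Path


def write_input_file(path, opening_sentences):
    """Write one opening sentence per line, skipping blank entries."""
    lines = [s.strip() for s in opening_sentences if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Hypothetical filename; pass the same path as <INPUT FILE>.
count = write_input_file(
    "my_input.txt",
    ["Can you do me a favor?", "  ", "I forgot to get the Coca-Cola."],
)
print(count)
```

The resulting file can then be passed as <INPUT FILE> to script/run.sh or script/simulate.sh.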

train chatbot from scratch

I trained my chatbot with Python 2.7.

If you want to train the chatbot from scratch, you can follow the instructions below:

Step0: training configs

Take a look at config.py; all training configs are described there.

You can change some training hyper-parameters, or just keep the original ones.

Step1: download data & libraries

I use the Cornell Movie-Dialogs Corpus.

You need to download it, unzip it, and move all .txt files into the data/ directory.

Then install the required libraries with pip:

pip install -r requirements.txt

Step2: parse data

(In this step I use Python 3.)
./script/parse.sh

Step3: train a Seq2Seq model

./script/train.sh

Step4-1: test a Seq2Seq model

Let's look at some results from the seq2seq model :)

./script/test.sh <PATH TO MODEL> <INPUT FILE> <OUTPUT FILE>

Step4-2: simulate a dialog

And now some dialog results from the seq2seq model!

./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <SIMULATE TYPE>

Can be 1 or 2.

The number is how many previous sentences the chatbot considers.

If you choose 1, the chatbot only considers the user's utterance.

If you choose 2, the chatbot considers the user's utterance and the chatbot's own last utterance.

Step5: train a RL model

You need to change the training_type parameter in config.py:

'normal' for seq2seq training, 'pg' for policy gradient.

First train with 'normal' for some epochs until training is stable (at least 30 epochs is highly recommended).

Then change training_type to 'pg' to optimize the reward function.

./script/train_RL.sh

When training with policy gradient (pg), you may need a reversed model.

The reversed model is also trained on the Cornell Movie-Dialogs dataset, but with source and target reversed.

You can download a pre-trained reversed model with

./script/download_reversed.sh

or you can train it yourself.

You don't need to change any settings for the reversed model if you use the pre-trained one.
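
To show what "source and target reversed" means for the training data, here is a minimal Python sketch. The pair format and the reverse_pairs helper are assumptions for illustration; the repo's own parsing scripts may store the data differently.

```python
def reverse_pairs(pairs):
    """Swap (source, target) into (target, source) for every training pair."""
    return [(target, source) for source, target in pairs]


# Hypothetical (input sentence, response) pairs taken from the sample dialogs.
forward = [
    ("Can you do me a favor?", "Anything."),
    ("What's wrong.", "Listen to me. I'm scared now."),
]
backward = reverse_pairs(forward)
print(backward[0])
```

A model trained on the reversed pairs learns to predict the previous sentence from a response, instead of the response from the previous sentence.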

Step6-1: test a RL model

Let's look at some results from the RL model, and see how they differ from the seq2seq model :)

./script/test_RL.sh <PATH TO MODEL> <INPUT FILE> <OUTPUT FILE>

Step6-2: simulate a dialog

And now some dialog results from the RL model!

./script/simulate.sh <PATH TO MODEL> <SIMULATE TYPE> <INPUT FILE> <OUTPUT FILE>
  • <SIMULATE TYPE>

Can be 1 or 2.

The number is how many previous sentences the chatbot considers.

If you choose 1, the chatbot only considers the last sentence.

If you choose 2, the chatbot considers the last two sentences (one from the user, and one from the chatbot itself).

rl-chatbot's People

Contributors

pochih
