llsourcell / chatbot-ai

Chatbot AI for Machine Learning for Hackers #6

License: MIT License

Lua 100.00%

chatbot-ai's Introduction

Overview

This is an attempt at implementing Sequence to Sequence Learning with Neural Networks (seq2seq) and reproducing the results in A Neural Conversational Model (aka the Google chatbot). The model is based on two LSTM layers: one encodes the input sentence into a "thought vector", and the other decodes that vector into a response. This architecture is called sequence-to-sequence, or seq2seq. This is the code for 'Build a Chatbot' on YouTube.

Source: http://googleresearch.blogspot.ca/2015/11/computer-respond-to-this-email.html

Dependencies

Install Torch.
Install the following additional Lua libs:
```
luarocks install nn
luarocks install rnn
luarocks install penlight
```
To train with CUDA, install the latest CUDA drivers and toolkit, then run:
```
luarocks install cutorch
luarocks install cunn
```
To train with OpenCL, install the latest OpenCL Torch libraries:
```
luarocks install cltorch
luarocks install clnn
```
Download the Cornell Movie-Dialogs Corpus and extract all the files into data/cornell_movie_dialogs.
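
Before training, it can help to confirm that the corpus was extracted into the expected directory. The following is a minimal sketch in plain Lua, not part of this repository: `missing_files` is a hypothetical helper, and the assumption that only `movie_lines.txt` and `movie_conversations.txt` are needed by the loader is mine (both file names do ship in the Cornell archive).

```lua
-- Hypothetical helper, not part of the repository: list the corpus files
-- that cannot be found in a directory.
local function missing_files(dir, names, exists)
  -- `exists` can be injected so the check runs without touching the disk.
  exists = exists or function(path)
    local f = io.open(path, "r")
    if f then f:close() return true end
    return false
  end
  local missing = {}
  for _, name in ipairs(names) do
    if not exists(dir .. "/" .. name) then
      missing[#missing + 1] = name
    end
  end
  return missing
end

-- Assumption: these two files from the Cornell archive are the ones needed.
local CORPUS_FILES = { "movie_lines.txt", "movie_conversations.txt" }

local missing = missing_files("data/cornell_movie_dialogs", CORPUS_FILES)
if #missing == 0 then
  print("Corpus files found.")
else
  print("Missing: " .. table.concat(missing, ", "))
end
```

Run it from the repository root with `lua` or `th`; if anything is reported missing, re-extract the archive into data/cornell_movie_dialogs.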

Basic Usage

Here's a sample conversation after training for 20 epochs with 50,000 examples, using the following command:

th train.lua --cuda --dataset 50000 --hiddenSize 1000

(Took 3 days to train on my GeForce GTX 780M.)

For OpenCL, use --opencl instead of --cuda. To train on CPU, omit both flags. Use the --dataset NUMBER option to control the size of the dataset. Training on the full dataset takes about 5 hours per epoch. The model is saved to data/model.t7 after each epoch if it has improved (that is, if the error decreased).
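
The "save only if the error decreased" rule can be sketched as follows. This is an illustration in plain Lua under my own assumptions, not the code in train.lua: `make_checkpointer` and the stand-in save function are hypothetical, and the error values are made up for the demo.

```lua
-- Hypothetical sketch, not the repository's code: save a model only when
-- the epoch error is lower than the best error seen so far.
local function make_checkpointer(save_fn)
  local best_error = math.huge
  return function(model, epoch_error)
    if epoch_error < best_error then
      best_error = epoch_error
      save_fn(model)
      return true
    end
    return false
  end
end

-- Stand-in for a real save. With Torch this would be something like
-- torch.save("data/model.t7", model).
local saved = {}
local checkpoint = make_checkpointer(function(model)
  saved[#saved + 1] = model
end)

-- Made-up per-epoch errors: epoch 3 is worse than epoch 2, so it is skipped.
for epoch, err in ipairs({ 4.2, 3.9, 4.0, 3.5 }) do
  local improved = checkpoint("model@epoch" .. epoch, err)
  print(epoch, err, improved and "saved" or "skipped")
end
```

Keeping the best error in a closure means an interrupted run still leaves the best checkpoint on disk rather than the most recent one.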

me: Hello? bot: Hi.

me: How are you? bot: I'm fine.

me: What's your name? bot: It's hard to describe.

me: How so? bot: I'm not sure.

me: What color is the sky? bot: It's blue.

me: What is your job? bot: It's not that i'm a fucking werewolf!

me: What is the purpose of life? bot: A gift.

me: Are you intelligent? bot: Yes, well...

me: Are you a machine? bot: That's a lie.

me: Are you human? bot: No, i'm not.

me: What are you? bot: I'm not sure.

To load the model and have a conversation:

th -i eval.lua --cuda # Skip --cuda if you didn't train with it
# ...
th> say "Hello."

Credits

Credit for the vast majority of code here goes to Marc-André Cournoyer. I've merely created a wrapper around all of the important functions to get people started.

chatbot-ai's Issues

Any example to use ubuntu-corpus for data training

Is there any common code to read this ubuntu-corpus data and merge it with the movie data?

http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/

Thanks

Put clear link to original project and MIT licence

I see you added a link to my GitHub profile at the bottom of the README, thx for that.

But please add a link to the project page at the top so everyone can find it and contribute: https://github.com/macournoyer/neuralconvo. Ideally, use the fork feature of GitHub so there's a link at the top and it's clear that this is a fork.

Also, I've released the code under MIT, which states:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

So please put back the last section of the README about the license that you removed.

Thanks in advance for respecting my work!

Error while training

/torch/install/share/lua/5.1/rnn/Mufuru.lua:18: attempt to call field 'Unsqueeze' (a nil value)

stack traceback:
[C]: in function 'error'
/torch-cl/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
train.lua:1: in main chunk
[C]: in function 'dofile'
/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x004064a0

expecting target table

When I run

th train.lua --cuda --dataset 50000 --hiddenSize 1000

I get

-- Epoch 1 / 50

/root/torch-cl/install/bin/luajit: ...orch-cl/install/share/lua/5.1/rnn/SequencerCriterion.lua:47: expecting target table
stack traceback:
[C]: in function 'assert'
...orch-cl/install/share/lua/5.1/rnn/SequencerCriterion.lua:47: in function 'forward'
./seq2seq.lua:74: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...t/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Any help would be appreciated.

How to create a corpus in the Cornell movie format?

Hi Siraj,
Thank you for your code and YouTube channel.
How can I create a corpus like the Cornell movie corpus?
I have a bunch of questions and answers in MySQL format, and I want to use them in this code.
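
One possible approach, sketched in plain Lua: write each question/answer pair as a two-line conversation in the same field layout the Cornell corpus uses, where fields are separated by ` +++$+++ `. The function `to_cornell`, the placeholder IDs (`u0`, `u1`, `m0`), and the speaker names are hypothetical, and whether this repository's loader accepts the result unchanged is an assumption. Exporting the rows from MySQL is left out; the sketch starts from pairs already loaded into a table.

```lua
-- Field separator used by the Cornell Movie-Dialogs Corpus files.
local SEP = " +++$+++ "

-- Hypothetical converter: turns {question=..., answer=...} pairs into
-- movie_lines.txt-style and movie_conversations.txt-style records.
local function to_cornell(qa_pairs)
  local lines, conversations = {}, {}
  local next_id = 1
  for _, pair in ipairs(qa_pairs) do
    local q_id = "L" .. next_id
    local a_id = "L" .. (next_id + 1)
    next_id = next_id + 2
    -- lineID, characterID, movieID, character name, utterance text
    lines[#lines + 1] =
      table.concat({ q_id, "u0", "m0", "USER", pair.question }, SEP)
    lines[#lines + 1] =
      table.concat({ a_id, "u1", "m0", "BOT", pair.answer }, SEP)
    -- characterID1, characterID2, movieID, ordered list of line IDs
    conversations[#conversations + 1] = table.concat(
      { "u0", "u1", "m0", string.format("['%s', '%s']", q_id, a_id) }, SEP)
  end
  return lines, conversations
end

local lines, conversations = to_cornell({
  { question = "What is Torch?", answer = "A scientific computing framework." },
})
for _, l in ipairs(lines) do print(l) end
for _, c in ipairs(conversations) do print(c) end
```

Each record must stay on one line, so any newlines inside a question or answer should be replaced with spaces before conversion.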

unable to convert argument 3 from cdata<struct THCudaTensor*> to cdata<struct THCudaLongTensor*>

I'm using Lua 5.1 and Torch7. While trying to execute

th train.lua --cuda --dataset 50000 --hiddenSize 1000

I'm getting the below error:

Dataset stats:
  Vocabulary size: 25931
         Examples: 83632

-- Epoch 1 / 50

/home/siva/torch/install/bin/lua: unable to convert argument 3 from cdata<struct THCudaTensor*> to cdata<struct THCudaLongTensor*>
stack traceback:
        [C]: in function 'v'
        /home/siva/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'ClassNLLCriterion_updateOutput'
        ...torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function <...torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:29>
        (tail call): ?
        ...rch/install/share/lua/5.1/rnn/SequencerCriterion.lua:55: in function <...rch/install/share/lua/5.1/rnn/SequencerCriterion.lua:39>
        (tail call): ?
        ./seq2seq.lua:74: in function 'train'
        train.lua:85: in main chunk
        [C]: in function 'dofile'
        .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: ?

Installation command for openCL packages

Ran into an install error using the luarocks commands for cltorch and clnn.

The GitHub repo recommends the non-luarocks installation method:

Please do NOT use any of: luarocks install nn, luarocks install torch, luarocks install cltorch, luarocks install clnn, luarocks install cutorch, or luarocks install cunn. This will break your installation, and is not supported. The supported update method is:

git clone --recursive https://github.com/hughperkins/distro -b distro-cl ~/torch-cl
cd ~/torch-cl
bash install-deps
./install.sh

Update packages using these commands:

cd ~/torch-cl
git pull
git submodule update --init --recursive
./install.sh

Killed

When attempting to train, the program runs for a while and then prints "Killed":

th train.lua --dataset 50000 --hiddenSize 1000
-- Loading dataset
data/vocab.t7 not found
-- Parsing Cornell movie dialogs data set ...
[==================== 387810/387810 ==========>] Tot: 3s942ms | Step: 0ms
-- Pre-processing data
[==================== 50000/50000 ============>] Tot: 33s312ms | Step: 0ms
-- Removing low frequency words
[==================== 83632/83632 ============>] Tot: 12s831ms | Step: 0ms
Writing data/examples.t7 ...
[==================== 83632/83632 ============>] Tot: 28s333ms | Step: 0ms
Writing data/vocab.t7 ...

Dataset stats:
Vocabulary size: 25931
Examples: 83632
Killed

Running the basic readme demo