sherjilozair / char-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow

License: MIT License

Python 100.00%

char-rnn-tensorflow's Introduction

char-rnn-tensorflow

Inspired by Andrej Karpathy's char-rnn.

Requirements

Tensorflow 1.0

Basic Usage

To train with the default parameters on the tinyshakespeare corpus, run python train.py. To see all the parameters, use python train.py --help.

To sample from a checkpointed model, run python sample.py. Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on another GPU. To force CPU mode, use export CUDA_VISIBLE_DEVICES="" and unset CUDA_VISIBLE_DEVICES afterward (on Windows, set CUDA_VISIBLE_DEVICES="" and set CUDA_VISIBLE_DEVICES= instead).
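
When sampling is launched from a Python script, one way to get the same CPU-only effect without exporting and unsetting the variable in your shell is to hand a modified environment to the child process only. A minimal sketch; `cpu_only_env` is a hypothetical helper, not part of this repository:

```python
import os


def cpu_only_env(base=None):
    """Return a copy of the environment with every GPU hidden from CUDA.

    An empty CUDA_VISIBLE_DEVICES makes CUDA (and so TensorFlow) see no
    GPUs, so the child process falls back to the CPU. The parent's own
    environment is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

For example, from the repository root: subprocess.run([sys.executable, "sample.py"], env=cpu_only_env(), check=True). Since only the child's environment changes, there is nothing to unset afterward.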

To continue training after an interruption, or to run for more epochs, run python train.py --init_from=save

Datasets

You can use any plain text file as input. For example, you could download The Complete Sherlock Holmes as follows:

cd data
mkdir sherlock
cd sherlock
wget https://sherlock-holm.es/stories/plain-text/cnus.txt
mv cnus.txt input.txt

Then start training from the top-level directory with python train.py --data_dir=./data/sherlock/

A quick tip for concatenating many small, disparate .txt files into one large training file: ls *.txt | xargs -L 1 cat >> input.txt.
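
A Python alternative to the shell one-liner, sketched under the assumption that all the files are UTF-8 text in a single directory. Unlike `>>`, it rewrites input.txt from scratch and skips input.txt itself; with the one-liner, if input.txt already exists (for example on a second run), `*.txt` matches it too. Filenames containing spaces are also not a problem here. `concat_txt` is a hypothetical helper name:

```python
from pathlib import Path


def concat_txt(src_dir, out_name="input.txt"):
    """Concatenate every *.txt file in src_dir (except out_name) into out_name."""
    src = Path(src_dir)
    out_path = src / out_name
    # Collect the inputs before opening the output, and never read the output.
    parts = sorted(p for p in src.glob("*.txt") if p.name != out_name)
    with out_path.open("w", encoding="utf-8") as out:
        for part in parts:
            out.write(part.read_text(encoding="utf-8"))
    return out_path
```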

Tuning

Tuning your models is kind of a "dark art" at this point. In general:

Start with as much clean input.txt as possible, e.g. 50 MiB.
Start by establishing a baseline using the default settings.
Use tensorboard to compare all of your runs visually to aid in experimenting.
Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
Tweak --num_layers from 2 to 3 but no higher unless you have experience.
Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.). An LSTM cell will "remember" for durations longer than this sequence, but the effect falls off over longer character distances.
Finally, once you've done all that, and only then, would I suggest adding some dropout. Start with --output_keep_prob 0.8, and only after exhausting all the values above, maybe end up with both --input_keep_prob 0.8 --output_keep_prob 0.5.
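
The advice above amounts to changing one flag at a time from a baseline and comparing the runs. A small sketch that builds such a schedule of train.py invocations; the helper names are hypothetical, and the baseline values are the starting points the list mentions (128, 2 and 50), which I'm assuming are the defaults:

```python
import sys


def one_at_a_time(baseline, variations):
    """Yield the baseline run, then runs that each change exactly one flag."""
    yield dict(baseline)
    for flag, values in variations.items():
        for value in values:
            if baseline.get(flag) == value:
                continue  # identical to the baseline, skip it
            run = dict(baseline)
            run[flag] = value
            yield run


def to_argv(run, script="train.py"):
    """Turn a {flag: value} dict into an argument list for subprocess."""
    return [sys.executable, script] + [
        "--%s=%s" % (flag, value) for flag, value in sorted(run.items())
    ]
```

Each argument list can be passed to subprocess.run. You would probably also want every run to write to its own save and log directories so TensorBoard can tell them apart; check python train.py --help for the exact flag names.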

Tensorboard

To visualize training progress, model graphs, and internal state histograms: fire up Tensorboard and point it at your log_dir. E.g.:

$ tensorboard --logdir=./logs/

Then open a browser to http://localhost:6006, or to the IP and port TensorBoard reports if they differ.

Roadmap

Add explanatory comments
Expose more command-line arguments
Compare accuracy and performance with char-rnn
More Tensorboard instrumentation

Contributing

Please feel free to:

Leave feedback in the issues
Open a Pull Request
Join the Gitter chat
Share your success stories and data sets!

char-rnn-tensorflow's People

btwardus fdoperezi alvations codeaudit ml-lab ml-ai-nlp-ir ssatyacc saurav111 brian8128 nfmcclure zuiwufenghua arunbalajeev salemameen jroakes ypuzikov wangxiong2015 garyfeng jiangnanhugo zhangkom davidbench g-wang wavelets starakaj zhouruiapple liangpj lopuhin vseledkin freefrancisco indiejoseph jtoy simmoncn epigos jackdogan moherx kumarsameer alexbaldwin yoavz nakosung mkn9 rbramwell withpop aamelegy krishperumal mpacer gfarnadi claravania stevenlol pierrebeauguitte qixianbiao suisuina0823 breezedeus ziky90 halisyilboga1 pranavmaddula goodrahstar ivanshpotenko sandy4321 wohnjayne roshanraj codeman38 markwzx bnaul imclab suncj caigaojiang nishithbsk martingms lgstd chrislit adrianlyjak sunsocool drakh rwclarity chagge wzhd sunqf jiongye zbot21 rubinovitz rickyall scofield0li timomey bver cwhy dong77 azmiozgen abdelrahmanhosny thedatalass vangogh0318 mmarklar jesseoh jimgoo tmusy deeplearningprojects tuyendothanh nikoma yzhwang weitcode arita37 miradel51

char-rnn-tensorflow's Issues

sample.py crashes if training data did not have a space

When sampling from a model whose vocab does not contain a space, I get the following error:

Traceback (most recent call last):
  File "sample.py", line 46, in <module>
    main()
  File "sample.py", line 27, in main
    sample(args)
  File "sample.py", line 43, in sample
    args.sample).encode('utf-8'))
  File "E:\Projekte\tensorflow\rnn\char-rnn-tensorflow\model.py", line 107, in sample
    x[0, 0] = vocab[char]
KeyError: ' '

The simplest test case: Hello world! works, but Helloworld! does not.

I'm using tensorflow-gpu on python 3.5.0.
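
The traceback points at `x[0, 0] = vocab[char]` while the model is being primed, so any priming character missing from the training vocabulary raises the KeyError. A space in the prime (possibly the default --prime value) is enough. One defensive option, sketched as a hypothetical `sanitize_prime` helper applied to the prime before sampling:

```python
def sanitize_prime(prime, vocab, fallback=None):
    """Drop prime characters the model never saw during training.

    `vocab` maps character -> index. If nothing survives, use `fallback`
    when it is a known character, otherwise an arbitrary known character,
    so that sampling always has something to start from.
    """
    kept = "".join(ch for ch in prime if ch in vocab)
    if kept:
        return kept
    if fallback is not None and fallback in vocab:
        return fallback
    return next(iter(vocab))
```

Raising a clear error instead of silently dropping characters is an equally valid design; dropping them keeps sampling working for corpora without spaces.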

Enhancement - Dockerfile + GIST

I drafted this Dockerfile with Parsey McParseface.

https://github.com/johndpope/DockerParseyMcParsefaceAPI/blob/master/docker/dsparseyapi/Dockerfile

(There's a script that builds Parsey with a gRPC API, which takes over 90 minutes on a Mac.)
But you could cherry-pick the base file.

Bidirectional RNN?

There should be an option to add a bidirectional recurrent neural network using the three core RNN cells.

Tensorflow 0.11 Tuple issue

This works fine for me with Tensorflow 0.10 but not with Tensorflow 0.11.

There seems to be some issue with tuples. Is anyone else having this issue?

state_is_tuple=True

might be the solution, but I'm not sure where to use it in train.py and model.py.

ImportError: cannot import name legacy_seq2seq

Hi!

I just downloaded the code and tried to run train.py with TF v1.0. I expected it to train on the Shakespeare example, since the README says it uses default parameters. However, it fails:

Traceback (most recent call last):
  File "train.py", line 11, in <module>
    from model import Model
  File "/opt/gpu-project/char-rnn-tensorflow/model.py", line 3, in <module>
    from tensorflow.contrib import legacy_seq2seq
ImportError: cannot import name legacy_seq2seq
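
legacy_seq2seq lives under tensorflow.contrib only from TensorFlow 1.0 on, so this error most likely means an older TensorFlow is the one actually being imported; python -c "import tensorflow as tf; print(tf.__version__)" is a quick check. If you need to support both versions, a small import-fallback helper is one option. `import_first` is hypothetical, and the pre-1.0 module path in the usage note is an assumption to verify against your installed version:

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that exists."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s: %s" % (name, exc))
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))
```

Usage: seq2seq = import_first("tensorflow.contrib.legacy_seq2seq", "tensorflow.python.ops.seq2seq").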

No dropout option?

Also, dropout doesn't seem to be there. Any way we can use some regularization?
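
For reference, the Tuning section of the README now mentions --input_keep_prob and --output_keep_prob. The NumPy sketch below only illustrates what a keep probability means (inverted dropout); it is not the repository's implementation, which would apply dropout inside the TensorFlow graph:

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob, so the expected activation is
    unchanged. With keep_prob = 1.0 (e.g. at sampling time) it is a no-op."""
    if keep_prob >= 1.0:
        return x
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)
```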

Word vector using char rnn?

Hi
First of all, great effort putting char-rnn into TensorFlow. I'm just curious whether we can get word vectors using this approach.

Thanks

Sampling probabilities do not sum to 1

When I print repr(sum(p)), it always gives numbers very close to 1.0, like 1.00000052 or 0.9999999248, yet sampling fails:

Traceback (most recent call last):
  File "sample.py", line 38, in <module>
    main()
  File "sample.py", line 21, in main
    sample(args)
  File "sample.py", line 35, in sample
    print model.sample(sess, chars, vocab, args.n, args.prime)
  File "/home/patro/Documents/Programming/NN/char-rnn-tensorflow/model.py", line 77, in sample
    sample = int(np.random.choice(len(p),p=p))
  File "mtrand.pyx", line 1094, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10565)
ValueError: probabilities do not sum to 1
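
The sums shown (1.00000052, 0.9999999248) are ordinary float32 rounding error in the softmax output, and np.random.choice can reject them because its tolerance check is strict, particularly in older NumPy releases. A common workaround is to upcast to float64 and renormalize before sampling. A minimal sketch with a hypothetical `sample_index` helper:

```python
import numpy as np


def sample_index(p, rng):
    """Draw an index from a softmax output that may not sum exactly to 1.

    Upcasting to float64 and dividing by the sum restores a valid
    distribution without changing the relative weights.
    """
    p = np.asarray(p, dtype=np.float64)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))
```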

weighted_pick() gives random choices?

Hi, does weighted_pick() return random characters?

With sampling_type=1, return(int(np.searchsorted(t, np.random.rand(1)*s))) looks like it gives a random choice out of the array t, and that's what I'm seeing right now. Is this intentional, or am I doing something wrong?
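
The formula quoted above is inverse-CDF sampling rather than a uniform pick: np.random.rand(1)*s is uniform on [0, s), and searchsorted finds which cumulative-weight interval it falls into, so index i comes back with probability weights[i] / s. The sketch re-implements the same formula with an injectable random generator and checks the empirical frequencies. If your samples still look random, the model's predicted distribution itself may be close to flat (for example early in training):

```python
import numpy as np


def weighted_pick(weights, rng):
    """Same cumsum / sum / searchsorted formula, with a testable RNG."""
    t = np.cumsum(weights)
    s = np.sum(weights)
    return int(np.searchsorted(t, rng.random() * s))


rng = np.random.default_rng(42)
weights = np.array([0.7, 0.2, 0.1])
draws = [weighted_pick(weights, rng) for _ in range(20000)]
freqs = np.bincount(draws, minlength=len(weights)) / len(draws)
print(freqs)  # close to [0.7, 0.2, 0.1], not uniform
```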

Curious...

Hi
I wanted to see whether I could learn things about other types of text, but it seems to be problematic. Is there something specific about the chapter-heading format of these plays that somehow made it into the model? I'm asking because when I give it other texts, the loss never decreases below 1.2.
Just curious
Andy

Instructions for sampling prior models?

I've searched the repository, read the documentation a few times, and tried invoking python sample.py on anything that looked remotely interesting in the data directory. Calling python sample.py works great on the model that's currently in training. How do I call an arbitrary model? Is there some parameter keyword I need to pass along with the filename argument to sample.py? "--sample" seemed like a good bet, but on closer inspection that option doesn't look related.

chars_vocab.pkl                      model.ckpt-1000.index                model.ckpt-3000.meta                 model.ckpt-62000.data-00000-of-00001 model.ckpt-64000.index
checkpoint                           model.ckpt-1000.meta                 model.ckpt-4000.data-00000-of-00001  model.ckpt-62000.index               model.ckpt-64000.meta
config.pkl                           model.ckpt-2000.data-00000-of-00001  model.ckpt-4000.index   

(tensorflow) Nobodys-MacBook-Pro:char-rnn-tensorflow kz$ python sample.py model.ckpt-64149.index
usage: sample.py [-h] [--save_dir SAVE_DIR] [-n N] [--prime PRIME]
                 [--sample SAMPLE]
sample.py: error: unrecognized arguments: model.ckpt-64149.index

sample.py: error: unrecognized arguments: model.ckpt-64149.data-00000-of-00001
(tensorflow) Nobodys-MacBook-Pro:char-rnn-tensorflow kz$ python sample.py -h
usage: sample.py [-h] [--save_dir SAVE_DIR] [-n N] [--prime PRIME]
                 [--sample SAMPLE]

(tensorflow) Nobodys-MacBook-Pro:save kz$ python ../sample.py --sample model.ckpkt-64149.meta
usage: sample.py [-h] [--save_dir SAVE_DIR] [-n N] [--prime PRIME]
                 [--sample SAMPLE]
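
As the usage message shows, sample.py takes a --save_dir but no checkpoint name; it appears to restore whichever checkpoint the small `checkpoint` text file in that directory names as the latest. One workaround is to rewrite that file so it points at an older checkpoint prefix (the name without the .index, .meta or .data-... suffix). A sketch with hypothetical helpers; it assumes the V2 checkpoint layout shown in the listing above, and you should back up the original `checkpoint` file first:

```python
import os
import re


def list_checkpoints(save_dir):
    """Return checkpoint prefixes such as 'model.ckpt-4000', oldest step first."""
    found = []
    for name in os.listdir(save_dir):
        m = re.fullmatch(r"(model\.ckpt-(\d+))\.index", name)
        if m:
            found.append((int(m.group(2)), m.group(1)))
    return [prefix for _, prefix in sorted(found)]


def point_checkpoint_at(save_dir, prefix):
    """Rewrite save_dir/checkpoint so the 'latest' checkpoint is `prefix`."""
    if not os.path.exists(os.path.join(save_dir, prefix + ".index")):
        raise FileNotFoundError(prefix + ".index not found in " + save_dir)
    target = os.path.abspath(os.path.join(save_dir, prefix))
    # The file is a text-format protobuf, so backslashes (Windows paths)
    # must be escaped inside the quoted string.
    target = target.replace("\\", "\\\\")
    with open(os.path.join(save_dir, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % target)
```

Then run python sample.py --save_dir=save as usual. Writing an absolute path avoids depending on how a given TensorFlow version resolves relative paths in that file.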

weighted_pick() can return invalid index

weighted_pick(weights) in model.py can return an index larger than len(chars)-1.

This happens if sum(weights) < 1 and, at the same time, np.random.rand(1) > sum(weights).

Then int(np.searchsorted(t, np.random.rand(1)*s)) == len(t), which leads to an IndexError:

import numpy as np
p = np.array([0.1, 0.2, 0.699], dtype=np.float32)
t = np.cumsum(p)
s = np.sum(p)
randval = 0.9999
print(int(np.searchsorted(t, randval)))  # gives 3, which is too large, as len(t) == 3

So numpy.random.choice() is probably the better choice, despite being slower.
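
Because np.random.rand(1)*s is always below s, an out-of-range index requires t[-1] (from np.cumsum) to come out slightly smaller than s (from np.sum). That is plausible in float32, since NumPy's sum uses pairwise summation while cumsum accumulates sequentially, so the two can round differently. Besides switching to np.random.choice, a cheap fix is to scale by t[-1] itself and clamp the result. A sketch (the function name is hypothetical):

```python
import numpy as np


def safe_weighted_pick(weights, rng=None):
    """Inverse-CDF sampling that can never return len(weights)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.cumsum(np.asarray(weights, dtype=np.float64))
    # Scale by the last cumulative value instead of a separately computed
    # np.sum, so the draw is strictly below t[-1].
    r = rng.random() * t[-1]
    # side="right" treats each interval as half-open [t[i-1], t[i]), so
    # zero-weight entries are never selected.
    idx = int(np.searchsorted(t, r, side="right"))
    # Defensive clamp for degenerate input such as all-zero weights.
    return min(idx, len(t) - 1)
```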

create_batches in TextLoader in utils.py doesn't seem to transform the data into batches correctly

The following lines reshape xdata into arrays with the correct dimensions, but the data are no longer in the correct order:
self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)

I think the correct transformation should be the following:
self.x_batches = xdata.reshape(-1, self.batch_size, self.seq_length)
self.y_batches = ydata.reshape(-1, self.batch_size, self.seq_length)

Here is an example:

xdata = np.array(range(100))
xdata => array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
                34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
                68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
                85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

batch_size = 5
seq_length = 5
num_batches = 4

m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)

m => [array([[ 0,  1,  2,  3,  4],
             [20, 21, 22, 23, 24],
             [40, 41, 42, 43, 44],
             [60, 61, 62, 63, 64],
             [80, 81, 82, 83, 84]]),
      array([[ 5,  6,  7,  8,  9],
             [25, 26, 27, 28, 29],
             [45, 46, 47, 48, 49],
             [65, 66, 67, 68, 69],
             [85, 86, 87, 88, 89]]),
      array([[10, 11, 12, 13, 14],
             [30, 31, 32, 33, 34],
             [50, 51, 52, 53, 54],
             [70, 71, 72, 73, 74],
             [90, 91, 92, 93, 94]]),
      array([[15, 16, 17, 18, 19],
             [35, 36, 37, 38, 39],
             [55, 56, 57, 58, 59],
             [75, 76, 77, 78, 79],
             [95, 96, 97, 98, 99]])]

and

n = xdata.reshape(-1, batch_size, seq_length)
n => array([[[ 0,  1,  2,  3,  4],
             [ 5,  6,  7,  8,  9],
             [10, 11, 12, 13, 14],
             [15, 16, 17, 18, 19],
             [20, 21, 22, 23, 24]],

            [[25, 26, 27, 28, 29],
             [30, 31, 32, 33, 34],
             [35, 36, 37, 38, 39],
             [40, 41, 42, 43, 44],
             [45, 46, 47, 48, 49]],

            [[50, 51, 52, 53, 54],
             [55, 56, 57, 58, 59],
             [60, 61, 62, 63, 64],
             [65, 66, 67, 68, 69],
             [70, 71, 72, 73, 74]],

            [[75, 76, 77, 78, 79],
             [80, 81, 82, 83, 84],
             [85, 86, 87, 88, 89],
             [90, 91, 92, 93, 94],
             [95, 96, 97, 98, 99]]])
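
Before switching to the reshape, it may be worth checking what the split layout buys. As far as I can tell, train.py feeds each batch's final RNN state back in as the next batch's initial state (tracebacks in other issues show model.final_state being fetched and model.initial_state being fed). For that to be meaningful, row i of batch k+1 should continue the text exactly where row i of batch k stopped. The sketch checks this property on the same example; `rows_continue` is a hypothetical helper:

```python
import numpy as np

xdata = np.arange(100)
batch_size, seq_length = 5, 5
num_batches = xdata.size // (batch_size * seq_length)  # 4

# Layout used by create_batches: batch_size contiguous streams of text,
# each cut into num_batches consecutive windows.
m = np.split(xdata.reshape(batch_size, -1), num_batches, 1)

# Layout proposed above: consecutive windows fill each batch row by row.
n = xdata.reshape(-1, batch_size, seq_length)


def rows_continue(batches):
    """True if row i of batch k+1 starts right after row i of batch k ends."""
    return all(
        np.array_equal(batches[k + 1][:, 0], batches[k][:, -1] + 1)
        for k in range(len(batches) - 1)
    )


print(rows_continue(m))  # True
print(rows_continue(n))  # False
```

So the "scrambled" order looks deliberate: each row of the batch is its own continuous stream, which is what carrying state across batches needs.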

MemoryError

When training on large files, I get a MemoryError despite having more than enough memory to hold the file:

reading text file
Traceback (most recent call last):
  File "train.py", line 111, in <module>
    main()
  File "train.py", line 48, in main
    train(args)
  File "train.py", line 51, in train
    data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
  File "/home/ren/Projects/char-rnn-tensorflow/utils.py", line 18, in __init__
    self.preprocess(input_file, vocab_file, tensor_file)
  File "/home/ren/Projects/char-rnn-tensorflow/utils.py", line 35, in preprocess
    self.tensor = np.array(list(map(self.vocab.get, data)))
MemoryError
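
One likely culprit is the intermediate Python list: list(map(self.vocab.get, data)) holds an 8-byte object reference per character before NumPy copies everything into an array of default-sized integers (64-bit on most Linux builds), so peak memory is well over ten times the size of an ASCII file. A sketch that writes indices straight into a preallocated int32 array with np.fromiter; `encode_text` is a hypothetical name:

```python
import numpy as np


def encode_text(data, vocab):
    """Map each character of `data` to its vocab index without building an
    intermediate Python list; int32 halves the size of an int64 array."""
    return np.fromiter((vocab[c] for c in data), dtype=np.int32, count=len(data))
```

The text itself still has to fit in memory once; only the per-character overhead goes away.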

No progress in learning

I am trying to build my network with a 21 MB text file, but whatever I do, it gets stuck at train_loss ~1.6 and does not progress any further. I tried changing:

network size (from 64 to 1024)
number of layers (2 and 3)
batch_size (from 10 to 100)
sequence length (from 10 to 1000)

But nothing helps: the network always stops learning and gets stuck at about 1.6-1.7 train_loss.
How can I diagnose the problem? Can anyone advise?
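
To put a plateau like this in perspective, it can help to convert the loss into bits per character and per-character perplexity. This assumes train_loss is the mean softmax cross-entropy per character in nats (TensorFlow's softmax cross-entropy uses the natural log). A loss of 1.6 then means the model is, on average, about as uncertain as a choice among roughly five equally likely next characters; whether that is good depends on how predictable the corpus is. The helper name is hypothetical:

```python
import math


def loss_report(loss_nats):
    """Convert a mean per-character cross-entropy in nats into bits per
    character and per-character perplexity."""
    return {
        "bits_per_char": loss_nats / math.log(2),
        "perplexity": math.exp(loss_nats),
    }


print(loss_report(1.6))  # bits_per_char ~2.31, perplexity ~4.95
```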

Results quite unsatisfactory

First and foremost thanks to everybody involved in this. I really appreciate the work you are putting into this.

Previously I was using Karpathy's char-rnn, but I couldn't get Torch running with my GPU after updating my hardware, so I was looking for a different solution, and that has brought me here. Using Karpathy's char-rnn I was getting beautiful results even with very small datasets (around 1 MB). With your TensorFlow implementation the results are not as good, and I wonder why. I tried fiddling with the parameters (rnn_size, num_layers, etc.), but the improvements were small or nonexistent.

It would be really cool if you could add some explanatory comments on the different parameters, i.e. how they affect the result. As someone relatively new to NNs, this would help me a lot in getting better results.

Thanks again for your efforts!

IndexError: index out of bounds

loading preprocessed files
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main()
  File "train.py", line 39, in main
    train(args)
  File "train.py", line 42, in train
    data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
  File "/home/onetipp/char-rnn/char-rnn-tensorflow/utils.py", line 22, in __init__
    self.create_batches()
  File "/home/onetipp/char-rnn/char-rnn-tensorflow/utils.py", line 52, in create_batches
    ydata[-1] = xdata[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

create_batches() from utils.py throws error

root@ip-172-31-23-174:/home/ubuntu/char-rnn# python train.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
loading preprocessed files
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main()
  File "train.py", line 39, in main
    train(args)
  File "train.py", line 42, in train
    data_loader = TextLoader(args.data_dir, args.batch_size, args.seq_length)
  File "/home/ubuntu/char-rnn/utils.py", line 22, in __init__
    self.create_batches()
  File "/home/ubuntu/char-rnn/utils.py", line 52, in create_batches
    ydata[-1] = xdata[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
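
Both tracebacks fail at ydata[-1] = xdata[0] with an empty xdata. That happens when the data holds fewer characters than a single batch needs: create_batches appears to keep only whole batches of batch_size * seq_length characters, so with zero batches nothing is left. Possible causes include a tiny or empty input.txt, or stale preprocessed files in the data directory ("loading preprocessed files" is printed just before the failure). A sketch of an early check with a readable message; the function name is hypothetical:

```python
def num_batches_or_raise(n_chars, batch_size, seq_length):
    """Count whole batches and fail early, instead of with an IndexError,
    when the data cannot fill even one batch."""
    per_batch = batch_size * seq_length
    n = n_chars // per_batch
    if n == 0:
        raise ValueError(
            "Not enough data: %d characters, but one batch needs %d "
            "(batch_size=%d * seq_length=%d). Use more text or smaller values."
            % (n_chars, per_batch, batch_size, seq_length)
        )
    return n
```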

How to calculate prob of a new sentence.

My goal is to predict the probability of a new sentence. Could you give an example of how to calculate the probability of a new sentence?

Tuning the temperature

Is there a way to tune the temperature parameter ?

Temperature. An important parameter you may want to play with is -temperature, which takes a number in range (0, 1] (0 not included), default = 1. The temperature is dividing the predicted log probabilities before the Softmax, so lower temperature will cause the model to make more likely, but also more boring and conservative predictions. Higher temperatures cause the model to take more chances and increase diversity of results, but at a cost of more mistakes.
https://github.com/karpathy/char-rnn
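
The quoted description can be sketched in NumPy as below. It assumes you have the pre-softmax logits; if you only have the probabilities, np.log(probs) works as well, because the softmax is unchanged by the constant offset between logits and log probabilities. `softmax_with_temperature` is a hypothetical helper, not an existing sample.py option:

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature before the softmax.

    temperature < 1 sharpens the distribution (safer, more repetitive
    samples); temperature > 1 flattens it (more diverse, more mistakes).
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()
```

The resulting distribution can then be sampled exactly like the model's usual output.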

feed data format

Can you post an example of the exact data you are feeding into the placeholders?
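
As far as I can tell from model.py, input_data and targets are int32 placeholders of shape [batch_size, seq_length] holding character indices, with targets shifted one character ahead. A toy sketch that mirrors the create_batches logic quoted in other issues (including the ydata[-1] = xdata[0] wrap-around) on a tiny string:

```python
import numpy as np

text = "hello world"
chars = sorted(set(text))
vocab = {c: i for i, c in enumerate(chars)}
batch_size, seq_length = 2, 4

# Keep only as many characters as fill whole batches (here: 1 batch of 2x4).
num_batches = len(text) // (batch_size * seq_length)
keep = num_batches * batch_size * seq_length
xdata = np.array([vocab[c] for c in text[:keep]], dtype=np.int32)

ydata = np.copy(xdata)
ydata[:-1] = xdata[1:]  # each target is the next character
ydata[-1] = xdata[0]    # the very last target wraps around to the start

x_batches = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
y_batches = np.split(ydata.reshape(batch_size, -1), num_batches, 1)


def decode(row):
    return "".join(chars[i] for i in row)


x, y = x_batches[0], y_batches[0]  # what would be fed as input_data / targets
print([decode(r) for r in x])  # ['hell', 'o wo']
print([decode(r) for r in y])  # ['ello', ' woh']
```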

Compat issues with latest TF

After building from HEAD of TF, I get some import errors for various things (almost every example on the web has this problem!)

I had to change imports in model.py to

from tensorflow.python.ops import rnn_cell
from tensorflow.python.ops import seq2seq

Downside is that importing tensorflow.models.rnn raises ImportError, so something like this is probably needed to preserve backwards compat:

try:
    from tensorflow.models.rnn import rnn_cell, seq2seq
except ImportError:
    from tensorflow.python.ops import rnn_cell, seq2seq

Not a big deal but your output files are not part of .gitignore

These show as modified:

data/tinyshakespeare/data.npy
data/tinyshakespeare/vocab.pkl
model.pyc
sample.pyc
train.pyc
utils.pyc

Can not convert a list into a Tensor or Operation

Using Python 2.7, the following libraries and the latest 'master' branch, I get the error below (which doesn't happen when I roll back to the earlier branch ae-rnn).

(venv)root@# python train.py
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally

....

File "train.py", line 64, in train
train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
File "/usr/share/venv/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 330, in run
% (subfetch, fetch, type(subfetch), str(e)))
TypeError: Fetch argument [<tf.Tensor 'zeros:0' shape=(?, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_1:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_2:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_3:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_4:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_5:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_6:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_7:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_8:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_9:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_10:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_11:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_12:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_13:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_14:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_15:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_16:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_17:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_18:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_19:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_20:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_21:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_22:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_23:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_24:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_25:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_26:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_27:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_28:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_29:0' shape=(50, 
512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_30:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_31:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_32:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_33:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_34:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_35:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_36:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_37:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_38:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_39:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_40:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_41:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_42:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_43:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_44:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_45:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_46:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_47:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_48:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_49:0' shape=(50, 512) dtype=float32>] of [<tf.Tensor 'zeros:0' shape=(?, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_1:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_2:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_3:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_4:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_5:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_6:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_7:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_8:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_9:0' shape=(50, 512) 
dtype=float32>, <tf.Tensor 'rnnlm_1/concat_10:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_11:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_12:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_13:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_14:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_15:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_16:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_17:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_18:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_19:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_20:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_21:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_22:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_23:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_24:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_25:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_26:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_27:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_28:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_29:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_30:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_31:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_32:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_33:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_34:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_35:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_36:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_37:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_38:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_39:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_40:0' 
shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_41:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_42:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_43:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_44:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_45:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_46:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_47:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_48:0' shape=(50, 512) dtype=float32>, <tf.Tensor 'rnnlm_1/concat_49:0' shape=(50, 512) dtype=float32>] has invalid type <type 'list'>, must be a string or Tensor. (Can not convert a list into a Tensor or Operation.)

Output activations of individual neurons of a given layer of a trained model.

I want to see what the activations are for individual neurons of a given layer for a given input character.

Any suggestions?

Char sequence probability

Hi,
I'm using char-rnn to compute sentence probability, which is the main functionality of language modeling. This code feeds the sentence's characters one by one and finds the probability of correctly predicting the next character:

state = self.cell.zero_state(1, tf.float32).eval(session=session)
char_probas = []
input = np.zeros((1, 1))
for c, char in enumerate(sentence[:-1]):
    input[0, 0] = vocab[char]
    feed = {self.input_data: input, self.initial_state: state}
    [probs, state] = session.run([self.probs, self.final_state], feed)
    char_probas.append(probs[0][vocab[sentence[c+1]]])
probability = np.mean(char_probas)

It works fine and prefers well-written sentences over ill-formed ones. But I don't think it's optimized for performance. Is it possible to feed one sequence of characters and receive the generation probability of each one given the previous characters? Currently, data transfer between host and device seems to be a major bottleneck.
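
Two notes, sketched under assumptions. First, by the chain rule the sentence probability is the product of the per-step probabilities, so summing their logs gives log P(sentence[1:] | sentence[0]); np.mean of raw probabilities is a usable score but not a probability. Second, the per-character loop can probably be replaced by a single session.run on a model built with batch_size=1 and seq_length=len(sentence)-1 (the sampling model in this repository uses seq_length=1), fetching self.probs for all positions at once. Given such a probs array of shape (len(sentence)-1, vocab_size), a hypothetical helper:

```python
import numpy as np


def sequence_log_prob(probs, sentence, vocab):
    """Log-probability of sentence[1:] given sentence[0].

    probs[t] is assumed to be the model's distribution over the character
    that follows sentence[:t + 1], so row t is scored against sentence[t + 1].
    """
    targets = np.array([vocab[c] for c in sentence[1:]])
    probs = np.asarray(probs)
    if probs.shape[0] != len(targets):
        raise ValueError("need one row of probabilities per predicted character")
    step_probs = probs[np.arange(len(targets)), targets]
    return float(np.sum(np.log(step_probs)))
```

Dividing the result by len(sentence) - 1 gives an average log-probability per character, which is fairer when comparing sentences of different lengths.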

The inputs of dynamic_rnn must be 3-D tensor?

hi,
I want to use dynamic_rnn to train my ConvLSTM. The original data are videos with dimensions [batch_size, max_time_step, height, width, channel], but I failed to feed the data to dynamic_rnn.
I get this error:
ValueError: Dimension must be 5 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [16,?,11,40,1], [3].
What should I do to use dynamic_rnn?
Version: TF 1.0
Thanks

Why is `embeddings` in `model.py` random, instead of one-hot?

Hello,

If I understand correctly, the three-dimensional inputs tensor is built by looking up the n-th row of embeddings for each number in the two-dimensional self.input_data tensor. The rows of embeddings have the same size as the RNN's internal layers. This seems to be how the different characters are input to the network.

The TensorFlow variable "embeddings" has nothing explicitly assigned to it, so it is drawn from a uniform distribution each time train.py is run. Why is that? I would have expected embeddings to be a matrix of one-hot row vectors encoding the different characters, with that mapped to the internal layer by weights, as in https://gist.github.com/karpathy/d4dee566867f8291f086 .

Also, when I print embeddings at the end of every run, I notice that its value changes every time.

I would be very grateful if someone would explain to me what is going on here.

Yours truly,
rhaps0dy
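
Looking up row i of a matrix gives exactly the same result as multiplying the one-hot vector for i by that matrix, so the embedding can be read as the one-hot input weights from Karpathy's gist, stored directly as the lookup table. Assuming the variable is trainable (the default), the random uniform values are only its initialization, and training updates them, which would explain why the printed values differ between runs. A NumPy check of the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, rnn_size = 5, 3
embedding = rng.uniform(-0.1, 0.1, size=(vocab_size, rnn_size))

ids = np.array([[2, 0, 4]])          # shape [batch_size, seq_length]
looked_up = embedding[ids]           # what an embedding lookup does
one_hot = np.eye(vocab_size)[ids]    # shape [batch_size, seq_length, vocab_size]
projected = one_hot @ embedding      # one-hot input times a weight matrix

print(np.allclose(looked_up, projected))  # True
```

The lookup simply skips multiplying by all the zeros in the one-hot vectors.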

Unable to test with GRU or RNN cells

I am on TensorFlow 1.0, but it failed on 0.12 too.

Traceback (most recent call last):
  File "train.py", line 114, in <module>
    main()
  File "train.py", line 48, in main
    train(args)
  File "train.py", line 98, in train
    for i, (c, h) in enumerate(model.initial_state):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 516, in __iter__
    raise TypeError("'Tensor' object is not iterable.")
TypeError: 'Tensor' object is not iterable.

Regards.
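
The loop for i, (c, h) in enumerate(model.initial_state) unpacks each layer's state as an LSTM (c, h) pair. A GRU or basic RNN layer's state is a single tensor, so that unpacking tries to iterate over a Tensor and fails. A structure-agnostic alternative is to flatten the placeholder structure and the fetched value in the same order and pair them up. A sketch with hypothetical helpers, assuming the fetched state has the same nesting as model.initial_state:

```python
def flatten(structure):
    """Flatten nested tuples/lists (namedtuples included) into a list of leaves."""
    if isinstance(structure, (tuple, list)):
        leaves = []
        for item in structure:
            leaves.extend(flatten(item))
        return leaves
    return [structure]


def state_feed(initial_state, state_value):
    """Pair every state placeholder with its value, whatever the cell type."""
    keys, values = flatten(initial_state), flatten(state_value)
    if len(keys) != len(values):
        raise ValueError("state structures do not match")
    return dict(zip(keys, values))
```

Inside the training loop, this would replace the per-layer loop with something like feed.update(state_feed(model.initial_state, state)).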

Attempting to load from checkpoint gives unicode error.

Traceback (most recent call last):
  File "train.py", line 114, in <module>
    main()
  File "train.py", line 48, in main
    train(args)
  File "train.py", line 66, in train
    saved_model_args = cPickle.load(f)
  File "C:\Users\User\AppData\Local\Programs\Python\Python35\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 25: character maps to <undefined>

I'd stopped a run with Ctrl+C and assumed the checkpoint would let me restart it. Instead I get the above error. I'm running TF 0.12 on Windows 10 with CUDA 8 and cuDNN, Python v3.5.2.
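
The traceback shows cPickle.load(f) going through the cp1252 text decoder, which means the pickle file was opened in text mode. On Python 3, pickle data is bytes, so the file should be written with "wb" and read with "rb"; the default locale codec on Windows (cp1252) happens to choke on byte 0x81. A minimal sketch of the round trip (helper names are hypothetical):

```python
import pickle


def save_config(obj, path):
    with open(path, "wb") as f:  # binary mode: pickle data is bytes, not text
        pickle.dump(obj, f)


def load_config(path):
    with open(path, "rb") as f:  # "r" would decode with the locale codec
        return pickle.load(f)
```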

[Deprecation Warning] Using a concatenated state is slower and will soon be deprecated.

I am receiving a deprecation warning when I run train.py or sample.py: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True. Is there a workaround for this?

Thanks,
CJ

how to change output format?

The output is produced in paragraph form; I want it in dialogue form, separated by lines. How can I do that?

W tensorflow/core/framework/op_kernel.cc:909] Resource exhausted:OOM when allocating tensor with shape[1250,25670]

I get the following error while running train.py:

Traceback (most recent call last):
  File "train.py", line 111, in <module>
    main()
  File "train.py", line 48, in main
    train(args)
  File "train.py", line 98, in train
    train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
  File "usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 372, in run
    run_metadata_ptr)
  File "usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 636, in _run
    feed_dict_string, options, run_metadata)
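
The shape in the OOM message gives a rough idea of the memory involved. The second dimension, 25670, is very likely the vocabulary size (the softmax logits are [batch_size * seq_length, vocab_size]), and 1250 could be, for example, 50 x 25. A single float32 tensor of that shape is about 122 MiB, and training keeps several tensors of this shape (logits, probabilities, gradients). A vocabulary of over 25,000 distinct characters suggests a very large character set, so reducing batch_size or seq_length shrinks the first dimension. A sketch of the arithmetic (hypothetical helper):

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 2**20


print(round(tensor_mib([1250, 25670]), 1))  # 122.4
```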

Interface to save checkpoint?

The interface has changed from char-rnn, so I didn't set up checkpoint saving correctly. I started a larger network with tensor-rnn, and it looks like it won't save a checkpoint until 50 epochs.

It's currently on epoch 36 and runs about 2 epochs per day (250 buffer), i.e. another week to go.

Looking at the learning rate, it would have been great to save a checkpoint "on demand", e.g. via a keyboard command, at certain points: in this case at 18 epochs, where training leveled off.

load_preprocessed error - TypeError: unhashable type: 'dict'

I got this error when trying to train on a preprocessed input.

The error comes from here: https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/utils.py#L42

self.chars is a tuple with (1) a list of chars and (2) a dict of char to integer mapping. I changed the function to just get it running:

def load_preprocessed(self, vocab_file, tensor_file):
    with open(vocab_file, 'rb') as f:
        self.chars = cPickle.load(f)[0][0]
    print(self.chars)
    self.vocab_size = len(self.chars)
    self.vocab = dict(zip(self.chars, range(len(self.chars))))
    self.tensor = np.load(tensor_file)
    self.num_batches = self.tensor.size // (self.batch_size * self.seq_length)

Continue train

Hi, sorry for the basic question. How should I run subsequent training sessions that continue from the last point?
Which files should I keep? Are the 'data' and 'save' directories enough?
I can't test this myself due to space limits on AWS.

Why we need ydata[-1] = xdata[0]?

In def create_batches(self) (utils.py):

        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]

The first line makes sense. But why do we need the second line? Say our data is "hello"; then

x = "hello"
y = "elloh"

So when h is given we expect e (h->e), then e->l, and so on. But why o->h (ydata[-1] = xdata[0])? Perhaps this hurts the training.

Did I miss something here? Or is it because it's only one character, so it can be ignored?

No validation/test?

Hi,

It looks like your code doesn't have a validation or testing step. It would be nice if some fraction of input.txt could be used for validation/testing. Any plans?
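
A minimal sketch of carving contiguous validation and test blocks off the end of input.txt (hypothetical helper). Contiguous blocks are used rather than scattered characters because the model trains on windows of consecutive text, and held-out text should not overlap those windows:

```python
def split_text(text, val_frac=0.05, test_frac=0.05):
    """Split one corpus into contiguous train / validation / test blocks."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("fractions must leave some training data")
    n = len(text)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    train_end = n - n_val - n_test
    return (
        text[:train_end],
        text[train_end:train_end + n_val],
        text[train_end + n_val:],
    )
```

Each part could then be written to its own data directory as input.txt. Note that every character in the held-out parts must also appear in the training vocabulary.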

loop function

Hi, I was wondering if someone could confirm my suspicion. I think this code in model.py is never used, given the way sampling is currently done.

        def loop(prev, _):
            prev = tf.matmul(prev, softmax_w) + softmax_b
            prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
            return tf.nn.embedding_lookup(embedding, prev_symbol)

        outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm')

When I change it to this, training and sampling seem to work fine:

        # def loop(prev, _):
        #     prev = tf.matmul(prev, softmax_w) + softmax_b
        #     prev_symbol = tf.stop_gradient(tf.argmax(prev, 1))
        #     return tf.nn.embedding_lookup(embedding, prev_symbol)

        outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, scope='rnnlm')

Looking at the source for seq2seq.rnn_decoder, if the inputs list has length 1 (which it does when infer == True), the loop function is never used. Am I missing something? It almost looks like this code could replicate this paper.
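The control flow in question can be mirrored in plain Python. This is a simplified sketch of the legacy `rnn_decoder` loop (variable scoping and reuse are omitted), with a toy cell standing in for the RNN cell. It shows that `loop_function` is only called once a previous output exists, i.e. from the second input onward.

```python
def rnn_decoder_sketch(decoder_inputs, initial_state, cell, loop_function=None):
    """Simplified mirror of the legacy seq2seq.rnn_decoder control flow."""
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
        # The loop function only replaces the input once a previous output exists.
        if loop_function is not None and prev is not None:
            inp = loop_function(prev, i)
        output, state = cell(inp, state)
        outputs.append(output)
        if loop_function is not None:
            prev = output
    return outputs, state


calls = []

def loop(prev, i):
    calls.append(i)  # record each step at which the loop function runs
    return prev

def toy_cell(inp, state):
    return inp + state, state + 1  # stand-in for an RNN cell

rnn_decoder_sketch([10], 0, toy_cell, loop_function=loop)
print(calls)  # []  (a single input: the loop function never runs)
rnn_decoder_sketch([10, 20, 30], 0, toy_cell, loop_function=loop)
print(calls)  # [1, 2]
```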

Support Unicode Input Files

In Python 2, open() returns byte strings by default. It would be nice if TextLoader.preprocess read the file and decoded it to Unicode.
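A minimal sketch of Unicode-aware reading, assuming UTF-8 input: `io.open` takes an `encoding` argument on both Python 2 and 3, so the vocabulary is built from characters rather than bytes. `read_corpus` and `build_vocab` are hypothetical names, not the repo's TextLoader API.

```python
import io
from collections import Counter


def read_corpus(path, encoding="utf-8"):
    """Read a text file as Unicode; io.open behaves the same on Python 2 and 3."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def build_vocab(text):
    """Characters sorted by descending frequency, plus a char-to-index map."""
    counts = Counter(text)
    chars = sorted(counts, key=lambda c: -counts[c])
    vocab = {c: i for i, c in enumerate(chars)}
    return chars, vocab
```

With bytes, a character such as "é" would be split into two UTF-8 bytes and the model would have to learn to emit them in sequence. Decoding first gives it a single vocabulary entry.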

I'll submit a PR.

Thanks.

Add attention to the model

What if I'd like to use attention_decoder instead of rnn_decoder?

I wonder how to modify outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm').

What should attention_states be?

Error while trying to start your software

loading preprocessed files
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main()
  File "train.py", line 39, in main
    train(args)
  File "train.py", line 50, in train
    model = Model(args)
  File "/home/tensorflow/tests/char-lstm-16/model.py", line 53, in __init__
    self.final_state = states[-1]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 126, in _SliceHelper
    raise NotImplementedError("Negative indices are currently unsupported")
NotImplementedError: Negative indices are currently unsupported

How to reduce GPU memory?

What a wonderful project! I have used it to solve some problems.
But there is one problem that always bothers me.

In one case, I have to use rnn_size=512, num_layers=2, seq_length=1200.
Other arguments: batch_size=10, num_epochs=50, grad_clip=5.0, and so on.
But training allocates 7.23GiB on a GPU that has only 8GB free, and it runs out of memory.
Is there a way to bring GPU memory usage down to 7GiB or less, so that I can run it on the GPU?
rnn_size, num_layers and seq_length cannot be changed.

Here is some of the output:

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 7967745639
InUse: 7764832256
MaxInUse: 7764842496
NumAllocs: 60834
MaxAllocSize: 14428160

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Sorry for my poor English, and thanks a lot!
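The numbers in the log can be cross-checked. The OOM tensor has shape [1024, 2048], which equals [2 * rnn_size, 4 * rnn_size] and therefore matches one LSTM layer's kernel when the input size equals rnn_size (an assumption about this model). The last estimate rests on a hypothesis that is not confirmed by the log: backpropagation through the unrolled graph keeps one kernel-sized gradient per time step per layer before summing them.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

rnn_size, num_layers, seq_length = 512, 2, 1200
bytes_per_float32 = 4

# Shape from the OOM message: [1024, 2048] == [2 * rnn_size, 4 * rnn_size]
rows, cols = 2 * rnn_size, 4 * rnn_size
kernel_bytes = rows * cols * bytes_per_float32
print(kernel_bytes / MIB)  # 8.0, i.e. the 8388608-byte chunks in the log

# The log reports 919 such chunks in use
print(round(919 * kernel_bytes / GIB, 2))  # 7.18

# Hypothesis: one kernel-sized gradient per time step per layer
print(seq_length * num_layers * kernel_bytes / GIB)  # 18.75
```

If that hypothesis holds, the bulk of the memory scales with seq_length times num_layers rather than with batch_size, so shrinking the batch alone may not help much. One option that may be worth trying in TF1 is passing `aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N` to `tf.gradients`, which changes how per-step gradients are summed.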

Cannot train: 'tuple' object has no attribute 'eval'

Hi,

I tried this implementation of char-rnn and I have an issue with the train script:

Traceback (most recent call last):
  File "train.py", line 111, in <module>
    main()
  File "train.py", line 48, in main
    train(args)
  File "train.py", line 93, in train
    state = model.initial_state.eval()
AttributeError: 'tuple' object has no attribute 'eval'

I'm using the latest version of TensorFlow and Python 3.4.

Thanks

TypeError: __init__() got an unexpected keyword argument 'syntax'

C02QH2D7G8WM:char-rnn-tensorflow userone$ python train.py
Traceback (most recent call last):
  File "train.py", line 3, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 35, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 22, in <module>
    serialized_pb=_b('\n,tensorflow/core/framework/tensor_shape.proto\x12\ntensorflow"z\n\x10TensorShapeProto\x12-\n\x03\x64im\x18\x02 \x03(\x0b\x32 .tensorflow.TensorShapeProto.Dim\x12\x14\n\x0cunknown_rank\x18\x03 \x01(\x08\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01 \x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tB/\n\x18org.tensorflow.frameworkB\x11TensorShapeProtosP\x01\x62\x06proto3')
TypeError: __init__() got an unexpected keyword argument 'syntax'

problem with --init_from

When I use --init_from=./save to restore my model, it initializes a new model instead. Is there anything wrong with my command?

python3 train.py --data_dir=./data/mydata --init_from=./save

2 train steps for a single batch?

Hi,

There are these lines of code in train.py:

    train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
    summ, train_loss, state, _ = sess.run([summaries, model.cost, model.final_state, model.train_op], feed)

That means we run two training steps on a single batch. Why?
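If both lines really execute for the same feed (rather than sitting in different branches of an if/else, which the excerpt does not show), each `sess.run` that includes `model.train_op` applies one optimizer update. The toy below illustrates this with a hypothetical `ToySession` standing in for a TF1 Session. Fetching `train_op` applies one SGD step on (w - target)**2, and a single combined fetch list gets the loss, the summaries and exactly one update.

```python
class ToySession:
    """Hypothetical stand-in for a TF1 Session: each run() whose fetch list
    contains "train_op" applies exactly one SGD update to minimise (w - target)**2."""

    def __init__(self, w=0.0, target=3.0, lr=0.1):
        self.w, self.target, self.lr = w, target, lr
        self.updates = 0

    def _cost(self):
        return (self.w - self.target) ** 2

    def run(self, fetches):
        # In this toy, cost/summaries are read before the update is applied.
        # A real Session does not order ops by their position in the list.
        results = {}
        for name in fetches:
            if name in ("cost", "summaries"):
                results[name] = self._cost()
        if "train_op" in fetches:
            self.w -= self.lr * 2.0 * (self.w - self.target)
            self.updates += 1
        return [results.get(name) for name in fetches]


two = ToySession()
two.run(["cost", "train_op"])
two.run(["summaries", "cost", "train_op"])
print(two.updates)  # 2

one = ToySession()
one.run(["summaries", "cost", "train_op"])
print(one.updates)  # 1
```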

NotImplementedError: Negative indices are currently unsupported

Line 53 of model.py contains the code:
self.final_state = states[-1]

This throws the following exception. TensorFlow does not support negative indices when slicing tensors (at least in the publicly available version). What is the workaround? Many thanks.

File "/Library/Python/2.7/site-packages/tensorflow/python/ops/array_ops.py", line 124, in _SliceHelper
raise NotImplementedError("Negative indices are currently unsupported")
NotImplementedError: Negative indices are currently unsupported
Exception TypeError: TypeError("'NoneType' object is not callable",) in <function _remove at 0x101c9b488> ignored

module 'tensorflow.contrib.rnn' has no attribute

In my TensorFlow version, 0.12.1, I couldn't use "from tensorflow.contrib import rnn" or "from tensorflow.nn import legacy_seq2seq"; those modules live elsewhere in that version.
So I used "tf.nn.rnn_cell" and "tf.nn.seq2seq" instead.
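To run the same model.py on either layout, one option is to look up the module by attribute path and fall back to the older location. This is a sketch: `resolve` and `first_available` are hypothetical helpers, and the attribute paths in the usage note come from this report (0.12.1) and the repo's own imports (1.x), not from a check against every release.

```python
import functools


def resolve(root, dotted):
    """Follow a dotted attribute path from root, e.g. resolve(tf, "contrib.rnn")."""
    return functools.reduce(getattr, dotted.split("."), root)


def first_available(root, *dotted_paths):
    """Return the first attribute path that resolves on root."""
    for path in dotted_paths:
        try:
            return resolve(root, path)
        except (AttributeError, ImportError):
            # Missing attribute, or a lazily loaded submodule that fails to import
            continue
    raise AttributeError("none of %r found on %r" % (dotted_paths, root))
```

Usage would look like `rnn = first_available(tf, "contrib.rnn", "nn.rnn_cell")` and `seq2seq = first_available(tf, "contrib.legacy_seq2seq", "nn.seq2seq")`. The rest of model.py would still need to use only names that exist in both locations.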