rakeshvar / rnn_ctc

Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification, implemented in Theano. Includes toy training examples.

License: Apache License 2.0

Language: Python 100.00%
Topics: recurrent-neural-networks, python, theano, ctc, rnn, rnn-ctc, ctc-loss, neural-network, ocr, speech-recognition

rnn_ctc's Introduction

RNN CTC

Recurrent Neural Network with Connectionist Temporal Classification, implemented in Theano. Includes toy training examples.

Use

The goal is to train a neural network (with recurrent connections) to learn to read sequences. As part of the training we show it a series of such sequences (tablets of text in our examples) and also tell it what each tablet contains (the labels of the written characters).

Methodology

We keep feeding our RNN samples of text in two forms: written and labelled. If you have your own written samples, you can train our system the offline way. If you have a scribe that can generate samples as you go, you can train one sample at a time, the online way.

Specifying parameters

You will need to specify a lot of parameters. Here is an overview. The file configs/default.ast has all the parameters specified (as a Python dictionary), so compare it with these instructions; a hedged sketch of such a file appears after the list below.

  • Data Generation (cf. configs/alphabets.ast)

    • Scribe (The class that generates the samples)
      • alphabet: 'ascii_alphabet' (0-9a-zA-Z etc.) or 'hindu_alphabet' (0-9 hindu numerals)
      • noise: Amount of noise in the image
      • vbuffer, hbuffer: vertical and horizontal buffers
      • avg_seq_len: Average length of the tablet
      • varying_len: (bool) Make the length random
      • nchars_per_sample: Makes each tablet have the same number of characters. This overrides avg_seq_len.
    • num_samples
  • Training (cf. configs/default.ast)

    • num_epochs
      • Offline case: Goes over the same data num_epochs times.
      • Online case: Each epoch uses different data, generating a total of num_epochs * num_samples unique data samples!
    • train_on_fraction
      • Offline case: Fraction of samples that are used as training data
  • Neural Network (cf. configs/midlayer.ast and configs/optimizers.ast)

    • use_log_space: Perform calculations via the logarithms of probabilities.
    • mid_layer: The middle layer to be used. See the nnet/layers module for all the options you have.
    • mid_layer_args: The arguments needed for the middle layer. Depends on the mid_layer. See the constructor of the corresponding mid_layer class.
    • optimizer: The optimization algorithm to be used. sgd, adagrad, rmsprop, adadelta etc.
    • optimizer_args: The arguments that the optimizer needs. See the corresponding function in the file nnet/updates.py. Note: this should not contain the learning rate.
    • learning_rate_args:
      • initial_rate: Initial learning rate.
      • anneal:
        • constant: Learning rate will be kept constant
        • inverse: Will decay as the inverse of the epoch.
        • inverse_sqrt: Will decay as the inverse of the square root of the epoch.
      • epochs_to_half: Controls how fast the learning rate is annealed; a higher number means slower decay.
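
For orientation, here is a minimal sketch of what such a config file might contain. The keys mirror the list above, but the grouping key and the values are illustrative guesses, not the shipped defaults; defer to configs/default.ast for the authoritative set.

# Illustrative .ast config sketch (NOT the shipped configs/default.ast).
# An .ast file holds a single Python dictionary literal.
{
    'scribe_args': {                  # hypothetical grouping key
        'alphabet': 'hindu_alphabet',
        'noise': 0.05,
        'vbuffer': 5,
        'hbuffer': 3,
        'avg_seq_len': 60,
        'varying_len': True,
        'nchars_per_sample': None,
    },
    'num_samples': 1000,
    'num_epochs': 1000,
    'train_on_fraction': 0.75,
    'use_log_space': True,
    'mid_layer': 'LSTM',              # see nnet/layers for the real options
    'mid_layer_args': {'nunits': 9},
    'optimizer': 'adagrad',
    'optimizer_args': {},             # learning rate is NOT given here
    'learning_rate_args': {
        'initial_rate': 0.01,
        'anneal': 'inverse',
        'epochs_to_half': 100,
    },
}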

Usage

Offline Training

For this you first generate data with gen_data.py and then train on it using train_offline.py.

Generate Data

You can use hindu numerals or the entire ascii set, specified via an ast file.

python3 gen_data.py <output_name.pkl> [config=configs/default.ast]*

Train Network

You can train on the generated pickle file as:

python3 train_offline.py data.pkl [config=configs/default.ast]*

Online Training

You can generate and train simultaneously as:

python3 train_online.py [config=configs/default.ast]*

Examples

All the programs mentioned above can take multiple config files; later files override earlier ones. configs/default.ast is loaded by default.

Offline

# First generate the needed .ast files based on the given examples, then...
python3 gen_data.py hindu_avg_len_60.pkl configs/hindu.ast configs/len_60.ast
python3 train_offline.py hindu_3chars.pkl configs/adagrad.ast configs/bilstm.ast configs/ilr.01.ast

Online

python3 train_online.py configs/hindu.ast configs/adagrad.ast configs/bilstm.ast configs/ilr.01.ast

Working Example

# Offline
python3 gen_data.py hindu3.pkl configs/working_eg.ast
python3 train_offline.py hindu3.pkl configs/working_eg.ast
# Online
python3 train_online.py configs/working_eg.ast


Sample Output

# Using data from scribe.py hindu
Shown : 0 2 2 5 
Seen  : 0 2 2 5 
Images (Shown & Seen) : 

 0¦                            ¦
 1¦          ██  ██            ¦
 2¦         █  ██  ████        ¦
 3¦           █   █ █          ¦
 4¦      ██  █   █  ███        ¦
 5¦     █  █████████  █        ¦
 6¦     █  █        █ █        ¦
 7¦      ██         ███        ¦
 
 0¦░░░░░░░░░█░░░░░░░░░░░░░░░░░░¦
 1¦░░░░░░░░░░░░░░░░░░░░░░░░░░░░¦
 2¦░░░░░░░░░░░░░█░░░█░░░░░░░░░░¦
 3¦░░░░░░░░░░░░░░░░░░░░░░░░░░░░¦
 4¦░░░░░░░░░░░░░░░░░░░░░░░░░░░░¦
 5¦░░░░░░░░░░░░░░░░░░░█▓░░░░░░░¦
 6¦█████████░███░███░█░▒███████¦

References

  • Graves, Alex. Supervised Sequence Labelling with Recurrent Neural Networks. Chapters 2, 3, 7 and 9. Available from Springer (university edition via SpringerLink); a free preprint is also available online.

Credits

Dependencies

  • Numpy
  • Theano

You can easily port this to Python 2 by adding lines like the one below where necessary. In the interest of future generations, we highly recommend that you do not do that.

from __future__ import print_function

rnn_ctc's People

Contributors

rakeshvar


rnn_ctc's Issues

Saving the trained model for future use

Thanks for posting your code; I played with it to understand the LSTM implementation. I have a quick question about saving models for later use. Right now, when I run train.py, I can see the model getting trained and I see some outputs. But is there a way to save the model to a file and later use it to retrain or predict on future data? I tried pickle, but I get an error saying the maximum recursion depth was reached. Please post your thoughts.

Thanks!
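
The recursion error is typical of pickling a compiled Theano graph. A common workaround, sketched below assuming the network exposes its parameters as a list of Theano shared variables (the attribute name ntwk.params is hypothetical), is to pickle only the parameter values and restore them into a freshly built network:

import pickle

# Save only the numeric parameter values, not the compiled graph.
values = [p.get_value() for p in ntwk.params]  # ntwk.params: hypothetical
with open('model_params.pkl', 'wb') as f:
    pickle.dump(values, f)

# Later: rebuild the network with the same config, then restore the values.
with open('model_params.pkl', 'rb') as f:
    for p, v in zip(ntwk.params, pickle.load(f)):
        p.set_value(v)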

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2588' in position 0: ordinal not in range(256)

Hello, I ran 'python3 gen_data.py data.pkl' and got the following error messages:

(0, 1, 2)  !"
 0¦              Traceback (most recent call last):
  File "gen_data.py", line 27, in <module>
    utils.slab_print(x)
  File "/home/speech/wudan14/rnn_ctc-master/utils.py", line 20, in slab_print
    elif val <= 1.:  print('\u2588', end=''),
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2588' in position 0: ordinal not in range(256)

How can I solve this problem?
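
This error means Python's stdout encoding is latin-1 (usually inherited from the locale), which cannot represent the Unicode block characters that slab_print draws. A simple fix is to force UTF-8 output for the run:

PYTHONIOENCODING=utf-8 python3 gen_data.py data.pkl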

float32 failure

float32 does not work.

$python train_online.py 
Arguments:
FloatX         : float32
Num Epochs     : 1000
Num Samples    : 1000
Scribe:
  Alphabet:  !"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  Noise: 0.05
  Buffers (vert, horz): 5, 3
  Characters per sample: Depends on the random length
  Length: Avg:60 Range:(45, 75)
  Height: 11
Building the Network
Traceback (most recent call last):
  File "train_online.py", line 26, in <module>
    ntwk = nn.NeuralNet(scriber.nDims, scriber.nClasses, **nnet_args)
  File "/home/rakesha/rnn_ctcs/rnn_ctc/nnet/neuralnet.py", line 23, in __init__
    layer3 = CTCLayer(layer2.output, labels, n_classes, use_log_space)
  File "/home/rakesha/rnn_ctcs/rnn_ctc/nnet/ctc.py", line 64, in __init__
    self._log_ctc()
  File "/home/rakesha/rnn_ctcs/rnn_ctc/nnet/ctc.py", line 115, in _log_ctc
    outputs_info=[safe_log(_1000)]
  File "/home/rakesha/.local/lib/python3.3/site-packages/Theano-0.7.0-py3.3.egg/theano/scan_module/scan.py", line 1044, in scan
    scan_outs = local_op(*scan_inputs)
  File "/home/rakesha/.local/lib/python3.3/site-packages/Theano-0.7.0-py3.3.egg/theano/gof/op.py", line 600, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/rakesha/.local/lib/python3.3/site-packages/Theano-0.7.0-py3.3.egg/theano/scan_module/scan_op.py", line 550, in make_node
    inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (`fn`) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
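
The traceback points at a dtype mismatch inside theano.scan: some constant reaching the inner function is float64, so the inner result upcasts while the initial state stays float32. A minimal illustration of the usual cause and fix, casting constants to theano.config.floatX (the variable names here are illustrative, not the repo's):

import numpy as np
import theano

floatX = theano.config.floatX  # 'float32' when run with floatX=float32

# np.log of a Python float is float64; if such a value reaches scan
# (e.g. via outputs_info or a constant in the inner function), the two
# sides of the recursion end up with different dtypes.
init_bad = np.log(1e-30)                  # dtype float64
init_good = np.log(1e-30).astype(floatX)  # dtype matches floatX
print(init_bad.dtype, init_good.dtype)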

float32 vs float64 on GPU

Using gpu device 0: GeForce GTX TITAN Black

Config: 0
   Midlayer: <class 'reccurent.RecurrentLayer'> {'nunits': 9}
Input Dim: 8
Num Classes: 10
Num Samples: 1000
FloatX: float32
Using log space: True

Preparing the Data
Building the Network
Traceback (most recent call last):
  File "train.py", line 115, in <module>
    ntwk = NeuralNet(nDims, nClasses, midlayer, midlayerargs, log_space)
  File "/home/rakesha/rnn_ctc/neuralnet.py", line 16, in __init__
    layer3 = CTCLayer(layer2.output, labels, n_classes, logspace)
  File "/home/rakesha/rnn_ctc/ctc.py", line 68, in __init__
    self.log_ctc()
  File "/home/rakesha/rnn_ctc/ctc.py", line 117, in log_ctc
    outputs_info=[safe_log(_1000)]
  File "/home/rakesha/.virtualenvs/python3.4/lib/python3.4/site-packages/theano/scan_module/scan.py", line 1017, in scan
    scan_outs = local_op(*scan_inputs)
  File "/home/rakesha/.virtualenvs/python3.4/lib/python3.4/site-packages/theano/gof/op.py", line 481, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/rakesha/.virtualenvs/python3.4/lib/python3.4/site-packages/theano/scan_module/scan_op.py", line 339, in make_node
    inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (`fn`) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.

Is the CTC implementation GPU-runnable?

@rakeshvar I was wondering if this CTC implementation is GPU-runnable. Do you know of any such implementations in Theano/Keras? Except for warp_ctc in Torch, I could not find any that run on GPUs.

Hi, I have a question to ask

When I use the code, it performs very well on the "hindu" data set with the default configuration, but the result on "ascii" is very bad. I think something is wrong with my network config. Can you tell me what config to use for the "ascii" data set? Thank you.

Long time (over 300 frames) problem in speech recognition (in TIMIT data)

Hello!

We have a problem with long inputs (over 300 frames) in speech recognition (on TIMIT data).

In general, speech recognition uses long feature sequences, for example 300 frames for a 3-second utterance. When we analyzed the scan function in your 'ctc.py', the probabilities variable comes out as zero beyond about 300 frames, and the 'cost' variable shows 'Inf'.

How can we treat the problem?
Do you have any suggestions?

We will wait for your comments.

Best regards.
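
This matches numerical underflow: a product of 300 per-frame probabilities falls far below float precision, so the forward variables become zero and the cost becomes Inf. The standard remedy, which this repo exposes via the use_log_space option, is to run the CTC recursion on log-probabilities and combine them with a stable log-sum-exp. A minimal illustration:

import numpy as np

def log_add(a, b):
    """Stable log(exp(a) + exp(b)); numpy also ships this as np.logaddexp."""
    m = np.maximum(a, b)
    return m + np.log1p(np.exp(-np.abs(a - b)))

# 300 per-frame probabilities of 0.01: the plain product underflows to 0.0,
# while the sum of logs stays exact.
logp = np.full(300, np.log(0.01))
print(np.exp(logp).prod())  # 0.0 (underflow)
print(logp.sum())           # -1381.55...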

Better GPU support.

Currently training is slower on GPUs than on CPUs, because the training data is not a shared variable.
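
For context, the usual Theano remedy is to store the whole data set in a theano.shared variable so it lives in GPU memory, and to hand the compiled function only an index, substituting the slice via givens. A sketch with hypothetical shapes and names:

import numpy as np
import theano
import theano.tensor as tt

floatX = theano.config.floatX

# Hypothetical data set: 1000 tablets, 11 pixels high, 60 columns wide.
data_x = theano.shared(np.zeros((1000, 11, 60), dtype=floatX), borrow=True)

index = tt.lscalar('index')
x = tt.matrix('x')
cost = x.sum()  # stand-in for the real CTC cost

# data_x[index] is sliced on the GPU; no host-to-device copy per call.
train = theano.function([index], cost, givens={x: data_x[index]})
print(train(0))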

tiny bug in readme.md!

python3 rnn_ctc.py data.pkl [configuration_num]

See configurations.py for various configurations.


should be ctc.py, not rnn_ctc.py
