
ctc_tensorflow_example's People

Contributors

netom, scottgigante

ctc_tensorflow_example's Issues

num_features

Why do you set num_features = 13?
How did you arrive at the number 13? Is there a reason behind it?
Please help me understand this.

URLError

Hi, I tried to run the default example.py and encountered this error. I also tried to download the data set from the link directly but also failed. Any idea how I can fix this? Thx!
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>
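
This error usually means that Python could not find a trusted CA bundle with which to verify the server certificate. A minimal sketch of one way around it, assuming a CA bundle file is available locally; `build_ssl_context`, `download`, and the `cafile` argument are hypothetical names and not part of the example script:

```python
import ssl
import urllib.request


def build_ssl_context(cafile=None):
    """Return a verifying SSL context; cafile may point at a CA bundle file."""
    # create_default_context keeps certificate and hostname checks enabled.
    return ssl.create_default_context(cafile=cafile)


def download(url, destination, cafile=None):
    """Fetch url into destination through a verifying context (hypothetical helper)."""
    context = build_ssl_context(cafile)
    with urllib.request.urlopen(url, context=context) as response:
        data = response.read()
    with open(destination, "wb") as f:
        f.write(data)
```

Passing an explicit CA bundle keeps verification on, which is generally preferable to disabling certificate checks.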

Redundant Loop ?

# Readings targets
with open(target_filename, 'r') as f:
    for line in f.readlines():
        if line[0] == ';':
            continue

        # Get only the words between [a-z] and replace period for none
        original = ' '.join(line.strip().lower().split(' ')[2:]).replace('.', '')
        targets = original.replace(' ', '  ')
        targets = targets.split(' ')

Excuse my ignorance, but doesn't this loop discard everything except the last value assigned to targets (from the last line in f)?
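
A small self-contained check of this reading, using made-up transcript lines in place of the real target file (the sample text and `parse_line` are assumptions for illustration). With more than one non-comment line, `targets` ends up holding only the last one, while collecting into a list keeps them all:

```python
import io

# Hypothetical stand-in for the contents of the target file.
SAMPLE = (
    "; comment line, skipped\n"
    "0 100 Hello world.\n"
    "0 200 Good night.\n"
)


def parse_line(line):
    """Same transformation as the snippet above: drop two leading fields and periods."""
    original = ' '.join(line.strip().lower().split(' ')[2:]).replace('.', '')
    # Doubling the spaces makes split(' ') emit '' wherever a space token belongs.
    return original.replace(' ', '  ').split(' ')


# Original pattern: the name is rebound on every iteration.
targets = None
for line in io.StringIO(SAMPLE):
    if line[0] == ';':
        continue
    targets = parse_line(line)

# Alternative: keep one entry per transcript line.
all_targets = [parse_line(line) for line in io.StringIO(SAMPLE) if line[0] != ';']
```

If the target file holds a single transcript, the loop and the list version carry the same information.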

Input to sparse_tuple_from?

Could you please elaborate on the shape of the targets (sequences) that are fed to the function sparse_tuple_from?
Are they the output label sequences (whose length is not equal to the input sequence length) for each sequence in the batch?
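
For orientation, a NumPy-free sketch of what a function of this kind produces, assuming it takes a list of label sequences (one per batch element, lengths may differ) and returns the (indices, values, shape) triple that a sparse tensor is built from. The name `sparse_tuple_sketch` is hypothetical and this is not the repository's implementation:

```python
def sparse_tuple_sketch(sequences):
    """Turn a list of label sequences into (indices, values, shape)."""
    indices = []
    values = []
    for n, seq in enumerate(sequences):
        for t, label in enumerate(seq):
            indices.append((n, t))   # (batch index, position within the sequence)
            values.append(label)     # the label stored at that position
    # Longest sequence length, i.e. the largest position index plus one.
    max_len = max((len(seq) for seq in sequences), default=0)
    shape = (len(sequences), max_len)
    return indices, values, shape


indices, values, shape = sparse_tuple_sketch([[19, 8, 5], [1, 2]])
```

Here the two label sequences have lengths 3 and 2, so the dense shape is (2, 3) and only five entries are stored.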

CTC on multiple training example

In your program, you have one WAV file and extract 13-dimensional MFCC features as train_inputs. You then build a label array like [19 8 5 ...] and convert it to a sparse representation with the function sparse_tuple_from. I do not understand why the labels need to be converted to a sparse representation.

In my case, I extract 14-dimensional MFCC features, but I have 8440 training files. I do not know how to build an array that holds all the features, because the number of frames differs from file to file. Please help, thanks.
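
One common way to batch utterances with different frame counts is to zero-pad them to the longest one and keep the true lengths alongside, so the padded frames can be ignored later. A minimal sketch with plain lists; `pad_batch` is a hypothetical helper, and the tiny 2-dimensional frames stand in for real MFCC vectors:

```python
def pad_batch(sequences, num_features):
    """Zero-pad a list of [frames][features] sequences to a common length."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = []
    for seq in sequences:
        # One all-zero frame for every missing time step.
        pad = [[0.0] * num_features for _ in range(max_len - len(seq))]
        padded.append([list(frame) for frame in seq] + pad)
    return padded, lengths


batch, lengths = pad_batch(
    [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]],
    num_features=2,
)
```

The returned lengths are the per-example input lengths that a sequence-length argument would normally receive.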

I like your CTC example; thank you for sharing such useful code.

Multilayer problem

When I change num_layers to another value (I tried 2, 3 and 4), it raises a ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/lstm_cell/kernel, but specified shape (100, 200) and found shape (63, 200).

This error occurred on line 110 of ctc_tensorflow_example.py.
Am I doing something wrong, or is there another value that needs to be changed? Please help.
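
The two shapes in the message are consistent with a single LSTM cell object being reused for every layer (for example through `[cell] * num_layers`), so that the second layer tries to share the first layer's kernel. A sketch of the arithmetic, assuming num_features = 13 and num_hidden = 50 (values inferred from the shapes, not confirmed here) and the usual LSTM kernel layout of (input_dim + num_hidden, 4 * num_hidden); `FakeCell` is a stand-in and not a TensorFlow class:

```python
def lstm_kernel_shape(input_dim, num_hidden):
    """Kernel shape of an LSTM whose input and recurrent weights are concatenated."""
    return (input_dim + num_hidden, 4 * num_hidden)


class FakeCell:
    """Stand-in for an RNN cell; only object identity matters here."""


num_features, num_hidden, num_layers = 13, 50, 2  # assumed values

layer1_shape = lstm_kernel_shape(num_features, num_hidden)  # fed by the features
layer2_shape = lstm_kernel_shape(num_hidden, num_hidden)    # fed by layer 1's output

# List multiplication repeats one object; a comprehension builds one per layer.
shared = [FakeCell()] * num_layers
separate = [FakeCell() for _ in range(num_layers)]
```

If that is the cause, constructing a fresh cell for each layer is the usual remedy, since each layer then owns a kernel of its own shape.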

Normalization for multiple WAV inputs

I'm modifying your script for multiple data to use real WAV files instead of generating fake data. I want to know how to properly z-normalize the features. Currently I'm doing this:

def prepare_training_inputs(wav_filenames):
    inputs = []
    for wav in wav_filenames:
        mfcc = utils.wav_mfcc(wav)
        mfcc = (mfcc - np.mean(mfcc)) / np.std(mfcc)
        inputs.append(mfcc)
    train_inputs = np.asarray(inputs)
    return train_inputs

Is this the correct approach? Thank you.
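
Note that np.mean(mfcc) and np.std(mfcc) without an axis argument reduce over every frame and every coefficient at once. Whether that is desirable is a modelling choice; a frequently used alternative normalizes each coefficient separately over time. A sketch of the per-coefficient variant using only the standard library (`znorm_per_feature` and the `eps` guard are assumptions; with NumPy the same idea is a mean and std taken along axis 0):

```python
from statistics import fmean, pstdev


def znorm_per_feature(frames, eps=1e-8):
    """Z-normalize each feature column of a [frames][features] list over time."""
    columns = list(zip(*frames))             # one tuple per feature
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std, like np.std's default
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


normalized = znorm_per_feature([[1.0, 10.0], [3.0, 30.0]])
```

Both columns come out close to -1 and +1 here even though their original scales differ by a factor of ten, which a single global mean and std would not achieve.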

Regarding the output nodes of the model

Greetings. I am studying the provided example along with tutorials, but it still isn't clear to me where the output nodes are defined. For example, when preparing a model for inference, what should I put in output_node_names when freezing the model?

OCR: clarification about input and output

I'm trying to solve OCR tasks based on this code.

What shape should the input to the LSTM have? Suppose we have images of shape [batch_size, height, width, channels]: how should they be reshaped for use as input? As [batch_size, width, height*channels], so that width acts as the time dimension?
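
Treating width as the time axis is a common choice for line-level OCR. A sketch for one image stored as nested lists indexed [height][width][channels]; `image_to_sequence` is a hypothetical helper. The point it illustrates is that the axes have to be reordered so that width comes first before height and channels are flattened together; a bare reshape of the [height, width, channels] layout would mix pixels from different columns:

```python
def image_to_sequence(image):
    """Convert image[height][width][channels] to sequence[width][height*channels]."""
    height = len(image)
    width = len(image[0])
    sequence = []
    for x in range(width):               # one time step per pixel column
        column = []
        for y in range(height):
            column.extend(image[y][x])   # append this pixel's channel values
        sequence.append(column)
    return sequence


# A 2 x 3 image with a single channel.
toy_image = [
    [[1], [2], [3]],
    [[4], [5], [6]],
]
toy_sequence = image_to_sequence(toy_image)
```

Each of the three time steps now holds one full pixel column of the toy image.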

What if I want variable width? As I understand it, the sequences in a batch should all have the same length (is the common trick to zero-pad the end of each sequence?), or else batch_size should be 1.

What if I want variable width and height? As I understand it, I need convolutional plus global average pooling / spatial pyramid pooling layers before the LSTM, so the output blob will be [batch_size, feature_map_height, feature_map_width, feature_map_channels]. How should this blob be reshaped for the LSTM? As [batch_size, feature_map_width, feature_map_height*feature_map_channels]? Or can we flatten it to a single row, [batch_size, feature_map_width*feature_map_height*feature_map_channels]? That would be like a sequence of pixels and we would lose some spatial information; would it work?

Here is the definition of the input, but I'm not sure what [batch_size, max_stepsize, num_features] means in your case:
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L90

And how does the output of the LSTM depend on the input size and the maximum sequence length?
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L110

BTW: Here are some examples using 'standard' approaches in Keras + TensorFlow, which I want to complement with RNN examples.
https://github.com/mrgloom/Char-sequence-recognition

Save and restore problem

Thank you for this excellent example. I tried to implement a model similar to this one, but I am facing a problem when saving and restoring it. While training is running everything is fine and the weights are saved. But when I stop the iterations and then load the model with the saved weights, the CTC loss shows some earlier value and not the last one at which the model was saved. Any help will be highly appreciated.

Bidirectional LSTM

Bidirectional LSTM with CTC is now live on the current version of Tensorflow
Could you please make an extension for ctc_tensorflow_multidata_example.py using Bidirectional LSTM?

I tried tweaking back and forth but failed miserably T__T

all possible alignments

As described in Alex Graves's CTC paper, the probability of a transcription is obtained by summing the probabilities of all its possible alignments using dynamic programming. Neither the code nor the TensorFlow documentation says explicitly where this happens. Could you tell me in which of the following functions this DP is performed:

  1. tf.nn.ctc_loss
  2. tf.nn.ctc_beam_search_decoder
  3. tf.nn.ctc_greedy_decoder
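
In Graves's formulation the sum over alignments is the forward (alpha) recursion, and it is the loss computation that needs it. Among the three functions listed, that corresponds to tf.nn.ctc_loss; the two decoders solve a different problem, namely finding a likely transcription when the labels are unknown. An illustrative pure-Python sketch of the recursion, checked against brute-force enumeration on a toy input (the function names and toy probabilities are assumptions, and this is not TensorFlow's implementation):

```python
from itertools import product


def collapse(path, blank):
    """CTC many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def brute_force_prob(probs, label, blank):
    """Sum the probability of every alignment that collapses to label."""
    num_classes = len(probs[0])
    total = 0.0
    for path in product(range(num_classes), repeat=len(probs)):
        if collapse(path, blank) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


def forward_prob(probs, label, blank):
    """Same quantity via the forward recursion, without enumerating paths."""
    ext = [blank]                      # label with blanks interleaved
    for k in label:
        ext += [k, blank]
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][blank]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            a = alpha[s]                                   # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]                          # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]                          # skip the blank in between
            new[s] = a * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Toy per-frame distributions over classes {0, 1, blank=2}; values are made up.
TOY_PROBS = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1],
]
```

The brute-force version visits every one of the 3^4 paths, while the recursion does work proportional to the number of frames times the extended label length, which is the saving the dynamic programming provides.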

Saw a non-null label issue

Thank you for this excellent example. I am trying to implement my model which is similar to this with the tensorflow 1.2. But there is something wrong:

InvalidArgumentError (see above for traceback): Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 218 labels: 58,191,215,196
[[Node: loss/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](fc/transpose/_163, _arg_Placeholder_3_0_3, _arg_Placeholder_2_0_2, _arg_Placeholder_4_0_4)]]
[[Node: loss/CTCLoss/_165 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4527_loss/CTCLoss", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

This is my code:
graph = tf.Graph()
with graph.as_default():
    # feature inputs e.g. fbank, MFCC
    # size : [batch_size, timesteps, feature_dim]
    inputs = tf.placeholder(tf.float32, [None, None, FLAGS.n_mfcc])

    # use sparse_placeholder to generate a SparseTensor required by ctc_loss op
    targets = tf.sparse_placeholder(tf.int32, [None])
   .....
    # Time major
    logits = tf.transpose(logits, (1, 0, 2))
    tf.summary.histogram('fc', W)
    tf.summary.histogram('fc', b)

    with tf.name_scope('loss'):
        loss = tf.nn.ctc_loss(targets, logits, seq_len)
        
    with tf.name_scope('cost'):
        cost = tf.reduce_mean(loss)

....
        for i, batch in enumerate(datagen.iterate_train(FLAGS.batch_size, shuffle=shuffle,
                                                        sort_by_duration=sortagrad)):
            train_inputs = batch['x']
            train_targets = sparse_tuple_from(batch['y'])
            train_seq_len = batch['label_lengths']

            feed = {inputs: train_inputs,
                    targets: train_targets,
                    seq_len: train_seq_len}
            batch_cost, _ = session.run([cost, optimizer], feed)
            train_cost += batch_cost*FLAGS.batch_size
            train_ler += session.run(ler, feed_dict=feed)*FLAGS.batch_size

The labels seem to be treated as a single label. How can I fix this issue?

Many thanks.

Can we compute the accuracy in the training?

Hello @igormq, I have a question: can we compute the accuracy for each batch during training?
I am using a fork of your project, https://github.com/synckey/tensorflow_lstm_ctc_ocr, which tackles an OCR problem. I am confused about how to compute the accuracy, because it cannot be computed position by position in every batch: in some batches, the decoded output is shorter than the label.
https://github.com/synckey/tensorflow_lstm_ctc_ocr/blob/master/lstm_and_ctc_ocr_train.py#L39
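
Because decoded and reference sequences can differ in length, sequence tasks are usually scored with an edit-distance based label error rate rather than position-wise accuracy. A minimal sketch (the function names are hypothetical, and normalizing by the reference length is one common convention):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_error_rate(decoded, reference):
    """Edit distance normalized by reference length (reference must be non-empty)."""
    return edit_distance(decoded, reference) / len(reference)
```

A per-batch figure can then be the mean of `label_error_rate` over the examples in the batch, and a sequence-level accuracy is the fraction of examples whose distance is zero.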

Something about sparse_tuple_from?

Thanks for your work. I can't understand the function sparse_tuple_from.

In utils.py, line 67: shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1]+1], dtype=np.int64)

What does the '+1' mean?

Multiple Data

I want to train the model on a small TIMIT dataset. How do I change the inputs for that?
Also, I want the output to be text rather than numbers.
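
A sketch of mapping label indices back to text, assuming the encoding suggested by the label example elsewhere on this page ([19 8 5 ...] for 's', 'h', 'e'): 0 is a space and 1 to 26 are 'a' to 'z'. `indices_to_text` is a hypothetical helper; the offsets would need adjusting for a different label map:

```python
SPACE_INDEX = 0
FIRST_INDEX = ord('a') - 1  # assumed offset, so that 'a' -> 1, ..., 'z' -> 26


def indices_to_text(indices):
    """Convert a sequence of label indices into a lowercase string."""
    chars = []
    for i in indices:
        chars.append(' ' if i == SPACE_INDEX else chr(i + FIRST_INDEX))
    return ''.join(chars)


decoded_text = indices_to_text([19, 8, 5, 0, 8, 1, 4])
```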

lstm + ctc for mnist

Hi igormq. Your blog post about CTC in TensorFlow is very helpful; thanks a million. But I am confused about some aspects of the CTC module.
1. If the sequence is A B B * B * B (* is blank), tf.ctc.ctc_greedy_decoder() should return ABBB, but the documentation says the result is A B if merge_repeated=True.
2. My code uses an LSTM to classify MNIST data, with just one layer and 28 time steps, but CTC_LOSS doesn't work at all. Can you help me work out the right way to call it? The code is very simple, and I am sure you will understand it when you see it.
Thanks again.
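
On point 1, a sketch of the collapsing rule as the merge_repeated flag is commonly described: consecutive identical frame-wise predictions are merged before blanks are dropped, so repeats separated by a blank survive. This is a plain-Python illustration with a hypothetical `greedy_collapse`, not the TensorFlow op:

```python
def greedy_collapse(frames, blank='*', merge_repeated=True):
    """Collapse frame-wise best labels into an output sequence."""
    out = []
    prev = None
    for symbol in frames:
        # A repeat is only merged when it directly follows the same symbol.
        is_repeat = merge_repeated and symbol == prev
        if symbol != blank and not is_repeat:
            out.append(symbol)
        prev = symbol
    return out


FRAMES = ['A', 'B', 'B', '*', 'B', '*', 'B']
merged = greedy_collapse(FRAMES, merge_repeated=True)
unmerged = greedy_collapse(FRAMES, merge_repeated=False)
```

Under this rule the example sequence gives A B B B with merging and A B B B B without it, which matches the expectation stated in the question rather than "A B".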

Problems with training deep RNN and with multiple examples

Hi,

I'm currently trying to train the RNN with multiple test inputs, in what format will the 'train_targets' and 'train_inputs' variables have to be to successfully train the network? I thought of concatenating all the inputs but as the train_seq_len and number of targets per input varies I cannot concatenate the entire database.

Also if I increase the number of layers to more than 1 the training cost does not reduce sufficiently (still using only one example), is this because there is not enough data? If that is the case then shouldn't the RNN resort to over-fitting but still predict the correct output?

Regards,
Deepak
