igormq / ctc_tensorflow_example

CTC + Tensorflow Example for ASR

License: MIT License

Python 100.00%

ctc_tensorflow_example's Issues

num_features

Why is num_features set to 13?
How did you arrive at this number? Is there a reason behind it?
Please help me understand this.

Hi, I tried to run the default example.py and encountered this error. I also tried to download the dataset directly from the link, but that failed too. Any idea how I can fix this? Thx!
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>

label error rate or label correct rate ?

I see this comment https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L145.
I wonder whether tf.reduce_mean(tf.edit_distance(tf.cast(decoded[0], tf.int32), targets)) should be called the correct rate. Is it just a clerical error, or something else?

By the way, the code you provided works very well on the SVHN dataset once I added dropout and one more layer (2 layers in total). Amazing!
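
A note on the naming question: tf.edit_distance normalizes the distance by the length of the reference sequence by default (normalize=True), so a value of 0 means a perfect match and larger values mean more mistakes. That makes the averaged quantity an error rate. The pure-Python sketch below illustrates the same idea outside the graph; the function names are hypothetical and it assumes every reference sequence is non-empty.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra label in hyp
                           cur[j - 1] + 1,           # label missing from hyp
                           prev[j - 1] + (h != r)))  # substitution or match
        prev = cur
    return prev[-1]


def label_error_rate(hyps, refs):
    """Mean edit distance, each normalized by its reference length."""
    rates = [edit_distance(h, r) / len(r) for h, r in zip(hyps, refs)]
    return sum(rates) / len(rates)
```

An identical hypothesis and reference give 0.0, and one wrong label out of three gives 1/3.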

Redundant Loop ?

# Readings targets
with open(target_filename, 'r') as f:
    for line in f.readlines():
        if line[0] == ';':
            continue

        # Get only the words between [a-z] and replace period for none
        original = ' '.join(line.strip().lower().split(' ')[2:]).replace('.', '')
        targets = original.replace(' ', '  ')
        targets = targets.split(' ')

Excuse my ignorance, but isn't this loop discarding everything except the last value assigned to targets (the last line in f)?
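
If the transcript file holds a single utterance, the loop above ends with that utterance in targets, so nothing is lost. For a file with several transcript lines, one possible variation is to collect one target list per line. The sketch below is only an illustration; read_targets is a hypothetical helper, not part of the repository, and it takes an iterable of lines instead of a filename.

```python
def read_targets(lines):
    """Return one token list per transcript line, skipping ';' comments."""
    all_targets = []
    for line in lines:
        if not line.strip() or line[0] == ';':
            continue
        # Same cleaning as the snippet above: drop the first two fields,
        # lowercase, remove periods.
        original = ' '.join(line.strip().lower().split(' ')[2:]).replace('.', '')
        # Doubling the spaces makes split(' ') leave an empty string
        # between words, which marks the word boundary.
        targets = original.replace(' ', '  ').split(' ')
        all_targets.append(targets)
    return all_targets
```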

Input to sparse_tuple_from?

Could you please elaborate on the shape of the targets (sequences) that we feed to the function sparse_tuple_from?
Are they the output label sequences (whose length is not equal to the sequence length) for each sequence in the batch?
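
As a rough illustration of what such a function typically consumes and returns: the input is a list with one label sequence per batch item, and the sequences may have different lengths. The version below is a reconstruction under that assumption, not a verified copy of the repository's utils.py, and it assumes at least one non-empty sequence.

```python
import numpy as np


def sparse_tuple_from(sequences, dtype=np.int32):
    """Turn a list of label sequences into (indices, values, dense_shape)."""
    indices, values = [], []
    for n, seq in enumerate(sequences):
        # One (batch_index, position) pair per label.
        indices.extend((n, t) for t in range(len(seq)))
        values.extend(seq)
    indices = np.asarray(indices, dtype=np.int64)
    values = np.asarray(values, dtype=dtype)
    # Dense shape: batch size by length of the longest label sequence.
    shape = np.asarray([len(sequences), indices[:, 1].max() + 1], dtype=np.int64)
    return indices, values, shape


indices, values, shape = sparse_tuple_from([[19, 8, 5], [1, 2]])
```

The three arrays are what a sparse placeholder expects as its feed value: where each label sits, what the label is, and the shape of the dense matrix that would contain them all.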

BLSTM

Hi,
Thanks a lot for this great and very useful work. Would you have any input on how to modify the multidata example to get a bidirectional LSTM, with two layers (like here : https://github.com/xisnu/CNN-BLSTM-CTC/blob/master/Images/hybrid.jpg)?

When I run the ctc_greedy_decoder function, how can I get the alignment information from the per-frame probabilities?

As we know, the ctc_greedy_decoder function returns a sequence in which blank labels have been removed and repeated labels merged. I want the complete greedy path and its probabilities before the blanks are removed and the repeats merged. How can I get this information? Thanks.
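
One way to obtain the uncollapsed path is to take the per-frame argmax of the network output yourself, since that is the path a greedy decoder starts from. The NumPy sketch below assumes the logits for a single utterance have been fetched as a [time, num_classes] array; greedy_path is a hypothetical helper name.

```python
import numpy as np


def greedy_path(logits):
    """Per-frame best label and its softmax probability, before any collapsing.

    logits: array of shape [time, num_classes] for one utterance.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    path = probs.argmax(axis=1)                     # one label per frame, blanks included
    path_probs = probs[np.arange(len(path)), path]  # probability of each chosen label
    return path, path_probs
```

The frame index of each entry in path gives the alignment, and blanks and repeats are still present.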

CTC on multiple training examples

In your program, you have one WAV file from which you extract 13-dimensional MFCC features (train_inputs). You then construct a label array like [19 8 5 ...] and convert it to a sparse representation (function: sparse_tuple_from). I do not understand why the labels should be converted to a sparse representation.

In my case, I extract 14-dimensional MFCC features, but I have 8440 training files. I do not know how to create an array holding all of the features, because the number of frames differs from file to file. Please help me, thanks.

I like your CTC example; thank you for giving us such useful code.

I want to handle more files; how do I prepare the inputs and targets?

Thanks!
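
A common way to batch utterances with different numbers of frames is to zero-pad them to the longest one and keep the true lengths in a separate array, which is then fed as the sequence lengths. The sketch below assumes each utterance is a [frames, num_features] NumPy array; pad_batch is a hypothetical helper, not the repository's own padding function.

```python
import numpy as np


def pad_batch(features):
    """Zero-pad a list of [frames, num_features] arrays to a common length."""
    lengths = np.asarray([f.shape[0] for f in features], dtype=np.int32)
    num_features = features[0].shape[1]
    batch = np.zeros((len(features), lengths.max(), num_features), dtype=np.float32)
    for i, f in enumerate(features):
        batch[i, :f.shape[0]] = f  # frames past the true length stay zero
    return batch, lengths
```

The labels do not need padding in the same way: a sparse representation stores only the labels that exist, which is why variable-length label sequences are passed in that form.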

Multilayer problem

When I change num_layers to another value (I tried 2, 3 and 4), it raises a ValueError. The error is as follows: ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/lstm_cell/kernel, but specified shape (100, 200) and found shape (63, 200).

This error occurred on line 110 of ctc_tensorflow_example.py.
Am I doing something wrong, or is there another value that needs to be changed? Please help.
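
The two shapes in the message can be reproduced with a little arithmetic. A TensorFlow LSTM cell stores one kernel of shape (input_dim + num_units, 4 * num_units). The values 13 features and 50 hidden units used below are inferred from the error text, not stated in the issue.

```python
def lstm_kernel_shape(input_dim, num_hidden):
    """Shape of the combined LSTM kernel: inputs and state feed four gates."""
    return (input_dim + num_hidden, 4 * num_hidden)


num_features, num_hidden = 13, 50  # assumed values, inferred from the error
layer0 = lstm_kernel_shape(num_features, num_hidden)  # first layer sees the MFCCs
layer1 = lstm_kernel_shape(num_hidden, num_hidden)    # second layer sees layer 0's output
```

Here layer0 is (63, 200) and layer1 is (100, 200). The second layer needs a differently shaped kernel but is being pointed at the variable created for the first, which is what typically happens when the same cell object is reused for every layer. Building a separate cell object per layer is the usual remedy.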

Normalization for multiple WAV inputs

I'm modifying your script for multiple data to use real WAV files instead of generating fake data. I want to know how to properly z-normalize the features. Currently I'm doing this:

def prepare_training_inputs(wav_filenames):
    inputs = []
    for wav in wav_filenames:
        mfcc = utils.wav_mfcc(wav)
        mfcc = (mfcc - np.mean(mfcc)) / np.std(mfcc)
        inputs.append(mfcc)
    train_inputs = np.asarray(inputs)
    return train_inputs

Is this the correct approach? Thank you.
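
Two observations may help here. First, np.mean(mfcc) and np.std(mfcc) reduce over all frames and all coefficients of one file, so each file is scaled by its own pair of scalars. Second, np.asarray on a list of arrays with different numbers of frames does not produce a regular 3-D array. One common alternative, shown as a sketch with hypothetical helper names, is to compute per-coefficient statistics once over the training set and apply them to every file.

```python
import numpy as np


def fit_normalizer(utterances, eps=1e-8):
    """Per-coefficient mean and std over all frames of all utterances."""
    stacked = np.concatenate(utterances, axis=0)  # [total_frames, num_features]
    return stacked.mean(axis=0), stacked.std(axis=0) + eps


def normalize(utterances, mean, std):
    """Apply the same statistics to every utterance; returns a list of arrays."""
    return [(u - mean) / std for u in utterances]
```

The same mean and std would then be reused for validation and test data. Whether per-file or global normalization works better depends on the data, so this is a choice to validate, not a rule.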

Other ways of increasing the batch size

Other than padding, is there any way to increase the batch size?

This project is not suitable for the newest Tensorflow versions, like 0.12.1 and 1.0.0

This project is not suitable for the newest Tensorflow versions, like 0.12.1 and 1.0.0.

Regarding the output nodes of the model

Greetings, I am studying the example provided, along with tutorials, however it still isn't clear to me where the output nodes are defined. For example in the specific case of preparing a model for inference, what should I write in the output_node_names when freezing the model?

OCR: clarification about input and output

I'm trying to solve OCR tasks based on this code.

What shape should the input to the LSTM have? Suppose we have images of shape [batch_size, height, width, channels]; how should they be reshaped to be used as input? Like [batch_size, width, height*channels], so that width acts as the time dimension?

What if I want variable width? As I understand it, the sequences in a batch must all have the same length (is the common trick just to zero-pad the end of each sequence?), or else batch_size should be 1.

What if I want variable width and height? As I understand it, I need convolutional + global average pooling / spatial pyramid pooling layers before the LSTM, so the output blob will be [batch_size, feature_map_height, feature_map_width, feature_map_channels]. How should this blob be reshaped to be used as input to the LSTM? Like [batch_size, feature_map_width, feature_map_height*feature_map_channels]? Can we instead reshape it to a single row like [batch_size, feature_map_width*feature_map_height*feature_map_channels]? It would then be like a sequence of pixels and we would lose some spatial information; will it work?
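
The width-as-time layout described above can be sketched in NumPy as follows. The point to watch is that a plain reshape is not enough: the width axis has to be moved in front of the height axis first, otherwise pixels from different columns get mixed into one time step. images_to_sequences is a hypothetical helper name.

```python
import numpy as np


def images_to_sequences(images):
    """[batch, height, width, channels] -> [batch, width, height * channels]."""
    batch, height, width, channels = images.shape
    # Put width before height so that each time step is one pixel column.
    columns = images.transpose(0, 2, 1, 3)
    return columns.reshape(batch, width, height * channels)
```

With this layout, max_stepsize corresponds to the (padded) image width and num_features to height * channels.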

Here is the definition of the input, but I'm not sure what [batch_size, max_stepsize, num_features] means in your case:
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L90

And how does the output of the LSTM depend on the input size and the maximum sequence length?
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L110

BTW: Here are some examples using 'standard' approaches in Keras+Tensorflow, which I want to complement with RNN examples.
https://github.com/mrgloom/Char-sequence-recognition

Save and restore problem

Thank you for this excellent example. I tried to implement my own model, which is similar to this one, but I have been facing a problem when saving and restoring it. While training is running, everything is fine and the weights are saved. But when I stop the iterations and then load the model with the saved weights, the CTC loss shows some earlier value and not the last one at which it was saved. Any help will be highly appreciated.

How to modify the Connectionist Temporal Classification (CTC) layer of the network to also give us a confidence score?

[No issue] Thanks for this great work, it really helped in my work

Thank you very much!!

Bidirectional LSTM

Bidirectional LSTM with CTC is now live on the current version of Tensorflow
Could you please make an extension for ctc_tensorflow_multidata_example.py using Bidirectional LSTM?

I tried tweaking back and forth but failed miserably T__T

all possible alignments

As described in the Alex Graves CTC paper, the probability of a transcription is obtained by summing the probabilities of all possible alignments using dynamic programming. Neither the code nor the Tensorflow documentation states explicitly where this part happens. Could you tell me in which of the following functions this dynamic programming takes place:

tf.nn.ctc_loss
tf.nn.ctc_beam_search_decoder
tf.nn.ctc_greedy_decoder
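
Of the three, tf.nn.ctc_loss is the one that sums over all alignments of the target, using the forward-backward recursion from the paper. The greedy decoder only takes the best label per frame, and the beam search decoder tracks a limited set of candidate prefixes. The NumPy sketch below shows the forward half of that recursion in plain probability space for one utterance. It is an illustration of the algorithm, not TensorFlow's implementation (which works in log space), and ctc_forward_prob is a hypothetical name.

```python
import numpy as np


def ctc_forward_prob(probs, labels, blank):
    """Total probability of `labels` summed over all CTC alignments.

    probs: [T, num_classes] per-frame probabilities; labels: ints without blanks.
    """
    T = probs.shape[0]
    # Extended target: a blank before, between and after the labels.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]            # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1, s - 1]   # advance by one symbol
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # A valid alignment ends on the last label or on the trailing blank.
    result = alpha[T - 1, S - 1]
    if S > 1:
        result += alpha[T - 1, S - 2]
    return result
```

The negative logarithm of this value is the per-utterance quantity a CTC loss minimizes.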

Saw a non-null label issue

Thank you for this excellent example. I am trying to implement my model which is similar to this with the tensorflow 1.2. But there is something wrong:

InvalidArgumentError (see above for traceback): Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 218 labels: 58,191,215,196
[[Node: loss/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](fc/transpose/_163, _arg_Placeholder_3_0_3, _arg_Placeholder_2_0_2, _arg_Placeholder_4_0_4)]]
[[Node: loss/CTCLoss/_165 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4527_loss/CTCLoss", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

This is my code:
graph = tf.Graph()
with graph.as_default():
    # feature inputs e.g. fbank, MFCC
    # size : [batch_size, timesteps, feature_dim]
    inputs = tf.placeholder(tf.float32, [None, None, FLAGS.n_mfcc])

    # use sparse_placeholder to generate a SparseTensor required by ctc_loss op
    targets = tf.sparse_placeholder(tf.int32, [None])
   .....
    # Time major
    logits = tf.transpose(logits, (1, 0, 2))
    tf.summary.histogram('fc', W)
    tf.summary.histogram('fc', b)

    with tf.name_scope('loss'):
        loss = tf.nn.ctc_loss(targets, logits, seq_len)
        
    with tf.name_scope('cost'):
        cost = tf.reduce_mean(loss)

....
        for i, batch in enumerate(datagen.iterate_train(FLAGS.batch_size, shuffle=shuffle,
                                                        sort_by_duration=sortagrad)):
            train_inputs = batch['x']
            train_targets = sparse_tuple_from(batch['y'])
            train_seq_len = batch['label_lengths']

            feed = {inputs: train_inputs,
                    targets: train_targets,
                    seq_len: train_seq_len}
            batch_cost, _ = session.run([cost, optimizer], feed)
            train_cost += batch_cost*FLAGS.batch_size
            train_ler += session.run(ler, feed_dict=feed)*FLAGS.batch_size

The labels seem to be treated as a single label. How can I fix this issue?

Many thanks.

Can we compute the accuracy in the training?

Hello @igormq, I have a question: can we compute the accuracy during training for each batch of data?
I use a fork of your project, https://github.com/synckey/tensorflow_lstm_ctc_ocr, which targets an OCR problem. But I am confused about the accuracy computation, because you can't compute the accuracy for every batch: in some batches, the length of the output will be smaller than the length of the label.
https://github.com/synckey/tensorflow_lstm_ctc_ocr/blob/master/lstm_and_ctc_ocr_train.py#L39
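
One way around the length mismatch is to compare whole sequences instead of comparing position by position: a decoded sequence that is shorter or longer than its label simply counts as wrong. The sketch below assumes the decoder output has already been converted to one Python list per batch item; sequence_accuracy is a hypothetical helper, not part of either repository.

```python
def sequence_accuracy(decoded, labels):
    """Fraction of batch items whose decoded sequence equals its label exactly."""
    if not labels:
        return 0.0
    correct = sum(1 for d, l in zip(decoded, labels) if list(d) == list(l))
    return correct / len(labels)
```

For example, sequence_accuracy([[1, 2], [3]], [[1, 2], [3, 4]]) counts the first item as correct and the second as wrong. A finer-grained alternative is a normalized edit distance, which also tolerates different lengths.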

Something about sparse_tuple_from?

Thanks for your work. I can't understand the function sparse_tuple_from.

In utils.py, line 67: shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1]+1], dtype=np.int64)

What does the '+1' mean?
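
A likely reading of the '+1': the second column of indices holds zero-based positions inside each label sequence, so the largest position is one less than the length of the longest sequence. Adding 1 converts that largest index back into a length, which is what the dense shape needs. A small check, with made-up label sequences:

```python
import numpy as np

sequences = [[19, 8, 5], [1, 2]]  # made-up labels; the longest has 3 entries
indices = [(n, t) for n, seq in enumerate(sequences) for t in range(len(seq))]

max_position = int(np.asarray(indices).max(0)[1])  # largest zero-based position
dense_shape = [len(sequences), max_position + 1]   # batch size, longest length
```

Here max_position is 2 while the longest sequence has 3 labels, so the dense shape is [2, 3].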

Multiple Data

I want to train the model on a small TIMIT dataset. How do I change the inputs for that?
Also, I want the output to be text rather than numbers.
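
Turning the decoded numbers back into text only requires inverting whatever mapping was used to build the targets. The sketch below assumes the mapping suggested by the label array [19 8 5 ...] quoted elsewhere on this page: 0 is a space and 1 to 26 are the letters a to z, with the blank label already removed by the decoder. If your targets were encoded differently, the mapping has to change accordingly; indices_to_text is a hypothetical helper.

```python
def indices_to_text(indices):
    """Map label indices back to characters (assumed: 0 = space, 1..26 = a..z)."""
    chars = []
    for i in indices:
        chars.append(' ' if i == 0 else chr(ord('a') + i - 1))
    return ''.join(chars)
```

With that mapping, indices_to_text([19, 8, 5, 0, 8, 1, 4]) gives "she had".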

multi layer LSTM gives error

Hi Igor, thanks so much for the example, but the multi-layer LSTM gives an error about input sizes mismatching during MatMul. There is a solution here: tensorflow/tensorflow#14897

Kindly update your code.

lstm + ctc for mnist

Hi igormq. Your blog post about CTC in Tensorflow is very helpful. Thanks a million. But I am confused about the CTC module.
1. If the sequence is A B B * B * B (* is blank), tf.ctc.ctc_greedy_decoder() should return A B B B. But the documentation says the result is A B if merge_repeated=True.
2. My code uses an LSTM to classify MNIST data, with just one layer and 28 time steps. But the CTC loss doesn't work at all. Can you help me define the right call? The code is very simple, and I promise you will get it when you see it.
Thanks again.
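
On the first point, the collapsing rule can be checked by hand. The sketch below mimics what a greedy decoder is generally understood to do after the per-frame argmax: drop blanks, and, when merge_repeated is on, drop a label that is identical to the frame immediately before it. It is a pure-Python illustration with a hypothetical function name, not TensorFlow's code.

```python
def greedy_collapse(path, blank, merge_repeated=True):
    """Collapse a per-frame label path into an output sequence."""
    out = []
    prev = None
    for label in path:
        # Emit non-blank labels; with merging on, skip direct repeats.
        if label != blank and not (merge_repeated and label == prev):
            out.append(label)
        prev = label  # a blank resets the repeat check
    return out


path = ['A', 'B', 'B', '*', 'B', '*', 'B']
merged = greedy_collapse(path, '*', merge_repeated=True)
unmerged = greedy_collapse(path, '*', merge_repeated=False)
```

Under this rule, merged is A B B B and unmerged is A B B B B, which agrees with the expectation stated in the question: a blank between two identical labels keeps them separate.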

Problems with training deep RNN and with multiple examples

Hi,

I'm currently trying to train the RNN with multiple inputs. What format do the 'train_targets' and 'train_inputs' variables need to have to train the network successfully? I thought of concatenating all the inputs, but since train_seq_len and the number of targets per input vary, I cannot concatenate the entire database.

Also, if I increase the number of layers to more than 1, the training cost does not decrease sufficiently (still using only one example). Is this because there is not enough data? If that is the case, shouldn't the RNN overfit and still predict the correct output?

Regards,
Deepak