igormq / ctc_tensorflow_example
CTC + Tensorflow Example for ASR
License: MIT License
Why are you taking num_features = 13?
How did you come up with the number 13? Is there a reason behind it?
Please help me understand this.
Hi, I tried to run the default example.py and encountered this error. I also tried to download the data set from the link directly but also failed. Any idea how I can fix this? Thx!
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>
I see this comment https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L145.
I wonder whether tf.reduce_mean(tf.edit_distance(tf.cast(decoded[0], tf.int32), targets)) should be the correct rate (accuracy). Is the comment just a clerical error, or is it something else?
By the way, the code you gave me works very well on the SVHN dataset once I added dropout and one more layer (it now has 2 layers). Amazing!
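For context on the question above, an edit distance counts the insertions, deletions and substitutions needed to turn the decoded sequence into the target, so it measures error rather than accuracy (lower is better). The following is a minimal pure-Python sketch, not the TensorFlow implementation; the function names are made up for illustration, and the division by the reference length mirrors the `normalize=True` default of `tf.edit_distance`.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two label sequences (lists of ints)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a symbol from hyp
                            curr[j - 1] + 1,      # add a symbol to hyp
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def label_error_rate(hyp, ref):
    """Edit distance normalized by the length of the reference sequence."""
    return edit_distance(hyp, ref) / len(ref)
```

A perfect prediction gives 0.0, and one wrong label out of three gives roughly 0.33, so the quantity behaves like an error rate.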
# Readings targets
with open(target_filename, 'r') as f:
    for line in f.readlines():
        if line[0] == ';':
            continue
        # Get only the words between [a-z] and replace period for none
        original = ' '.join(line.strip().lower().split(' ')[2:]).replace('.', '')
        targets = original.replace(' ', ' ')
        targets = targets.split(' ')
Excuse my ignorance, but isn't this loop discarding everything except the last value assigned to targets (the last line in f)?
Could you please elaborate on the shape of the targets (sequences) that we feed to the function sparse_tuple_from?
Are they the output label sequences (whose length is not equal to the input sequence length) for each sequence in the batch?
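To make the shapes concrete, here is a NumPy-only sketch of a function in the spirit of sparse_tuple_from. It is reconstructed around the shape line quoted from utils.py elsewhere in this thread, so treat the rest of the body as an assumption rather than the repository's exact code. The input is a plain list with one label sequence per batch item, and the sequences may have different lengths.

```python
import numpy as np


def sparse_tuple_from(sequences, dtype=np.int32):
    """Turn a list of label sequences into (indices, values, shape).

    sequences: list of 1-D label lists, one per batch item, lengths may differ.
    """
    indices, values = [], []
    for n, seq in enumerate(sequences):
        # One (batch_index, position) pair per label.
        indices.extend(zip([n] * len(seq), range(len(seq))))
        values.extend(seq)

    indices = np.asarray(indices, dtype=np.int64)
    values = np.asarray(values, dtype=dtype)
    # Positions are zero-based, so the longest length is max position + 1.
    shape = np.asarray([len(sequences), indices.max(0)[1] + 1], dtype=np.int64)
    return indices, values, shape
```

For the two sequences [19, 8, 5] and [1, 2], the dense shape is [2, 3]: two batch items, and the longest sequence has three labels. The sparse form avoids having to pad the shorter label sequences with a dummy value.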
Hi,
Thanks a lot for this great and very useful work. Would you have any input on how to modify the multidata example to get a bidirectional LSTM, with two layers (like here : https://github.com/xisnu/CNN-BLSTM-CTC/blob/master/Images/hybrid.jpg)?
As we know, the ctc_greedy_decoder function returns a sequence in which the blank label has already been removed and repeated labels merged. Now I want to get the complete greedy path and its probabilities before the blank labels are removed and repeated labels are merged. How can I get this information? Thanks.
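One way to get at the uncollapsed path is to take the per-frame argmax of the network outputs yourself, outside the decoder. The sketch below assumes you can fetch the logits of a single utterance as a [time, num_classes] NumPy array (for example via session.run); the function name greedy_path is hypothetical.

```python
import numpy as np


def greedy_path(logits):
    """Return the raw best path and its per-frame probabilities.

    logits: array of shape [time, num_classes] for ONE utterance.
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)

    path = probs.argmax(axis=1)                     # blanks and repeats kept
    path_probs = probs[np.arange(len(path)), path]  # probability of each pick
    return path, path_probs
```

The returned path still contains blanks and repeats, one label per frame. The product of path_probs is the probability of that single alignment, which could serve as a rough confidence value, although it is not the same as the full CTC probability of the collapsed output.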
In your program, you have one wav file and you extract 13-dimensional MFCC features from it (train_inputs). Then you construct a label array like [19 8 5 ...] and convert it to a sparse representation (function: sparse_tuple_from). I do not understand why the labels should be converted to a sparse representation.
In my case, I extract 14-dimensional MFCC features, but I have 8440 training files. I do not know how to create an array that holds all of the features, because the number of frames differs from file to file. Please help me, thanks.
I like your CTC neural network example; thank you for giving us such useful code.
I want to handle more files. How should I prepare the inputs and targets?
Thanks!
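A common way to store utterances with different numbers of frames in one array is to zero-pad them to the longest one and keep the true lengths separately. The following NumPy sketch assumes each utterance is a [frames, num_features] array; the helper name pad_sequences and its return format are illustrative, not taken from the repository.

```python
import numpy as np


def pad_sequences(sequences, dtype=np.float32):
    """Zero-pad a list of [frames, num_features] arrays to a common length.

    Returns the padded batch [batch, max_frames, num_features] and the
    original number of frames of each item.
    """
    lengths = np.asarray([len(s) for s in sequences], dtype=np.int64)
    num_features = np.asarray(sequences[0]).shape[1]

    padded = np.zeros((len(sequences), lengths.max(), num_features), dtype=dtype)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq  # frames beyond len(seq) stay zero
    return padded, lengths
```

The lengths array is what would be fed as the sequence-length input, so that the padded frames are ignored by the recurrent layer and the loss.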
When I change num_layers to some other value (I tried 2, 3 and 4), it gives a ValueError. The error is as follows: ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/lstm_cell/kernel, but specified shape (100, 200) and found shape (63, 200).
This error occurred on line 110 of ctc_tensorflow_example.py.
Am I doing something wrong, or is there some other value that needs to be changed? Please help.
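One likely cause, assuming the script builds its stack as `[cell] * num_layers` (not verified here), is that every layer then refers to the same cell object and therefore to the same kernel variable. The shapes in the message fit that reading: with 13 input features and 50 hidden units, the first layer needs a kernel of (13 + 50, 4 * 50) = (63, 200), while the second needs (50 + 50, 200) = (100, 200). The toy class below is a plain-Python stand-in, NOT a TensorFlow class, that reproduces the same conflict.

```python
class FakeCell:
    """Stand-in for an LSTM cell; only tracks the shape of its kernel."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.kernel_shape = None

    def build(self, input_size):
        shape = (input_size + self.num_units, 4 * self.num_units)
        if self.kernel_shape is not None and self.kernel_shape != shape:
            raise ValueError("specified shape %s and found shape %s"
                             % (shape, self.kernel_shape))
        self.kernel_shape = shape


def build_stack(cells, num_features):
    """Build each layer in turn; a layer's input is the previous layer's output."""
    input_size = num_features
    for cell in cells:
        cell.build(input_size)
        input_size = cell.num_units
    return [cell.kernel_shape for cell in cells]


num_features, num_hidden, num_layers = 13, 50, 2

shared = [FakeCell(num_hidden)] * num_layers                   # one object, repeated
separate = [FakeCell(num_hidden) for _ in range(num_layers)]   # one object per layer
```

Calling build_stack(separate, num_features) succeeds, while build_stack(shared, num_features) raises a ValueError with the same two shapes as in the report. In TensorFlow 1.x the analogous change would be to create the cells with a list comprehension, one new LSTM cell per layer, before passing them to MultiRNNCell.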
I'm modifying your script for multiple data to use real WAV files instead of generating fake data. I want to know how to properly z-normalize the features. Currently I'm doing this:
def prepare_training_inputs(wav_filenames):
    inputs = []
    for wav in wav_filenames:
        mfcc = utils.wav_mfcc(wav)
        mfcc = (mfcc - np.mean(mfcc)) / np.std(mfcc)
        inputs.append(mfcc)
    train_inputs = np.asarray(inputs)
    return train_inputs
Is this the correct approach? Thank you.
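The snippet above normalizes each file with a single scalar mean and standard deviation computed over all coefficients at once. A frequently used alternative is to normalize every coefficient separately along the time axis. The sketch below shows that variant, assuming mfcc is a [frames, num_features] array; the function name and the eps guard are illustrative choices, and which variant works better is an empirical question.

```python
import numpy as np


def z_normalize(mfcc, eps=1e-8):
    """Per-coefficient z-normalization of one utterance.

    mfcc: array of shape [frames, num_features].
    eps guards against division by zero for constant coefficients.
    """
    mean = mfcc.mean(axis=0, keepdims=True)  # one mean per coefficient
    std = mfcc.std(axis=0, keepdims=True)    # one std per coefficient
    return (mfcc - mean) / (std + eps)
```

Note also that files with different numbers of frames cannot be stacked into one regular NumPy array without padding, so the final np.asarray(inputs) step may need to be replaced by a padding step.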
Other than padding, is there any way to increase the batch size?
This project does not work with the newest TensorFlow versions, such as 0.12.1 and 1.0.0.
Greetings, I am studying the example provided, along with tutorials, however it still isn't clear to me where the output nodes are defined. For example in the specific case of preparing a model for inference, what should I write in the output_node_names when freezing the model?
I'm trying to solve OCR tasks based on this code.
What shape should the input to the LSTM have? Suppose we have images of shape [batch_size, height, width, channels]. How should they be reshaped to be used as input? Like [batch_size, width, height*channels], so that width acts as the time dimension?
What if I want to have variable width? As I understand it, the sequences in a batch should all have the same size (the common trick is to pad with zeros at the end of the sequence), or batch_size should be 1.
What if I want to have variable width and height? As I understand it, I need to use convolutional + global average pooling / spatial pyramid pooling layers before the LSTM, so the output blob will be [batch_size, feature_map_height, feature_map_width, feature_map_channels]. How should that blob be reshaped to be used as input to the LSTM? Like [batch_size, feature_map_width, feature_map_height*feature_map_channels]? Can we reshape it to a single row like [batch_size, feature_map_width*feature_map_height*feature_map_channels]? It would be like a sequence of pixels and we would lose some spatial information. Will it work?
Here is the definition of the input, but I'm not sure what it means in your case, [batch_size, max_stepsize, num_features]:
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L90
And how does the output of the LSTM depend on the input size and the max sequence length?
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L110
BTW: Here are some examples using 'standard' approaches in Keras+Tensorflow, which I want to complement with RNN examples.
https://github.com/mrgloom/Char-sequence-recognition
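The first reshape asked about above, treating image width as the time axis, can be sketched in NumPy as follows. The function name is hypothetical, and this only illustrates the axis bookkeeping for fixed-size images. It says nothing about whether the layout trains well.

```python
import numpy as np


def images_to_sequences(images):
    """Reshape [batch, height, width, channels] to [batch, width, height*channels].

    Each image column becomes one time step whose feature vector is the
    flattened (height, channels) slice of that column.
    """
    batch, height, width, channels = images.shape
    # Put width before height so that it becomes the time axis.
    width_first = images.transpose(0, 2, 1, 3)  # [batch, width, height, channels]
    return width_first.reshape(batch, width, height * channels)
```

The transpose matters: reshaping the original array directly to [batch, width, height*channels] without it would mix pixels from different columns into the same time step.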
Thank you for this excellent example. I tried to implement my own model, which is similar to this one, but I have been facing a problem when saving and restoring the model. While training is running everything is fine and the weights are saved. But when I stop the iterations and then load the model with the saved weights, the CTC loss shows some earlier value and not the last one from when the model was saved. Any help will be highly appreciated.
How can the Connectionist Temporal Classification (CTC) layer of the network be modified to also give us a confidence score?
Thank you very much!!
Bidirectional LSTM with CTC is now live on the current version of Tensorflow
Could you please make an extension for ctc_tensorflow_multidata_example.py using Bidirectional LSTM?
I tried tweaking back and forth but failed miserably T__T
As described in the Alex Graves CTC paper, we sum over the probabilities of all possible alignments that give a certain transcription, using dynamic programming. Neither the code nor the TensorFlow documentation states explicitly where this part happens. Could you help me here and tell me in which of the following functions this dynamic programming happens:
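As a point of reference for the question above, the summation over alignments is part of computing the CTC loss itself (the tf.nn.ctc_loss op), not of the decoders. The following is a small NumPy sketch of the forward recursion from the CTC paper, written for readability rather than speed or numerical stability; it is not TensorFlow's implementation, and the function name is made up. It assumes probs already holds per-frame softmax outputs.

```python
import numpy as np


def ctc_forward_probability(probs, labels, blank):
    """Probability of `labels` summed over all CTC alignments.

    probs:  [time, num_classes] per-frame class probabilities.
    labels: target label sequence without blanks.
    blank:  index of the blank class.
    """
    T = probs.shape[0]
    # Extended sequence: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]               # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1, s - 1]      # advance by one
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # Valid alignments end on the last label or on the trailing blank.
    result = alpha[T - 1, S - 1]
    if S > 1:
        result += alpha[T - 1, S - 2]
    return result
```

With two frames, two classes of probability 0.5 each and the target [0], three of the four possible paths collapse to the target, so the function returns 0.75. A real implementation would work in log space to avoid underflow on long sequences.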
Thank you for this excellent example. I am trying to implement my model which is similar to this with the tensorflow 1.2. But there is something wrong:
InvalidArgumentError (see above for traceback): Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 218 labels: 58,191,215,196
[[Node: loss/CTCLoss = CTCLoss[ctc_merge_repeated=true, ignore_longer_outputs_than_inputs=false, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](fc/transpose/_163, _arg_Placeholder_3_0_3, _arg_Placeholder_2_0_2, _arg_Placeholder_4_0_4)]]
[[Node: loss/CTCLoss/_165 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4527_loss/CTCLoss", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]
This is my code:
graph = tf.Graph()
with graph.as_default():
    # feature inputs e.g. fbank, MFCC
    # size : [batch_size, timesteps, feature_dim]
    inputs = tf.placeholder(tf.float32, [None, None, FLAGS.n_mfcc])
    # use sparse_placeholder to generate a SparseTensor required by ctc_loss op
    targets = tf.sparse_placeholder(tf.int32, [None])
    .....
    # Time major
    logits = tf.transpose(logits, (1, 0, 2))
    tf.summary.histogram('fc', W)
    tf.summary.histogram('fc', b)
    with tf.name_scope('loss'):
        loss = tf.nn.ctc_loss(targets, logits, seq_len)
    with tf.name_scope('cost'):
        cost = tf.reduce_mean(loss)
....
for i, batch in enumerate(datagen.iterate_train(FLAGS.batch_size, shuffle=shuffle,
                                                sort_by_duration=sortagrad)):
    train_inputs = batch['x']
    train_targets = sparse_tuple_from(batch['y'])
    train_seq_len = batch['label_lengths']
    feed = {inputs: train_inputs,
            targets: train_targets,
            seq_len: train_seq_len}
    batch_cost, _ = session.run([cost, optimizer], feed)
    train_cost += batch_cost*FLAGS.batch_size
    train_ler += session.run(ler, feed_dict=feed)*FLAGS.batch_size
The labels seem to be treated as a single label. How can I fix this issue?
Many thanks.
Hello @igormq, I have a question: can we compute the accuracy during training for each batch of data?
I use a fork of your project, https://github.com/synckey/tensorflow_lstm_ctc_ocr, which tackles an OCR problem. I am confused by how the accuracy is computed, because you can't compute the accuracy in every batch: in some batches the decoded output is shorter than the label.
https://github.com/synckey/tensorflow_lstm_ctc_ocr/blob/master/lstm_and_ctc_ocr_train.py#L39
Thanks for your work. I can't understand the function sparse_tuple_from.
In utils.py, line 67: shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1]+1], dtype=np.int64)
What does the "+1" mean?
I want to train the model on a small TIMIT dataset. How do I change the inputs for that?
Also, I want the output to be text rather than numbers.
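Turning decoded label indices back into text only requires inverting whatever mapping was used to build the targets. The sketch below assumes a mapping in which 0 stands for a space and 'a' to 'z' map to 1 to 26, which is consistent with the label array [19 8 5 ...] mentioned elsewhere in this thread; the constant and function names are illustrative and should be adapted if your mapping differs.

```python
SPACE_INDEX = 0              # assumed index of the space between words
FIRST_INDEX = ord('a') - 1   # assumed offset so that 'a' -> 1, ..., 'z' -> 26


def labels_to_text(labels):
    """Map a sequence of integer labels back to a lowercase string."""
    chars = []
    for label in labels:
        if label == SPACE_INDEX:
            chars.append(' ')
        else:
            chars.append(chr(label + FIRST_INDEX))
    return ''.join(chars)
```

For example, labels_to_text([19, 8, 5]) gives "she". The same function can be applied to the values of the decoder's sparse output once they have been fetched as a NumPy array or list.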
Hi Igor, thanks so much for the example, but the multi-layer LSTM gives an error about mismatched input sizes during MATMUL. There is a solution here: tensorflow/tensorflow#14897
Kindly update your code.
Hi, igormq. It is very helpful to see your blog post about CTC in TensorFlow. Thanks a million. But I am confused about some parts of the CTC module.
1. If the sequence is A B B * B * B (* is blank), tf.ctc.ctc_greedy_decoder() should return ABBB. But the docs say the result is A B if merge_repeated=True.
2. My code uses an LSTM to classify MNIST data, with just one layer and 28 time steps. But CTC_LOSS doesn't work at all. Can you help me find the right way to call it? The code is very simple, and I promise you will get it when you see it.
Thanks again.
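Regarding the first point above, the standard CTC collapsing rule is to merge consecutive repeats first and then drop blanks. A short pure-Python sketch of that rule (not the TensorFlow decoder, and with a made-up function name) shows what it yields for the sequence in question.

```python
def ctc_collapse(path, blank):
    """Merge consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

Applied to A B B * B * B with * as the blank, this returns A B B B: the two adjacent B frames merge into one, while the B frames separated by blanks are kept as distinct symbols. Under this rule the "A B" result quoted from the documentation does not follow.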
Hi,
I'm currently trying to train the RNN with multiple test inputs. What format do the 'train_targets' and 'train_inputs' variables need to have to train the network successfully? I thought of concatenating all the inputs, but since train_seq_len and the number of targets per input vary, I cannot concatenate the entire database.
Also, if I increase the number of layers to more than 1, the training cost does not decrease sufficiently (still using only one example). Is this because there is not enough data? If that is the case, shouldn't the RNN overfit but still predict the correct output?
Regards,
Deepak