Is it possible, given this implementation, to train an acoustic model given a different form of input?

Different form of input? (keras-kaldi issue, closed)

dspavankumar commented on July 18, 2024

Different form of input?

from keras-kaldi.

Comments (3)

dspavankumar commented on July 18, 2024

It is possible to train acoustic models with any kind of input, if you can store the features in Kaldi format.
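As an illustration of "storing the features in Kaldi format", here is a minimal sketch that writes arbitrary per-frame features to a Kaldi text-format archive using only the Python standard library. The function name `write_kaldi_text_ark` and the `{utterance_id: frames}` input layout are assumptions for this example, not part of keras-kaldi.

```python
def write_kaldi_text_ark(features, path):
    """Write {utterance_id: frames} as a Kaldi text-format archive.

    `frames` is a list of equal-length lists (one row per frame, one column per
    feature dimension). Each utterance is written as

        utt_id  [
          v11 v12 ... v1D
          ...
          vT1 vT2 ... vTD ]
    """
    with open(path, "w") as out:
        for utt_id, frames in features.items():
            if not frames:
                # An empty matrix is written as an empty bracket pair
                out.write(f"{utt_id}  [ ]\n")
                continue
            out.write(f"{utt_id}  [\n")
            last = len(frames) - 1
            for i, frame in enumerate(frames):
                row = " ".join(f"{v:.6g}" for v in frame)
                # The closing bracket goes on the same line as the last row
                out.write(f"  {row}{' ]' if i == last else ''}\n")
```

The resulting text archive should be convertible to the usual binary ark/scp pair with a Kaldi tool such as `copy-feats` (for example `copy-feats ark:feats.txt ark,scp:feats.ark,feats.scp`), after which it can be used like any other Kaldi feature set.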

A CNN acoustic model trained in Keras can be used to extract posterior features by running a forward pass on the test features (similar to nnet-forward). For this, your network should, after some convolutional layers, flatten the 3D convolutional output to 2D and use a Dense layer with softmax at the output. Its outputs can then be converted to likelihoods and sent to the decoder using latgen-faster-mapped.
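The "convert to likelihoods" step is usually done with the standard hybrid NN-HMM trick of dividing posteriors by state priors (in the log domain, subtracting log-priors). The sketch below assumes the network's output units are the same state/pdf labels used as training targets from the forced alignment, and estimates the priors from label counts in that alignment; the function names and the flooring constant are hypothetical.

```python
import math
from collections import Counter


def state_log_priors(alignment, num_states, floor=1e-10):
    """Estimate log p(s) from frame-level state labels (e.g. a GMM forced alignment)."""
    counts = Counter(alignment)
    total = len(alignment)
    # Floor unseen states so that log() stays finite
    return [math.log(max(counts[s] / total, floor)) for s in range(num_states)]


def posteriors_to_loglikes(posteriors, log_priors, floor=1e-10):
    """Convert per-frame softmax outputs p(s|x) to scaled log-likelihoods.

    Bayes' rule gives log p(x|s) = log p(s|x) - log p(s) + log p(x); the last
    term is identical for every state within a frame, so it is dropped.
    """
    return [
        [math.log(max(p, floor)) - lp for p, lp in zip(frame, log_priors)]
        for frame in posteriors
    ]
```

The result (one row per frame, one column per output unit) is the kind of matrix that would be written back to a Kaldi archive for latgen-faster-mapped.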


yh1008 commented on July 18, 2024

Just curious: if I have fbank features extracted and forced alignments from a trained GMM, and I would like to train a CNN on top of them, can I simply replace your

m = keras.models.Sequential([
                keras.layers.LSTM(256, input_shape=(learning['spliceSize'],trGen.inputFeatDim), activation='tanh', return_sequences=True),
                keras.layers.LSTM(256, activation='tanh', return_sequences=True),
                keras.layers.LSTM(256, activation='tanh'),
                keras.layers.Dense(trGen.outputFeatDim, activation='softmax')])

(in train*.py)
with something like the following

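from keras.models import Sequential
from keras.layers import Reshape, Conv2D, MaxPooling2D, Flatten, Dense

m = Sequential()
# Add a channel axis: each spliced window (context x filterbank bins) becomes a 1-channel "image"
m.add(Reshape((learning['spliceSize'], trGen.inputFeatDim, 1),
              input_shape=(learning['spliceSize'], trGen.inputFeatDim)))
# Kernel and pool sizes must fit inside the spliced window
m.add(Conv2D(150, (8, 8)))
m.add(MaxPooling2D((6, 6)))
m.add(Flatten())
m.add(Dense(1024, activation='relu'))
m.add(Dense(trGen.outputFeatDim, activation='softmax'))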

and leave the rest of the files unchanged?

Is there anything else that needs to be modified for this pipeline to work? For example, do I need to modify dataGenerator?

Thanks in advance!


dspavankumar commented on July 18, 2024

Yes, but I guess Convolution1D makes more sense for filterbank features: we want each kernel to move across time and capture sound patterns by looking at all the frequencies at once, so we don't want the kernel to move along the frequency axis. You could try that. In the input shape (batch_size, steps, input_dim), batch_size can be kept as None, steps can be your context size, and input_dim can be the number of filters in the filterbank. Then you can flatten the layer's output and use Dense layer(s) with a softmax at the output. You can use dataGenSequences for this purpose. I haven't tested any code, though. I will try to include a CNN example in a later revision.
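To make the "kernel moves only along time" idea concrete, here is a plain-Python sketch of what a 1D convolution over a spliced filterbank window computes. It assumes each window is built by stacking `context` frames on either side of the centre frame, repeating the edge frames at utterance boundaries (one common choice; keras-kaldi's own generator may handle edges differently). The names `splice` and `conv1d_over_time` are hypothetical, and the convolution omits the bias and activation that a Keras `Conv1D(filters, kernel_size, input_shape=(steps, num_bins))` layer would add.

```python
def splice(frames, context):
    """Stack each frame with `context` neighbours on each side (edges repeated).

    Returns one window per frame, each of shape (2*context + 1, num_bins).
    """
    T = len(frames)
    return [
        [frames[min(max(t + k, 0), T - 1)] for k in range(-context, context + 1)]
        for t in range(T)
    ]


def conv1d_over_time(window, kernels):
    """'Valid' 1D convolution (no kernel flip, as in Keras) along time only.

    window:  (steps, num_bins), steps = context size, num_bins = filterbank channels
    kernels: (num_kernels, kernel_width, num_bins); every kernel spans ALL bins,
             so it slides over time but never along the frequency axis.
    Returns (steps - kernel_width + 1, num_kernels).
    """
    steps, num_bins = len(window), len(window[0])
    width = len(kernels[0])
    out = []
    for t in range(steps - width + 1):
        out.append([
            sum(k[i][b] * window[t + i][b] for i in range(width) for b in range(num_bins))
            for k in kernels
        ])
    return out
```

Because each kernel already covers every filterbank channel, there is no frequency position left to slide over; the output has one row per time offset and one column per kernel, which is what gets flattened before the Dense layers.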

