Comments (22)

dspavankumar commented on August 17, 2024

OK, if the CNN discriminates only at the phoneme level and not at the state level, then you can't have the usual 3-state HMMs, because you won't have posteriors at the state level. You will only have a one-state HMM per phoneme, which limits the very advantages of using an HMM system. If you still want to go ahead, you can change the topology file data/lang/topo before initialising the GMM.
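
For example, a one-state topology entry would look something like this (just a sketch; the phone numbers and transition probabilities are placeholders for your setup):

<TopologyEntry>
<ForPhones>
2 3 4
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 </State>
</TopologyEntry>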

Regarding decoding, we can use latgen-faster-mapped in Kaldi, which reads the posteriors directly, bypassing GMM evaluation, to generate the lattices. In fact, this is the standard method used everywhere (in the Kaldi nnet setups and also in this repository). The transition model and tree are always copied from the GMM system. A trained transition model gives a very small improvement over one with equiprobable transitions (that's why I suggested one iteration of gmm-est if you wanted to build a model from scratch). The tree in context-dependent systems is built hierarchically, whereas in context-independent systems it is a dummy tree of height 1 that only asks a question about the centre phone at the root node.
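
For reference, the decoding call usually looks something like this (paths, the acoustic scale and the archive names are placeholders; loglikes.ark would hold the CNN's log-posteriors, optionally divided by the class priors):

latgen-faster-mapped --acoustic-scale=0.1 --word-symbol-table=data/lang/words.txt \
  exp/mono/final.mdl exp/mono/graph/HCLG.fst ark:loglikes.ark "ark:|gzip -c > lat.1.gz"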

Miail commented on August 17, 2024

Yep, I made an error; it works now. Thanks for going beyond what I would have hoped to get out of this.
Thanks for the help 👍

dspavankumar commented on August 17, 2024

"Is it possible to train the acoustic model to given a pretrained model."

I don't understand your question clearly. Do you mean to train an HMM (its transition model) after training the CNN, so that you can build a hybrid HMM-CNN system and test it? If yes, read the next para. If you just want to use the trained CNN as an initialisation to train another CNN, just use keras.models.load_model and train it.

If you want to train an HMM, this is possible if your CNN was trained to predict phone states instead of just phones. You could start with gmm-init-mono, following steps/train_mono.sh in the standard Kaldi setup, until the one iteration of gmm-est that creates 1.mdl. This will also train a GMM, but you won't use it while decoding. Also, you need to make sure that the PDF indices gmm-init-mono creates (in its output file mono.tree) match those of the alignments you used for building the CNN. To decode, you could use one of the decoding scripts from steps_kt.
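
Roughly, if you reuse your existing alignments instead of the equal-alignment step in train_mono.sh, the sequence would look something like this (only a sketch; the dimension, directories and archive names are placeholders, and the alignments must come from a system whose tree/PDF indices match):

gmm-init-mono data/lang/topo 39 exp/mono/0.mdl exp/mono/tree
gmm-acc-stats-ali exp/mono/0.mdl scp:data/train/feats.scp ark:exp/mono_ali/ali.ark exp/mono/0.acc
gmm-est exp/mono/0.mdl exp/mono/0.acc exp/mono/1.mdl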

Miail commented on August 17, 2024

I want to end up with a CNN-HMM, in which the CNN provides the posterior probability, over all possible phonemes, of what a given centre frame could be, which the CNN currently does.

Given the solution you mention, would the HMM still not use the GMM probabilities rather than the CNN probabilities?

Miail commented on August 17, 2024

Yes... I guess I could have been clearer about that. Forced alignment is performed on a monophone model, but it could also have been done on a triphone model.

I tried gmm-init-mono, but since I am parsing an image, my dimension is pretty large.

dspavankumar commented on August 17, 2024

If you did forced alignment on a monophone model, you must already have the alignments at the state level (provided you used Kaldi).

So, does gmm-init-mono throw an out-of-memory sort of exception?

Miail commented on August 17, 2024

No... it's just stating that the dim is too large.

gmm-init-mono topo 11880 mono.mdl mono.tree
gmm-init-mono topo 11880 mono.mdl mono.tree 
ASSERTION_FAILED (gmm-init-mono:main():gmm-init-mono.cc:85) : 'dim> 0 && dim < 10000' 

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
main
__libc_start_main
gmm-init-mono() [0x4176f9]

Aborted (core dumped)

dspavankumar commented on August 17, 2024

What if you just removed that assertion, recompiled Kaldi and tried again?
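
Something like this should do it, assuming you built Kaldi from source (the line number comes from your stack trace and may differ between Kaldi versions):

# in src/gmmbin/gmm-init-mono.cc, comment out the check around line 85:
#   KALDI_ASSERT(dim > 0 && dim < 10000);
cd src/gmmbin && make gmm-init-mono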

Miail commented on August 17, 2024

Removing the assertion and recompiling created it correctly.

Now I have the model...

How should I format the probabilities: should they be the monophone-state probabilities, or should I feed both the 11880 pixels as the observation and the probabilities of all the monophone states, using latgen-faster-mapped?

Miail commented on August 17, 2024

I tried this...

carl@carl-ThinkPad-T420s:~/kaldi-trunk/egs/timit/s5$ latgen-faster-mapped --word-symbol-table=/home/carl/kaldi-trunk/egs/timit/s5/data/lang/words.txt mono.mdl /home/carl/kaldi-trunk/egs/timit/s5/exp/mono/graph/HCLG.fst ark:python_features/org_train_total_frames_15_dim_40_winheig_8_batch_1_fws_input.ark  ark,t:output.ark
latgen-faster-mapped --word-symbol-table=/home/carl/kaldi-trunk/egs/timit/s5/data/lang/words.txt mono.mdl /home/carl/kaldi-trunk/egs/timit/s5/exp/mono/graph/HCLG.fst ark:python_features/org_train_total_frames_15_dim_40_winheig_8_batch_1_fws_input.ark ark,t:output.ark 
ERROR (latgen-faster-mapped:DecodableMatrixScaledMapped():decoder/decodable-matrix.h:42) DecodableMatrixScaledMapped: mismatch, matrix has 11880 rows but transition-model has 144 pdf-ids.

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::DecodableMatrixScaledMapped::DecodableMatrixScaledMapped(kaldi::TransitionModel const&, kaldi::Matrix<float> const&, float)
main
__libc_start_main
latgen-faster-mapped() [0x444ca9]

I've stored the features I used for training my model in Keras into a 1-D ark file named
org_train_total_frames_15_dim_40_winheig_8_batch_1_fws_input.ark...

but it doesn't seem like there is "any room" to pass the actual probabilities?

dspavankumar commented on August 17, 2024

This doesn't seem like a problem with the dimension. Could you check if you are storing the features correctly (e.g. not their transpose)?

Miail commented on August 17, 2024

I've flattened the image and stored each pixel value, which in total is 11880 pixels...

feat-to-dim ark:python_features/org_train_total_frames_15_dim_40_winheig_8_batch_1_fws_input.ark - 
11880
feat-to-len ark:python_features/org_train_total_frames_15_dim_40_winheig_8_batch_1_fws_input.ark
73247

so it should be ok?

dspavankumar commented on August 17, 2024

OK, I think there is another method. Can you try creating a model with gmm-init-mono with a small dimension (e.g. 39) and then only copy the transition model using copy-transition-model? I don't know if these transitions can be trained, but you could use it for decoding with latgen-faster-mapped.
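
Something along these lines, for example (directories and file names are placeholders):

gmm-init-mono data/lang/topo 39 exp/mono_small/0.mdl exp/mono_small/tree
copy-transition-model exp/mono_small/0.mdl exp/mono_cnn/trans.mdl

The resulting trans.mdl contains just a transition model, which should be accepted by latgen-faster-mapped in place of a full final.mdl.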

Miail commented on August 17, 2024

Still the same output as before, with mismatched dimensions...

I still find it weird that, just by feeding the pixels, it would generate lattices without knowing any monophone-state probabilities...

dspavankumar commented on August 17, 2024

The CNN computes the probability, right? Only it is in the form of a posterior. This is discriminative modelling, where a single model tries to discriminate between all the classes, whereas in generative modelling there is a separate model for each class that gives a likelihood. We can think of the output of the discriminator at a particular output node (when normalised by the class prior) as a pseudo-likelihood that a generative model (if it existed) would have computed.
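
In other words, the usual hybrid trick is to take p(x_t | s) ∝ P(s | x_t) / P(s): divide the CNN's posterior for state s by the prior of s (e.g. its relative frequency in the training alignments) and feed the log of that to the decoder as the acoustic "likelihood".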

Miail commented on August 17, 2024

OK... I think I found the root of my problems...

 latgen-faster-mapped  exp/mono_cnn/final.mdl exp/mono_cnn/graph/HCLG.fst ark:python_features/train.ark  ark,t:output.txt
latgen-faster-mapped exp/mono_cnn/final.mdl exp/mono_cnn/graph/HCLG.fst ark:python_features/train.ark ark,t:output.txt 
ERROR (latgen-faster-mapped:DecodableMatrixScaledMapped():decoder/decodable-matrix.h:42) DecodableMatrixScaledMapped: mismatch, matrix has 5 rows but transition-model has 9 pdf-ids.

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::DecodableMatrixScaledMapped::DecodableMatrixScaledMapped(kaldi::TransitionModel const&, kaldi::Matrix<float> const&, float)
main
__libc_start_main
latgen-faster-mapped() [0x444ca9]

I guess I have to change the number of pdf-ids, but how do I do that?
I can't seem to find where it is defined, or how I would be able to alter it.

dspavankumar commented on August 17, 2024

It's in your final.mdl. The model is expecting a 9-dimensional posterior. Run am-info exp/mono_cnn/final.mdl and check the number of pdfs (a quick check for both the model and the tree is sketched after the list below). This could have multiple sources:

  1. Check the file data/lang/topo. How many distinct states do you see, in total for all the phones? If you see more than 5, then you must change it so that it reflects 5 states, run utils/validate_lang.pl once, and regenerate final.mdl.
  2. If topo is fine, check the tree file exp/mono_cnn/tree. Did you use context dependency? Use tree-info and check for num-pdfs, or tree-copy to convert the tree into text format if required. Again, do you see more than five leaf nodes? Check your tree creation step.
  3. As a last resort, use gmm-copy to convert the final.mdl into text format, and manually change it to your requirement. This could be tedious.
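
For instance, assuming the same directory layout as above:

am-info exp/mono_cnn/final.mdl | grep -i pdfs
tree-info exp/mono_cnn/tree | grep num-pdfs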

Miail commented on August 17, 2024

The error seems to be in both the topo and the tree...

Drawing the tree shows it as such:
draw-tree data/lang_test_mono/phones.txt exp/mono_cnn/tree | dot -Gsize=8,10.5 -Tps | ps2pdf - tree.pdf > tree.pdf
which seems to be caused by the topo:

<Topology>
<TopologyEntry>
<ForPhones>
2 3
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.75 <Transition> 5 0.25 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>

So a solution would be to correct the topo...

So I correct the topo by making a new one as such:

utils/gen_topo.pl 2 1 2:3 1                             
<Topology>
<TopologyEntry>
<ForPhones>
2 3
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 2 </State>
</TopologyEntry>
</Topology>

and then try to train it again

steps/train_mono.sh data/train_yesno/ data/lang_test_mono/ exp/mono_cnn_2

which gives me this error message:

# gmm-init-mono --shared-phones=data/lang_test_mono//phones/sets.int "--train-feats=ark,s,cs:apply-cmvn  --utt2spk=ark:data/train_yesno//split1/1/utt2spk scp:data/train_yesno//s$
# Started at Thu Jun  8 12:09:58 CEST 2017
#
gmm-init-mono --shared-phones=data/lang_test_mono//phones/sets.int '--train-feats=ark,s,cs:apply-cmvn  --utt2spk=ark:data/train_yesno//split1/1/utt2spk scp:data/train_yesno//spl$
subset-feats --n=10 ark:- ark:-
apply-cmvn --utt2spk=ark:data/train_yesno//split1/1/utt2spk scp:data/train_yesno//split1/1/cmvn.scp scp:data/train_yesno//split1/1/feats.scp ark:-
add-deltas ark:- ark:-
ERROR (gmm-init-mono:Read():hmm-topology.cc:76) States are expected to be in order from zero, expected 1, got 2

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::HmmTopology::Read(std::istream&, bool)
main
__libc_start_main
gmm-init-mono() [0x4176f9]

# Accounting: time=0 threads=1
# Ended (code 255) at Thu Jun  8 12:09:58 CEST 2017, elapsed time 0 seconds

dspavankumar commented on August 17, 2024

Change the third line from the last to:
<State> 1 </State>
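
That is, the generated one-state entry for phone 1 should end up as:

<TopologyEntry>
<ForPhones>
1
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 </State>
</TopologyEntry>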

Miail commented on August 17, 2024

Yes... That worked. Actually, it seems that the gen_topo.pl script had a minor bug, which is fixed now.

Thanks a lot for the help!

latgen-faster-mapped computes and outputs...

Now I just need to interpret the data in a meaningful way.

The decode.sh script seems to be useful, but some adjustments have to be made so that I can use it.

Actually, the only commands I execute are latgen-faster-mapped and the scoring script.

Is it possible to somehow compute the WER given this setup?

dspavankumar commented on August 17, 2024

You're welcome. The scoring script computes the WER and stores it in the decode directory.
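
For example, if you ran the standard local/score.sh, it writes wer_* files into the decode directory, and you can pick the best one with something like this (the decode directory name is a placeholder):

grep WER exp/mono_cnn/decode/wer_* | utils/best_wer.sh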

Miail commented on August 17, 2024

Hmm... if that's the case, then something must be wrong... I seem to get a WER of 100%:

compute-wer --text --mode=present ark:exp/mono_gmm_train//scoring/test_filt.txt ark,p:- 
%WER 100.00 [ 240 / 240, 0 ins, 240 del, 0 sub ] [PARTIAL]
%SER 100.00 [ 30 / 30 ]
Scored 30 sentences, 1 not present in hyp.
