pyhtk's Issues

fix cmu dict

Add code to modify a cmu dict to work with HTK; once this is done, remove the 
dict provided in Common/
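
A rough sketch of the conversion, assuming the plain-text cmudict format (the file names, and the convention of uppercase words with lowercase, stress-stripped phones plus a trailing sp, are assumptions, not something fixed by this repo):

# drop comment lines, drop the "(2)"-style markers on alternate pronunciations
# (the entries themselves are kept), strip stress digits, lowercase the phones,
# append the short-pause model sp, and sort for HTK
grep -v '^;;;' cmudict.txt \
  | sed 's/([0-9][0-9]*)//' \
  | awk '{ printf "%s ", $1;
           for (i = 2; i <= NF; i++) { gsub(/[0-9]/, "", $i); printf "%s ", tolower($i) };
           print "sp" }' \
  | sort > cmu.htk.dict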

Original issue reported on code.google.com by [email protected] on 6 Oct 2011 at 7:04

Add improved training recipe

I believe that standard AM training, as it's been done at Cambridge over the last 6 
years, goes something like this:

Flat start monos
Mix-up monos
Mixdown monos
Estimate untied xwrd
Tie xwrd
Mix-up xwrd
Re-estimate xwrd using two-model re-estimation with the mixdown monos & last xwrd
Re-align to get interword sil & alternate prons
Re-estimate xwrd using two-model re-estimation with the mixdown monos & last xwrd
(repeat a bunch of times)

To train the monophones you use the same recipe as before to do the flat start.

From the flat start you mix up to 16 components using the same schedule as 
before (2 4 6 8 10 12 14 16), and just as before you ask for twice as many sil 
comps as non-sil, e.g., at 8:

MU 16 {(sil,sp).state[2-4].mix}
MU 8 {*.state[2-4].mix}

Amusingly, this works: first you mix the sil models up to 16, and then you go 
over all the models asking for 8.  Since sil already has more than 8 components, 
the second command leaves it unchanged.
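
For concreteness, one step of that schedule might be driven like this (a sketch; the directory names, config file, and number of re-estimation passes are illustrative, not part of the recipe):

# split the mixtures with HHEd, then re-estimate a few times with HERest
mkdir -p mono/HMM-8-{0,1,2,3,4}
HHEd -A -D -T 1 -H mono/HMM-6-4/MMF -M mono/HMM-8-0 mixup.8.hed mono.list
for i in 1 2 3 4; do
  HERest -A -D -T 1 -C config.basic -I train.phones.mlf -t 250.0 150.0 1000.0 \
         -S train.scp -H mono/HMM-8-$((i-1))/MMF -M mono/HMM-8-$i mono.list
done

Here mixup.8.hed holds the two MU lines above.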

To train the xwrd models we need to mix the non-sil models down to one comp and 
sil down to 12.

The hed script (mixdown.hed) is:

MD 12 {(sil,sp).state[2-4].mix}
MD 1 {(ah, ax, all the rest of the non-sil phones).state[2-4].mix}

Maybe it's better to do it line by line, with one command per non-sil phone:

MD 12 {(sil,sp).state[2-4].mix}
MD 1 {ah.state[2-4].mix}
MD 1 {ax.state[2-4].mix}
...

Either way, you run

HHEd -A -D -T 1 -H mono_mix/MMF  -w mono_seed/MMF mixdown.hed mono.list


Now that you have these models you proceed exactly as before, with triphone 
cloning, untied estimation, tying, etc.
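
The cloning step itself is the usual HHEd CL/TI script; a sketch, with file names assumed rather than taken from this repo:

# mktri.hed: clone each monophone into the triphones seen in training and tie
# the transition matrices of all triphones sharing the same centre phone
CL triphones.list
TI T_ah {(*-ah+*,ah+*,*-ah).transP}
TI T_ax {(*-ax+*,ax+*,*-ax).transP}
...

HHEd -A -D -T 1 -H mono_seed/MMF -M xwrd/HMM-1-0 mktri.hed mono.list

There is one TI line per monophone; these are normally generated by a small script rather than written by hand.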

Another good thing to do, once state tying is done and you have better models, 
is to re-estimate the variance floor.  So in the first mixup you re-estimate 
the variance floor using this mixup.2.hed:

LS final-unimodal-tied-triphone-dir/stats
FA 0.1
MU 4 {(sil,sp).state[2-4].mix}
MU 2 {*.state[2-4].mix}

Just to be clear: you only do this once, in the first mixup for the tied 
triphones.  This will make the varfloor bigger than the initial estimate using 
the total variance of the data, i.e. conservative, which is a good thing.
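
The hed file is applied with the usual HHEd call, e.g. (directory names illustrative):

HHEd -A -D -T 1 -H tied/HMM-1-4/MMF -w tied/HMM-2-0/MMF mixup.2.hed tied.list

The stats file that LS loads is the one written by the last HERest pass over the unimodal tied triphones (the -s option).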

Re-estimating the xwrd models from seed monos and extant xwrd models:
You use the mixdown monos in the same way as before (no need to redo them) and 
clone as before (maybe recycle these).  But in the first pass of estimation for 
the untied triphone models you use two-model re-estimation.  The HERest command 
looks exactly the same as it did before, but you add:
ALIGNMODELMMF = previous 16 comp xwrd/MMF
ALIGNHMMLIST  = previous xwrd model list
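
Concretely, that single two-model pass might look like this (a sketch; config and directory names are illustrative, with config.2model being the usual BW config plus the two ALIGN lines above):

HERest -A -D -T 1 -C config.2model -I train.wintri.mlf -t 250.0 150.0 1000.0 \
       -S train.scp -H untied/HMM-1-0/MMF -M untied/HMM-1-1 triphones.list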

This uses the previous big models to get the BW alignment.  You only do it in 
this one pass.  After you have this you proceed as before, including 
re-estimating the var floor in the first mixup:
LS final-unimodal-tied-triphone-dir/stats
FA 0.1
MU 4 {(sil,sp).state[2-4].mix}
MU 2 {*.state[2-4].mix}

I'm pretty certain that Cambridge only does one pass of BW on the untied xwrd 
models.  I'm not certain what I told you before...

Original issue reported on code.google.com by [email protected] on 11 Oct 2011 at 1:04

Add better logging

Need better information in the logs:
- use alignment to get the amount of time spent in each phone (see the sketch after this list)
- diagnostics for disc. training
- config options to keep more intermediate files
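
For the per-phone time, something along these lines would do it from an HVite-style aligned MLF (a sketch; it assumes model-level alignment with times in 100 ns units and the phone label in the third field):

# sum the time spent in each phone over an aligned MLF
grep -v -e '^#!MLF!#' -e '^"' -e '^\.$' aligned.mlf \
  | awk 'NF >= 3 { dur[$3] += ($2 - $1) / 1e7 }
         END { for (p in dur) printf "%-8s %10.2f s\n", p, dur[p] }' \
  | sort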

Original issue reported on code.google.com by [email protected] on 11 Oct 2011 at 5:38

Config files

- Make config setup more modular (especially discriminative training)


Original issue reported on code.google.com by [email protected] on 6 Oct 2011 at 7:02

Include option to use HVite for lattices in disc. training

Here is what I think we should use for the phone marking using HVite.  The '-n 
32 -m' is the main difference from the HDecode.mod command.

Note that unlike HDecode, HVite needs the training dictionary that has sp/sil 
on each entry, but you also need to add entries for <s> and </s>, which I did 
in the dict given on the command line below.

HVite -A -D -V -T 9 -n 32 -m -w -q tvaldm -z lat -X lat \
  -C /n/shokuji/da/swegmann/work/lats/hvite/config.hvite \
  -H exp/si84/0/Xword/HMM-8-6/MMF -t 200.0 -s 15.0 -p 0.0 \
  -S /n/shokuji/da/swegmann/work/lats/hvite/mfc.list \
  -l /n/shokuji/da/swegmann/work/lats/hvite/den \
  -L exp/si84/0/MMI/Denom/Lat_prune/404/ \
  /n/shokuji/da/swegmann/work/lats/hvite/dict exp/si84/0/tied.list
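
The contents of config.hvite aren't shown here; presumably the relevant part is the cross-word network expansion settings, something like the following (a guess, not the actual file), plus the usual front-end settings from the training config:

ALLOWXWRDEXP = T
FORCECXTEXP  = T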

I don't think that this is super expensive to run either, since most of the 
compute is in the actual recog which HDecode still does.  Maybe you could just 
link in the word/pruned lats and then just run the phone marking?  It should be 
super fast...

Original issue reported on code.google.com by [email protected] on 11 Oct 2011 at 1:05

On-the-fly HTK configs

Currently, we have a number of specific configs in the Common/ directory; these 
can be created on the fly according to options specified in the master config
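
A rough sketch of what "on the fly" could look like, with the option names here being hypothetical placeholders rather than existing master-config keys:

# emit an HTK front-end config from values held by the build
TARGETKIND=MFCC_0_D_A_Z     # hypothetical value taken from the master config
NUMCEPS=12
cat > configs/hparm.conf <<EOF
TARGETKIND   = ${TARGETKIND}
TARGETRATE   = 100000.0
WINDOWSIZE   = 250000.0
USEHAMMING   = T
PREEMCOEF    = 0.97
NUMCHANS     = 26
NUMCEPS      = ${NUMCEPS}
ENORMALISE   = F
EOF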

Original issue reported on code.google.com by [email protected] on 6 Oct 2011 at 7:03

Add diagonalization transform code

Section 3.7 of the HTK book has a pretty good explanation of how to do this.  
The seed models for the process are fully trained mixture models.

First you need to create a "base class"; this says which components are 
associated with which transforms.  To start, we'll use just one transformation, 
so this file is trivial.  Call it "global" and maybe put it in a directory 
called misc/trans.  The only dangerous thing about this is the 32 comp max.  We 
should probably set this at build time to be the max ncomps in the mixture models:

~b "global"
<MMFIDMASK> *
<PARAMETERS> MIXBASE
<NUMCLASSES> 1
<CLASS> 1 {*.state[2-4].mix[1-32]}
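
To avoid hard-coding the 32, the file could be generated at build time from the largest mixture size, e.g. (the variable and path are placeholders):

NCOMP=16    # set to the max number of components in the final mixture models
mkdir -p misc/trans
cat > misc/trans/global <<EOF
~b "global"
<MMFIDMASK> *
<PARAMETERS> MIXBASE
<NUMCLASSES> 1
<CLASS> 1 {*.state[2-4].mix[1-${NCOMP}]}
EOF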

To estimate a diagonalizing transform (semi-tied covariance in HTK parlance) 
you use HERest almost exactly as you would for one pass of BW.  To the usual BW 
config file add the lines

HADAPT:TRANSKIND = SEMIT
HADAPT:USEBIAS = FALSE
HADAPT:BASECLASS = global
HADAPT:SPLITTHRESH = 0.0
HADAPT:MAXXFORMITER = 100
HADAPT:MAXSEMITIEDITER = 20
HADAPT:TRACE = 61
HMODEL:TRACE = 512
HADAPT:SEMITIED2INPUTXFORM = TRUE

In both the scatter and gather steps add

-J misc/trans -u stw

to the usual HERest commands (I'm assuming we don't normally set -u; if we do, 
change it to this).  The resulting MMF has the transformation saved within it.  
After this, you run 6 passes of BW using the usual commands and config.  I'm 
not certain what the directory structure should look like; maybe have a Diag 
dir parallel to the Mono, Xword, etc. dirs, and inside that the diag estimation 
takes place in HMM-16-0 and the BW passes in HMM-16-1, ...?  Recognition with 
HDecode works exactly as before too.
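
Put together, the semi-tied estimation pass might look like this (a single-machine sketch; the config and directory names are illustrative, and config.semitied holds the HADAPT/HMODEL lines above on top of the usual BW settings):

mkdir -p Diag/HMM-16-0
HERest -A -D -T 1 -C config.basic -C config.semitied \
       -I train.wintri.mlf -t 250.0 150.0 1000.0 -S train.scp \
       -J misc/trans -u stw \
       -H Xword/HMM-16-6/MMF -M Diag/HMM-16-0 tied.list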

Original issue reported on code.google.com by [email protected] on 11 Oct 2011 at 1:03
