Comments (12)

naxingyu avatar naxingyu commented on May 12, 2024

There should be a convention for the order in which suffixes are added to feature names, such as sp, hires, nodup (in the swbd case), etc. Currently the nnet2/nnet3 recipes seem to use inconsistent orders, which makes the features hard to reuse across recipes.
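One way to picture such a convention is a single helper that every recipe calls to build the name, so the suffix order never depends on the caller. The sketch below is only an illustration: the function `data_dir_name` and the order `nodup`, `sp`, `hires` are assumptions, not something this thread or the recipes define.

```python
# Hypothetical canonical order; the thread does not settle on one.
CANONICAL_SUFFIX_ORDER = ["nodup", "sp", "hires"]


def data_dir_name(base, suffixes):
    """Join a base name with its suffixes in one fixed order.

    The order in which the caller lists the suffixes is ignored, so
    ["hires", "sp"] and ["sp", "hires"] give the same name.
    """
    unknown = set(suffixes) - set(CANONICAL_SUFFIX_ORDER)
    if unknown:
        raise ValueError(f"unknown suffixes: {sorted(unknown)}")
    ordered = [s for s in CANONICAL_SUFFIX_ORDER if s in suffixes]
    return "_".join([base] + ordered)


if __name__ == "__main__":
    print(data_dir_name("train", ["hires", "sp"]))
    print(data_dir_name("train", ["sp", "hires", "nodup"]))
```

Rejecting unknown suffixes is a design choice in this sketch: it forces a new suffix to be given a position in the order before any recipe can use it.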

from kaldi.

jtrmal avatar jtrmal commented on May 12, 2024

@vijayaditya I don't think I'm in the same boat as you on making a backup of the extractor.
If this is a problem, then the recipe should be modified (to prevent the overwrite or warn the user). Making backups is not what is usually done in kaldi, with the exception of fix_data_dir, and even there it is more or less pointless: if you run the script multiple times (before and after feature extraction), you end up with a backup that probably has the same content.

vijayaditya avatar vijayaditya commented on May 12, 2024

@jtrmal the backup is required in the case of the extractor because you are not guaranteed to select the same Gaussian indices from the UBM. Any kind of overwrite would make all the models trained with the previous extractor completely useless. There will be a warning that an overwrite is happening and that the old model can be found in a specific directory. This backup is a small price to pay to ensure that the old models remain usable.

I don't think making the script interactive, by soliciting user input, is an option.

danpovey avatar danpovey commented on May 12, 2024

I agree that this isn't a pattern we should use excessively, but I have
been bitten recently by this and it's kind of painful to lose the iVector
extractor when you just trained a huge nnet model that requires it. How
about just storing the .ie file itself as a backup, but not the other
files? [perhaps that's what you're doing.]
Dan

vijayaditya avatar vijayaditya commented on May 12, 2024

@danpovey OK, I will just keep the .ie file.
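A minimal sketch of what "keep only the .ie file" could look like, assuming the extractor lives in one directory under the name `final.ie`. The function `backup_extractor`, the `backup/` subdirectory and the timestamped file name are hypothetical choices for illustration; the actual Kaldi scripts may do this differently.

```python
import shutil
import sys
import time
from pathlib import Path


def backup_extractor(extractor_dir, ie_name="final.ie"):
    """Copy an existing extractor file aside before it gets overwritten.

    Only the .ie file is copied, not the rest of the directory.
    Returns the backup path, or None if there was nothing to back up.
    """
    ie_path = Path(extractor_dir) / ie_name
    if not ie_path.exists():
        return None
    # Hypothetical layout: <extractor_dir>/backup/final.ie.<timestamp>
    backup_dir = Path(extractor_dir) / "backup"
    backup_dir.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = backup_dir / f"{ie_name}.{stamp}"
    shutil.copy2(ie_path, backup_path)
    print(
        f"WARNING: {ie_path} will be overwritten; "
        f"the old extractor was saved to {backup_path}",
        file=sys.stderr,
    )
    return backup_path
```

A training script would call `backup_extractor(dir)` once, just before writing the new extractor into `dir`.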

jtrmal avatar jtrmal commented on May 12, 2024

I'm not suggesting the scripts should be interactive. I don't think a backup
will make things clearer, especially once you have to be able to
store backups of backups and so on.

An alternative might be to keep the *.ie.gz with the trained nnets,
similarly to how we keep mdl files with the alignments.
y.

danpovey avatar danpovey commented on May 12, 2024

I considered backing up the .ie files in the nnet directory, but it would
be hard to implement because the directory where the .ie file is located is
not available to the nnet training scripts-- unless we also change the
iVector extraction scripts to copy the .ie file to where it dumps the
iVectors. That would certainly be a possibility-- it would make it easier
to automatically check for mismatches in decoding etc.
Dan
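The idea of copying the extractor next to the dumped iVectors, and using that copy to detect mismatches, could be sketched roughly as follows. The function names, the `final.ie` file name and the use of a SHA-256 digest for the comparison are all assumptions made for this example; nothing here is taken from the Kaldi scripts.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def stash_extractor(ie_path, ivector_dir):
    """Copy the extractor into the directory where the iVectors are dumped."""
    dest = Path(ivector_dir) / Path(ie_path).name
    shutil.copy2(ie_path, dest)
    return dest


def extractors_match(ivector_dir_a, ivector_dir_b, ie_name="final.ie"):
    """Check that two iVector directories were produced by the same extractor."""
    digest_a = file_digest(Path(ivector_dir_a) / ie_name)
    digest_b = file_digest(Path(ivector_dir_b) / ie_name)
    return digest_a == digest_b
```

A decoding script could then call `extractors_match(train_ivector_dir, test_ivector_dir)` and stop early on a mismatch. If copying the whole file is too expensive, storing only the digest next to the iVectors would support the same check.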

jtrmal avatar jtrmal commented on May 12, 2024

Yes, storing it with the iVectors makes sense as well.
I think this would be a much more robust mechanism than the backups.
y.

vijayaditya avatar vijayaditya commented on May 12, 2024

@pegahgh As you are facing similar issues, could you please take this forward?

danpovey avatar danpovey commented on May 12, 2024

@vijayaditya, what is going on with this?
BTW, I think the iVector extractor is quite large, so it would be good to avoid making too many unnecessary copies. But I'm not ruling it out.

vijayaditya avatar vijayaditya commented on May 12, 2024

@danpovey I will address the overwrite issue by checking for an existing extractor in the iVector training script (in #967). I would prefer to keep a copy with the nnet3 model, as large-scale recipes take around 10-20 days to train and simple user errors can turn out to be very costly.
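The check for an existing extractor might look something like the guard below. This is a sketch under assumptions: the names `check_no_existing_extractor`, `ExtractorExistsError` and `allow_overwrite`, and the file name `final.ie`, are invented for the example and do not describe what #967 actually implements.

```python
from pathlib import Path


class ExtractorExistsError(RuntimeError):
    """Raised when training would overwrite an existing extractor."""


def check_no_existing_extractor(extractor_dir, ie_name="final.ie",
                                allow_overwrite=False):
    """Refuse to continue if an extractor is already present.

    Returns the path the new extractor would be written to.
    """
    ie_path = Path(extractor_dir) / ie_name
    if ie_path.exists() and not allow_overwrite:
        raise ExtractorExistsError(
            f"{ie_path} already exists; models trained with it would stop "
            "working if it were overwritten. Use a different directory or "
            "pass allow_overwrite=True."
        )
    return ie_path
```

Failing by default keeps the script non-interactive while still making an accidental overwrite an explicit decision.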

david-ryan-snyder avatar david-ryan-snyder commented on May 12, 2024

@danpovey, I think this issue can be closed.

The UBM/i-vector extractor backup was added back in #1514.
