In the current setup if a user re-runs the ivector part of the recipe the extractor/di


Irrecoverable ivector extractor/diag-ubm overwrites in nnet2/nnet3 ivector recipes. about kaldi HOT 12 CLOSED

kaldi-asr commented on May 12, 2024

Irrecoverable ivector extractor/diag-ubm overwrites in nnet2/nnet3 ivector recipes.

from kaldi.

Comments (12)

naxingyu commented on May 12, 2024

There should be a convention for the order in which suffixes such as sp, hires, nodup (in the swbd case), etc. are added to features. Currently the nnet2/nnet3 recipes seem to use an inconsistent order, which hurts usability.


jtrmal commented on May 12, 2024

@vijayaditya I don't think I'm in the same boat as you on making a backup of the extractor.
If this is a problem, then the recipe should be modified (to prevent the overwrite or to warn the user). Making backups is not what is usually done in Kaldi, with the exception of fix_data_dir, and even there it's more or less pointless: if you run the script multiple times (before and after feature extraction), you will end up with a backup that will probably have the same content.


vijayaditya commented on May 12, 2024

@jtrmal the backup is required in the case of the extractor because you are not guaranteed to select the same Gaussian indices from the UBM. Any overwrite would make all the models trained with the previous extractor completely useless. There will be a warning that an overwrite is happening and that the old model can be found in a specific directory. This backup is a small price to pay to ensure that the old models remain usable.

I don't think making the script interactive, by soliciting user input, is an option.
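The backup-with-warning behaviour described above could look roughly like the following. This is only a minimal Python sketch, not what the Kaldi scripts do: the function name `backup_extractor`, the `backup` subdirectory, the timestamped naming, and the assumption that the extractor is stored as `final.ie` are all illustrative choices.

```python
import logging
import shutil
import time
from pathlib import Path


def backup_extractor(extractor_dir, model_name="final.ie"):
    """Move an existing extractor aside before retraining.

    Returns the backup path, or None if there was nothing to back up.
    The file name and the backup layout are assumptions for illustration.
    """
    model = Path(extractor_dir) / model_name
    if not model.exists():
        return None

    backup_dir = Path(extractor_dir) / "backup"
    backup_dir.mkdir(exist_ok=True)

    # A timestamped name keeps earlier backups instead of replacing them.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{model_name}.{stamp}"
    counter = 1
    while target.exists():  # two backups within the same second
        target = backup_dir / f"{model_name}.{stamp}.{counter}"
        counter += 1

    shutil.move(str(model), str(target))
    # No user input is requested; the script only warns and continues.
    logging.warning("Existing extractor %s was moved to %s", model, target)
    return target
```

Only the extractor file itself is moved, so the rest of the directory is free to be regenerated by the rerun.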


danpovey commented on May 12, 2024

I agree that this isn't a pattern we should use excessively, but I have
been bitten recently by this and it's kind of painful to lose the iVector
extractor when you just trained a huge nnet model that requires it. How
about just storing the .ie file itself as a backup, but not the other
files? [perhaps that's what you're doing.]
Dan


vijayaditya commented on May 12, 2024

@danpovey OK, I will just keep the .ie file.


jtrmal commented on May 12, 2024

I'm not suggesting the scripts should be interactive. I don't think backups
will make things clearer, especially when you have to be able to
store backups of backups and so on.

An alternative might be to keep the *.ie.gz with the trained nnets,
similar to how we keep mdl files with the alignments.
y.


danpovey commented on May 12, 2024

I considered backing up the .ie files in the nnet directory, but it would
be hard to implement because the directory where the .ie file is located is
not available to the nnet training scripts-- unless we also change the
iVector extraction scripts to copy the .ie file to where it dumps the
iVectors. That would certainly be a possibility-- it would make it easier
to automatically check for mismatches in decoding etc.
Dan
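The idea of copying the .ie file next to the dumped iVectors and then checking for mismatches could be sketched as below. This is a hedged Python illustration rather than an existing Kaldi interface: the helper names, the `final.ie` file name, and the use of a SHA-256 digest for the comparison are all assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks because the extractor can be large."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_extractor(extractor_dir, ivector_dir, model_name="final.ie"):
    """Copy the extractor to the directory where the iVectors are dumped."""
    source = Path(extractor_dir) / model_name
    destination_dir = Path(ivector_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / model_name
    shutil.copyfile(source, destination)
    return destination


def check_extractor_matches(extractor_dir, ivector_dir, model_name="final.ie"):
    """Raise ValueError if the current extractor differs from the stored copy."""
    current = Path(extractor_dir) / model_name
    stored = Path(ivector_dir) / model_name
    if file_digest(current) != file_digest(stored):
        raise ValueError(
            f"{current} does not match the extractor used for {ivector_dir}"
        )
```

A decoding script could call `check_extractor_matches` before extracting test-time iVectors, so that a retrained extractor is detected instead of silently producing incompatible features.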


jtrmal commented on May 12, 2024

Yes, storing it with the ivecs makes sense as well.
I think this would be a much more robust mechanism than the backups.
y.


vijayaditya commented on May 12, 2024

@pegahgh As you are facing similar issues, could you please take this forward?


danpovey commented on May 12, 2024

@vijayaditya, what is going on with this?
BTW, I think the iVector extractor is quite large, so it would be good to avoid making too many unnecessary copies. But I'm not ruling it out.


vijayaditya commented on May 12, 2024

@danpovey I will address the overwrite issue by checking for an existing extractor in the ivector training script (in #967). I would prefer maintaining a copy with the nnet3 model, as large-scale recipes take around 10-20 days to train and simple user errors can turn out to be very costly.
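A check for an existing extractor at the start of training might look like the sketch below. It is written in Python purely for illustration and does not reproduce the change in #967; the function name, the `final.ie` file name, and the `overwrite` flag are hypothetical.

```python
from pathlib import Path


def require_no_extractor(extractor_dir, model_name="final.ie", overwrite=False):
    """Refuse to start training if an extractor already exists.

    Passing overwrite=True (a hypothetical, non-interactive switch) allows
    the caller to replace the extractor deliberately.
    """
    model = Path(extractor_dir) / model_name
    if model.exists() and not overwrite:
        raise FileExistsError(
            f"{model} already exists; models trained with it would become "
            "unusable if it were overwritten"
        )
    return model
```

Failing early keeps the script non-interactive while still protecting an extractor that a long-running nnet3 model depends on.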


david-ryan-snyder commented on May 12, 2024

@danpovey, I think this issue can be closed.

The UBM/i-vector extractor backup was added back in #1514.
