Comments (12)
There should be a convention for the order in which suffixes are added to feature names, such as sp, hires, nodup (in the swbd case), etc. Currently the nnet2/nnet3 recipes seem to use an inconsistent order, which makes them hard to reuse.
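To make the request concrete, here is a minimal sketch of what a single naming helper could look like. The order `nodup`, `sp`, `hires` and the function name `feature_dir_name` are assumptions made purely for illustration; the thread does not settle on an order.

```python
# Hypothetical canonical order; the issue does not specify one.
CANONICAL_SUFFIX_ORDER = ["nodup", "sp", "hires"]


def feature_dir_name(base, suffixes):
    """Build a data-directory name with suffixes in one fixed order.

    The caller may pass the suffixes in any order; the result is always
    sorted according to CANONICAL_SUFFIX_ORDER.
    """
    unknown = set(suffixes) - set(CANONICAL_SUFFIX_ORDER)
    if unknown:
        raise ValueError(f"unknown suffixes: {sorted(unknown)}")
    ordered = [s for s in CANONICAL_SUFFIX_ORDER if s in suffixes]
    return "_".join([base] + ordered)


if __name__ == "__main__":
    # Both calls yield the same name regardless of argument order.
    print(feature_dir_name("train", ["hires", "sp"]))
    print(feature_dir_name("train", ["sp", "hires"]))
```

With a helper like this, every recipe would derive the same directory name from the same set of options, whatever order they were written in.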
from kaldi.
@vijayaditya I don't think I agree with you about making a backup of the extractor.
If this is a problem, then the recipe should be modified to prevent it or to warn the user. Making backups is not what is usually done in kaldi, with the exception of fix_data_dir, and even there it is more or less pointless: if you run the script multiple times (before and after feature extraction), you end up with a backup that will probably have the same content.
from kaldi.
@jtrmal the backup is required in the case of the extractor, as you are not guaranteed to select the same Gaussian indices from the UBM. Any overwrite would make all the models trained with the previous extractor completely useless. There will be a warning that an overwrite is happening and that the old model can be found in a specific directory. This backup is a small price to pay to ensure that the old models remain usable.
I don't think making the script interactive, by soliciting user input, is an option.
from kaldi.
I agree that this isn't a pattern we should use excessively, but I have
been bitten recently by this and it's kind of painful to lose the iVector
extractor when you just trained a huge nnet model that requires it. How
about just storing the .ie file itself as a backup, but not the other
files? [perhaps that's what you're doing.]
Dan
(In reply to jtrmal's comment of Mon, Oct 26, 2015, 11:03 AM.)
from kaldi.
@danpovey OK, I will just keep the .ie file.
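A minimal sketch of the "back up only the .ie file and warn" behaviour discussed above. It is not the code that went into the recipes: the file name `final.ie`, the `backup` subdirectory, the timestamped naming and the function name are all assumptions made for illustration.

```python
import shutil
import time
import warnings
from pathlib import Path


def backup_extractor(extractor_dir, ie_name="final.ie"):
    """Copy an existing extractor file aside before it gets overwritten.

    Only the .ie file is copied, not the rest of the directory.
    Returns the backup path, or None if there was nothing to back up.
    """
    ie_path = Path(extractor_dir) / ie_name
    if not ie_path.exists():
        return None
    backup_dir = Path(extractor_dir) / "backup"  # hypothetical location
    backup_dir.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = backup_dir / f"{ie_name}.{stamp}"
    shutil.copy2(ie_path, backup_path)
    # Non-interactive: warn and continue, as argued in the thread.
    warnings.warn(
        f"{ie_path} is about to be overwritten; "
        f"the old extractor was saved to {backup_path}"
    )
    return backup_path
```

The function never prompts the user, so it could be called at the top of a training script without making the script interactive.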
from kaldi.
I'm not suggesting the scripts should be interactive. I don't think a backup
will make things clearer, especially since you would then have to be able to
store backups of backups and so on.
An alternative might be to keep the *.ie.gz with the trained nnets,
similarly to how we keep mdl files with the alignments.
y.
(In reply to Vijayaditya Peddinti's comment of Mon, Oct 26, 2015, 1:26 PM.)
from kaldi.
I considered backing up the .ie files in the nnet directory, but it would
be hard to implement because the directory where the .ie file is located is
not available to the nnet training scripts-- unless we also change the
iVector extraction scripts to copy the .ie file to where it dumps the
iVectors. That would certainly be a possibility-- it would make it easier
to automatically check for mismatches in decoding etc.
Dan
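If the iVector extraction scripts did copy the .ie file next to the dumped iVectors, the automatic mismatch check mentioned above could be sketched roughly as follows. The file name `final.ie`, the directory arguments and the function names are assumptions for illustration, not existing Kaldi interfaces.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def extractors_match(train_ivector_dir, decode_ivector_dir, ie_name="final.ie"):
    """Check that two iVector directories were produced by the same extractor.

    Assumes each directory holds a copy of the extractor file that
    was used to dump its iVectors.
    """
    train_ie = Path(train_ivector_dir) / ie_name
    decode_ie = Path(decode_ivector_dir) / ie_name
    return file_digest(train_ie) == file_digest(decode_ie)
```

A decoding script could call `extractors_match` before decoding and stop with an error if it returns `False`.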
(In reply to jtrmal's comment of Mon, Oct 26, 2015, 1:33 PM.)
from kaldi.
Yes, storing it with the ivecs makes sense as well.
I think this would be a much more robust mechanism than the backups.
y.
(In reply to Daniel Povey's comment of Mon, Oct 26, 2015, 1:47 PM.)
from kaldi.
@pegahgh As you are facing similar issues, could you please take this forward?
from kaldi.
@vijayaditya, what is going on with this?
BTW, I think the iVector extractor is quite large, so it would be good to avoid making too many unnecessary copies. But I'm not ruling it out.
from kaldi.
@danpovey I will address the overwrite issue by checking for an existing extractor in the ivector training script (in #967). I would prefer to keep a copy with the nnet3 model, as large-scale recipes take around 10-20 days to train and simple user errors can turn out to be very costly.
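A rough sketch of such an existence check, under assumptions: the comment does not say what the script does when it finds an extractor, so stopping with an error is only one possible choice, and `final.ie` and the function name are illustrative.

```python
from pathlib import Path


def check_no_existing_extractor(extractor_dir, ie_name="final.ie"):
    """Refuse to continue if an extractor already exists in extractor_dir.

    Raising here is an assumed policy; warning and backing up the old
    file would be another option.
    """
    ie_path = Path(extractor_dir) / ie_name
    if ie_path.exists():
        raise FileExistsError(
            f"{ie_path} already exists; models trained with it would "
            "become unusable if it were overwritten"
        )
```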
from kaldi.
@danpovey, I think this issue can be closed.
The UBM/i-vector extractor backup was added back in #1514.
from kaldi.