maum-ai / cotatron
Official code for Cotatron @ INTERSPEECH 2020
Home Page: https://mindslab-ai.github.io/cotatron
License: BSD 3-Clause "New" or "Revised" License
Hello,
I trained on 5 speakers and now need to add a sixth, but I cannot reuse the pre-trained network trained on 5 speakers: loading fails because the model has 6 speaker embeddings while the checkpoint has 5.
How can I fix this? I don't want to train from scratch, since that takes a long time.
Thank you!
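One possible way to approach this, sketched under assumptions: grow the speaker embedding matrix stored in the checkpoint by one row before loading it. The helper name `expand_embedding_rows`, the file names, and the state_dict key in the commented usage are hypothetical; the real key has to be looked up in the checkpoint. Initializing the new row with the mean of the existing rows is a heuristic chosen for this sketch, not something the repository prescribes.

```python
import torch


def expand_embedding_rows(weight, new_num_rows):
    """Return a copy of an embedding matrix grown to `new_num_rows` rows.

    Existing rows are kept unchanged; each added row starts at the mean of
    the existing rows (a heuristic used only in this sketch).
    """
    old_num_rows, _ = weight.shape
    if new_num_rows < old_num_rows:
        raise ValueError("cannot shrink the embedding table")
    extra = weight.mean(dim=0, keepdim=True).repeat(new_num_rows - old_num_rows, 1)
    return torch.cat([weight, extra], dim=0)


# Small demonstration with random data: 5 speakers, 8-dimensional embeddings.
old = torch.randn(5, 8)
new = expand_embedding_rows(old, 6)
print(new.shape)  # torch.Size([6, 8])

# Hypothetical usage on a checkpoint (paths and key name are assumptions):
# ckpt = torch.load("five_speakers.ckpt", map_location="cpu")
# key = "speaker.weight"  # inspect ckpt["state_dict"].keys() for the real name
# ckpt["state_dict"][key] = expand_embedding_rows(ckpt["state_dict"][key], 6)
# torch.save(ckpt, "six_speakers.ckpt")
```

For the old rows to stay meaningful, the new speaker would have to be appended at the end of the speaker list so that the existing speaker indices do not shift. Any optimizer state saved in the checkpoint would still have the old shape, so loading only the weights may be necessary.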
I installed the dependencies with:
pip install -r requirements
Then I ran training and got:
Traceback (most recent call last):
File "cotatron_trainer.py", line 75, in <module>
main(args)
File "cotatron_trainer.py", line 54, in main
trainer.fit(model)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
self.single_gpu_train(model)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 470, in single_gpu_train
self.run_pretrain_routine(model)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 748, in run_pretrain_routine
self.logger.log_hyperparams(ref_model.hparams)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 18, in wrapped_fn
fn(self, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 113, in log_hyperparams
exp, ssi, sei = hparams(params, {})
File "/opt/conda/lib/python3.7/site-packages/torch/utils/tensorboard/summary.py", line 156, in hparams
raise ValueError('value should be one of int, float, str, bool, or torch.Tensor')
ValueError: value should be one of int, float, str, bool, or torch.Tensor
The same error occurs with a single GPU and with multiple GPUs.
I tried updating tensorboard, but it made no difference.
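The message in the traceback comes from the TensorBoard hparams summary, which only accepts int, float, str, bool, or torch.Tensor values. A nested config, a list, or a None value would trigger it. A minimal sketch of one workaround, assuming the hyperparameters can be converted to a plain nested dict before they are logged: flatten the dict and stringify anything that is not a primitive. The function name `flatten_hparams` and the sample values are hypothetical, and where to call it in the training script is not confirmed by this thread.

```python
def flatten_hparams(params, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} with loggable values only.

    int, float, str and bool are kept as they are; everything else
    (lists, None, custom objects) is converted with str().
    """
    allowed = (int, float, str, bool)
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_hparams(value, prefix=name + "."))
        elif isinstance(value, allowed):
            flat[name] = value
        else:
            flat[name] = str(value)
    return flat


# Hypothetical config fragment for illustration only.
config = {
    "train": {"batch_size": 64, "adam": {"lr": 0.0003}},
    "data": {"speakers": ["p225", "p226"], "lang": None},
}
for name, value in flatten_hparams(config).items():
    print(name, "=", repr(value))
```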
I'll make a comment here when the pre-trained weights for Cotatron & MelGAN are made available.
If you wish to get notified about that, click the "Subscribe" button on the right side of this issue.
Dear team,
Thank you so much for this wonderful library.
I am creating a DB in Urdu. Can you please indicate:
Dear Team,
Thank you very much for your work on this project.
I am experimenting with your codebase and trying to synthesize the converted speech with a MelGAN vocoder that I trained from scratch on a custom dataset. I am using the official implementation, available here:
https://github.com/descriptinc/melgan-neurips
The model is trained with the default parameters, except for the mel frequency range, which I set to mel_fmin=70 and mel_fmax=8000, as described in the Cotatron paper.
Can you please confirm that this is the same MelGAN configuration you used to train MelGAN on LibriTTS+VCTK?
If not, could you describe the differences or point me to the correct Git repository?
Many thanks.
I mean the preprocessing of the mels.
jik876/hifi-gan#61
I am having issues with the cotatron training; in particular with the pytorch-lightning package. I've found that the code only runs with pytorch-lightning<=0.7.3, yet it is unable to identify my GPU:
File "cotatron_trainer.py", line 72, in <module>
main(args)
File "cotatron_trainer.py", line 36, in main
trainer = Trainer(
File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 389, in __init__
self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 629, in parse_gpu_ids
gpus = sanitize_gpu_ids(gpus)
File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 596, in sanitize_gpu_ids
raise MisconfigurationException(f"""
pytorch_lightning.utilities.exceptions.MisconfigurationException:
You requested GPUs: [1]
But your machine only has: []
I get the same result when setting the "-g" argument to "0".
My environment packages:
---------------------- -------------
absl-py 0.9.0
attrs 19.3.0
audioread 2.1.8
Automat 0.8.0
blinker 1.4
cachetools 4.1.0
certifi 2019.11.28
cffi 1.14.0
chardet 3.0.4
Click 7.0
cloud-init 20.1
colorama 0.4.3
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
cryptography 2.8
cycler 0.10.0
dbus-python 1.2.16
decorator 4.4.2
distro 1.4.0
distro-info 0.23ubuntu1
entrypoints 0.3
future 0.18.2
google-auth 1.16.0
google-auth-oauthlib 0.4.1
grpcio 1.29.0
httplib2 0.14.0
hyperlink 19.0.0
idna 2.8
imageio 2.8.0
importlib-metadata 1.5.0
incremental 16.10.1
inflect 4.1.0
Jinja2 2.10.1
joblib 0.15.1
jsonpatch 1.22
jsonpointer 2.0
jsonschema 3.2.0
keyring 18.0.1
kiwisolver 1.2.0
language-selector 0.1
launchpadlib 1.10.13
lazr.restfulclient 0.14.2
lazr.uri 1.0.3
librosa 0.7.2
llvmlite 0.32.1
Markdown 3.2.2
MarkupSafe 1.1.0
matplotlib 3.2.1
more-itertools 4.2.0
netifaces 0.10.4
numba 0.49.1
numpy 1.18.5
oauthlib 3.1.0
omegaconf 2.0.0
pandas 1.0.4
Pillow 7.1.2
pip 20.0.2
protobuf 3.12.2
pyasn1 0.4.2
pyasn1-modules 0.2.1
pycparser 2.20
PyGObject 3.36.0
PyHamcrest 1.9.0
PyJWT 1.7.1
pymacaroons 0.13.0
PyNaCl 1.3.0
pyOpenSSL 19.0.0
pyparsing 2.4.7
pyrsistent 0.15.5
pyserial 3.4
python-apt 2.0.0
python-dateutil 2.8.1
python-debian 0.1.36ubuntu1
pytorch-lightning 0.7.3
pytz 2020.1
PyYAML 5.3.1
requests 2.22.0
requests-oauthlib 1.3.0
requests-unixsocket 0.2.0
resampy 0.2.2
rsa 4.0
scikit-learn 0.23.1
scipy 1.4.1
SecretStorage 2.3.1
service-identity 18.1.0
setuptools 45.2.0
simplejson 3.16.0
six 1.14.0
SoundFile 0.10.3.post1
ssh-import-id 5.10
systemd-python 234
tensorboard 2.2.2
tensorboard-plugin-wit 1.6.0.post3
test-tube 0.7.5
threadpoolctl 2.1.0
torch 1.4.0
torchvision 0.5.0
tqdm 4.46.1
Twisted 18.9.0
typing-extensions 3.7.4.2
ubuntu-advantage-tools 20.3
ufw 0.36
unattended-upgrades 0.1
Unidecode 1.1.1
urllib3 1.25.8
wadllib 1.3.3
Werkzeug 1.0.1
wheel 0.34.2
zipp 1.0.0
zope.interface 4.7.1
I am running the code on an RTX 2080 Ti and have tried updating PyTorch with different CUDA binaries, but without luck.
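The empty list in "your machine only has: []" suggests that PyTorch itself reports no CUDA devices, independent of pytorch-lightning. A small diagnostic sketch for narrowing this down, using only standard PyTorch attributes; the helper name `cuda_report` is hypothetical.

```python
import importlib.util


def cuda_report():
    """Collect what the installed PyTorch build reports about CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only build of PyTorch is installed.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


for name, value in cuda_report().items():
    print(f"{name}: {value}")
```

If `cuda_available` is False even though the GPU is visible to the driver, the likely candidates are a CPU-only PyTorch build or a mismatch between the CUDA version of the build and the installed driver; this thread does not confirm which applies.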
Does the number of speakers affect the quality?
Does this work in a one-shot voice conversion setting?
Thank you for sharing this git.
I'm just wondering if the pre-trained model has yet to be uploaded.
Will it be uploaded soon?
Thanks for sharing the pretrained model - but I can't get it working for training Cotatron.
When I try simply resuming Cotatron training with the full model using the following command:
python cotatron_trainer.py -c config/global/config.yaml config/cota/config.yaml \
-g 0 -n my_runname -p pretrained_decoder_libritts_vctk_epoch652_15388cc.ckpt
I get mismatched key errors. This is kinda expected, since Cotatron is instantiated inside the Synthesizer:
Missing key(s) in state_dict: "encoder.embedding.weight",
Unexpected key(s) in state_dict: "cotatron.encoder.embedding.weight
I can work around this by using torch.load/torch.save to write out only the Cotatron part of the model:
import torch
# Synthesizer and hparams come from this repo's code and config
checkpoint = torch.load('original.ckpt', map_location='cpu')
model = Synthesizer(hparams).cuda()
model.load_state_dict(checkpoint['state_dict'])
model.eval()
# Save only the weights of the Cotatron sub-module
torch.save({
    'state_dict': model.cotatron.state_dict(),
}, 'cotatron.ckpt')
But when I try training with this Cotatron-only checkpoint (i.e. 'cotatron.ckpt'), pytorch-lightning complains with this error:
KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to "ModelCheckpoint.save_weights_only" being set to True.'
I've tried various other options for extracting and reusing Cotatron-specific parameters from the checkpoint, but it seems that pytorch-lightning doesn't support this.
Were you able to get this to work? If so, could you describe how it's done?
Thanks!
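The key mismatch described above comes down to the "cotatron." prefix on the parameter names. A minimal sketch of the renaming step, assuming the Cotatron weights all live under that prefix in the Synthesizer checkpoint: keep only the prefixed keys and strip the prefix. The helper name `extract_submodule_state` is hypothetical, and all keys in the demonstration except "cotatron.encoder.embedding.weight" are made up; strings stand in for tensors.

```python
def extract_submodule_state(state_dict, prefix):
    """Keep only the entries under `prefix` and strip the prefix from their keys."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Demonstration with placeholder values instead of tensors.
full_state = {
    "cotatron.encoder.embedding.weight": "W_embedding",
    "cotatron.decoder.prenet.weight": "W_prenet",  # hypothetical key
    "decoder.conv.weight": "W_vc_decoder",         # hypothetical key
}
cotatron_state = extract_submodule_state(full_state, "cotatron.")
print(sorted(cotatron_state))
```

The error quoted above suggests that the resume path expects a full training checkpoint with optimizer state and counters. One option that avoids it would be to call `load_state_dict` with the extracted weights on a freshly built Cotatron model before training starts, instead of passing the file as a checkpoint to resume from. Whether the trainer script in this repository offers a hook for that is not confirmed here.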
Hi, guys.
Thank you so much for sharing this code. I think I found a minor bug, so I am reporting it.
I think the prenet depth parameter must be hp.depth.prenet, not hp.depth.encoder. Is that right?
Please check it.
Thanks,
Heejo
I'm training Cotatron on the KSS dataset. After 15k steps, I still get almost no alignment.
Is it possible to train on the KSS dataset using only the released code?
Could you share the recipe for training on the KSS dataset?
Hello,
I want to use WaveGlow because the MelGAN output sounds quite metallic. I see this in the config:
audio: # WARNING! This cannot be changed unless you're planning to train the MelGAN vocoder by yourself.
  n_mel_channels: 80
  filter_length: 1024
  hop_length: 256
  win_length: 1024
  sampling_rate: 22050
  mel_fmin: 70.0
  mel_fmax: 8000.0
What needs to change for this to work with the pre-trained WaveGlow? I tried using it, but I think there is a problem with the mel normalization, since the output is very noisy.
I know WaveGlow uses mel_fmin: 0.0. I changed that and retrained, but it still does not work.
Thank you.
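As a first check when swapping vocoders, the two audio configurations can be compared key by key. The sketch below uses the Cotatron values quoted in this issue; the WaveGlow values are assumptions based on the commonly distributed pre-trained configuration (only mel_fmin: 0.0 is stated in this thread) and should be verified against the config shipped with the checkpoint actually used. The helper name `diff_configs` is hypothetical.

```python
# Values quoted in this issue.
cotatron_audio = {
    "n_mel_channels": 80,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "sampling_rate": 22050,
    "mel_fmin": 70.0,
    "mel_fmax": 8000.0,
}

# ASSUMED values for the pre-trained WaveGlow; verify against its own config file.
waveglow_audio = {
    "n_mel_channels": 80,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "sampling_rate": 22050,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
}


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(diff_configs(cotatron_audio, waveglow_audio))  # {'mel_fmin': (70.0, 0.0)}
```

Matching these values is necessary but may not be sufficient. The two code bases may also differ in how the mel spectrogram is compressed and scaled (log base, clipping floor, normalization range), which would fit the noisy output described above. Those details are not stated in this thread.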
When I run this cell from the Colab notebook:
with torch.no_grad():
    mel_s_t, alignment, residual = model.inference(text_norm, mel_source, target_speaker)
this error occurs:
ConfigAttributeError Traceback (most recent call last)
in ()
1 with torch.no_grad():
----> 2 mel_s_t, alignment, residual = model.inference(text_norm, mel_source, target_speaker)
10 frames
/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py in _get_node(self, key, validate_access, throw_on_missing_value, throw_on_missing_key)
468 if value is None:
469 if throw_on_missing_key:
--> 470 raise ConfigKeyError(f"Missing key {key}")
471 elif throw_on_missing_value and value._is_missing():
472 raise MissingMandatoryValue("Missing mandatory value: $KEY")
ConfigAttributeError: Missing key mask_padding
full_key: train.mask_padding
object_type=dict
Can you let me know how to fix it?
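The error says the loaded configuration has no train.mask_padding entry, which can happen when a checkpoint was saved with an older config than the code expects. A minimal sketch of one workaround on a plain nested dict: fill in missing keys from a defaults dict without overwriting existing values. The helper name `fill_missing` is hypothetical, and the default value False is a placeholder; the correct value for mask_padding is not given in this thread.

```python
import copy


def fill_missing(cfg, defaults):
    """Return a copy of `cfg` where keys absent from it are taken from `defaults`.

    Existing values in `cfg` are never overwritten; nested dicts are merged
    recursively.
    """
    merged = copy.deepcopy(cfg)
    for key, default in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(default)
        elif isinstance(merged[key], dict) and isinstance(default, dict):
            merged[key] = fill_missing(merged[key], default)
    return merged


# Hypothetical values for illustration only.
loaded = {"train": {"batch_size": 64}}
defaults = {"train": {"mask_padding": False}}  # placeholder value, not confirmed
print(fill_missing(loaded, defaults))
```

If the config is an OmegaConf object rather than a dict, a comparable effect can usually be had by merging a defaults config with the loaded one via `OmegaConf.merge(defaults, loaded)`, where later arguments take precedence.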