maum-ai / cotatron

Official code for Cotatron @ INTERSPEECH 2020

Home Page: https://mindslab-ai.github.io/cotatron

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 98.97%, Shell 1.03%

Topics: pytorch, speech-synthesis, voice-conversion

cotatron's Issues

Re-training with n+1 speakers

Hello,
I trained on 5 speakers and now need to add a sixth, but I can't use the network pre-trained on 5 speakers: I get an error saying the number of embeddings is 6 but 5 is expected.

How can I fix this? I don't want to train from scratch, since that takes a long time.

Does the training corpus need text, or only wav files?

Thank you!
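One way to add a speaker without training from scratch is to enlarge the speaker-embedding matrix in the checkpoint before loading it: keep the trained rows and initialise the new row(s), here with the mean of the existing ones. This is a minimal sketch, not something the repository provides. It assumes PyTorch tensors and an embedding stored under a key such as `speaker.weight` (hypothetical; use whichever key your size-mismatch error names). The function name is also hypothetical.

```python
def expand_speaker_embedding(state_dict, key, new_num_speakers):
    """Return a copy of state_dict whose embedding at `key` has new_num_speakers rows.

    Trained rows are kept as-is; each added row starts as the mean of the trained rows.
    Expects PyTorch tensors (uses Tensor.new_empty and slice assignment).
    """
    old = state_dict[key]
    old_num, dim = old.shape
    if new_num_speakers < old_num:
        raise ValueError(f"cannot shrink {key} from {old_num} to {new_num_speakers} rows")
    new = old.new_empty((new_num_speakers, dim))  # same dtype and device as the old weights
    new[:old_num] = old                           # keep the trained speaker vectors
    new[old_num:] = old.mean(dim=0)               # broadcast the mean into the new rows
    out = dict(state_dict)                        # leave the caller's dict untouched
    out[key] = new
    return out
```

Load the checkpoint with `torch.load(path, map_location="cpu")`, pass `ckpt["state_dict"]` through the function, then call `load_state_dict` on a freshly built 6-speaker model and start a new run rather than resuming, since the saved optimizer state most likely still has the old embedding shape.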

Training fails with: raise ValueError('value should be one of int, float, str, bool, or torch.Tensor')

I installed the dependencies with:
pip install -r requirements.txt

Then I ran training and got:

Traceback (most recent call last):
  File "cotatron_trainer.py", line 75, in <module>
    main(args)
  File "cotatron_trainer.py", line 54, in main
    trainer.fit(model)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
    self.single_gpu_train(model)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 470, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 748, in run_pretrain_routine
    self.logger.log_hyperparams(ref_model.hparams)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 18, in wrapped_fn
    fn(self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 113, in log_hyperparams
    exp, ssi, sei = hparams(params, {})
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/tensorboard/summary.py", line 156, in hparams
    raise ValueError('value should be one of int, float, str, bool, or torch.Tensor')
ValueError: value should be one of int, float, str, bool, or torch.Tensor

I get the same error with a single GPU or multiple GPUs.
Updating tensorboard made no difference.
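The error is raised by `hparams()` in `torch.utils.tensorboard`, which, per the message, only accepts int, float, str, bool, or torch.Tensor values; nested config sections, lists, or None trigger it. One workaround (an assumption, not a fix confirmed by the maintainers) is to flatten the hyperparameters and stringify everything else before they reach the logger. The helper name is hypothetical.

```python
from collections.abc import Mapping

# Value types kept as-is; TensorBoard's hparams() accepted these in the error above.
ALLOWED_TYPES = (bool, int, float, str)


def sanitize_hparams(params, prefix=""):
    """Flatten nested mappings into dotted keys; stringify any value that is not bool/int/float/str."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            flat.update(sanitize_hparams(value, name))  # recurse into config sections
        elif isinstance(value, ALLOWED_TYPES):
            flat[name] = value
        else:
            flat[name] = str(value)  # lists, None, Namespace objects, ...
    return flat


print(sanitize_hparams({"train": {"lr": 1e-3, "gpus": [0, 1]}, "name": "run", "seed": None}))
# {'train.lr': 0.001, 'train.gpus': '[0, 1]', 'name': 'run', 'seed': 'None'}
```

For an `argparse.Namespace`, pass `vars(hparams)`; for an OmegaConf config, convert it first with `OmegaConf.to_container(cfg, resolve=True)`. Pinning the pytorch-lightning and tensorboard versions from the repo's requirements is the other route to try.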

Release of pre-trained weights for Cotatron & MelGAN

I'll make a comment here when the pre-trained weights for Cotatron & MelGAN are made available.
If you wish to get notified about that, click the "Subscribe" button on the right side of this issue.

Adaptation for a corpus in Urdu

Dear team,
Thank you so much for this wonderful library.
I am creating a dataset in Urdu. Could you please tell me:

How many speakers do I need in total? (I have 8, with 2 hours each.)
How many hours of audio should each speaker have?
For how many epochs should Cotatron and the synthesizer be trained?

Clarification regarding the configuration of the MelGAN model

Dear Team,

Thank you very much for your work on this project.
I am experimenting with your codebase and trying to synthesize the converted speech with a MelGAN vocoder that I trained from scratch on a custom dataset. I am using the official implementation, which is available here:
https://github.com/descriptinc/melgan-neurips
The model is trained with the default parameters, except for the mel frequency range, which I've set to mel_fmin=70, mel_fmax=8000, as described in the Cotatron paper.

Can you please confirm that this is the same MelGAN configuration you used for training MelGAN on LibriTTS+VCTK?
If not, could you describe the differences or point me to the correct Git repository?

Many thanks.

What changes are needed to make this compatible with HiFi-GAN?

I mean in preprocessing the mels.
jik876/hifi-gan#61
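A first step is to check which audio settings differ between the two projects. The sketch below compares Cotatron's audio block (as quoted in the WaveGlow question further down) with the values assumed from HiFi-GAN's released `config_v1.json`; verify these against the config that ships with the checkpoint you use. The dictionary and function names are hypothetical. Matching these numbers is probably not sufficient on its own: the mel normalization must match too, and HiFi-GAN's `meldataset.py` applies a natural log after clamping magnitudes at 1e-5, so compare that against the repo's own mel code.

```python
# Cotatron's audio settings, copied from the config block quoted in the WaveGlow issue.
COTATRON_AUDIO = {
    "n_mel_channels": 80,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "sampling_rate": 22050,
    "mel_fmin": 70.0,
    "mel_fmax": 8000.0,
}

# Assumed values from HiFi-GAN's config_v1.json; verify before relying on them.
HIFIGAN_AUDIO = {
    "num_mels": 80,
    "n_fft": 1024,
    "hop_size": 256,
    "win_size": 1024,
    "sampling_rate": 22050,
    "fmin": 0,
    "fmax": 8000,
}

# Cotatron key -> corresponding HiFi-GAN key
KEY_MAP = {
    "n_mel_channels": "num_mels",
    "filter_length": "n_fft",
    "hop_length": "hop_size",
    "win_length": "win_size",
    "sampling_rate": "sampling_rate",
    "mel_fmin": "fmin",
    "mel_fmax": "fmax",
}


def audio_mismatches(cotatron_cfg, hifigan_cfg, key_map=KEY_MAP):
    """Return {cotatron_key: (cotatron_value, hifigan_value)} for every setting that differs."""
    return {
        ckey: (cotatron_cfg[ckey], hifigan_cfg[hkey])
        for ckey, hkey in key_map.items()
        if cotatron_cfg[ckey] != hifigan_cfg[hkey]
    }


print(audio_mismatches(COTATRON_AUDIO, HIFIGAN_AUDIO))
# {'mel_fmin': (70.0, 0)}
```

Under these assumptions only `mel_fmin` differs, so either fine-tune HiFi-GAN on mels computed with Cotatron's settings or retrain Cotatron with `mel_fmin: 0`.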

pytorch_lightning.utilities.exceptions.MisconfigurationException

I am having issues with Cotatron training, in particular with the pytorch-lightning package. I've found that the code only runs with pytorch-lightning<=0.7.3, yet it is unable to identify my GPU:

  File "cotatron_trainer.py", line 72, in <module>
    main(args)
  File "cotatron_trainer.py", line 36, in main
    trainer = Trainer(
  File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 389, in __init__
    self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
  File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 629, in parse_gpu_ids
    gpus = sanitize_gpu_ids(gpus)
  File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 596, in sanitize_gpu_ids
    raise MisconfigurationException(f"""
pytorch_lightning.utilities.exceptions.MisconfigurationException:
                You requested GPUs: [1]
                But your machine only has: []

I get the same result when setting the "-g" argument to "0".

My environment packages:

---------------------- -------------
absl-py                0.9.0
attrs                  19.3.0
audioread              2.1.8
Automat                0.8.0
blinker                1.4
cachetools             4.1.0
certifi                2019.11.28
cffi                   1.14.0
chardet                3.0.4
Click                  7.0
cloud-init             20.1
colorama               0.4.3
command-not-found      0.3
configobj              5.0.6
constantly             15.1.0
cryptography           2.8
cycler                 0.10.0
dbus-python            1.2.16
decorator              4.4.2
distro                 1.4.0
distro-info            0.23ubuntu1
entrypoints            0.3
future                 0.18.2
google-auth            1.16.0
google-auth-oauthlib   0.4.1
grpcio                 1.29.0
httplib2               0.14.0
hyperlink              19.0.0
idna                   2.8
imageio                2.8.0
importlib-metadata     1.5.0
incremental            16.10.1
inflect                4.1.0
Jinja2                 2.10.1
joblib                 0.15.1
jsonpatch              1.22
jsonpointer            2.0
jsonschema             3.2.0
keyring                18.0.1
kiwisolver             1.2.0
language-selector      0.1
launchpadlib           1.10.13
lazr.restfulclient     0.14.2
lazr.uri               1.0.3
librosa                0.7.2
llvmlite               0.32.1
Markdown               3.2.2
MarkupSafe             1.1.0
matplotlib             3.2.1
more-itertools         4.2.0
netifaces              0.10.4
numba                  0.49.1
numpy                  1.18.5
oauthlib               3.1.0
omegaconf              2.0.0
pandas                 1.0.4
Pillow                 7.1.2
pip                    20.0.2
protobuf               3.12.2
pyasn1                 0.4.2
pyasn1-modules         0.2.1
pycparser              2.20
PyGObject              3.36.0
PyHamcrest             1.9.0
PyJWT                  1.7.1
pymacaroons            0.13.0
PyNaCl                 1.3.0
pyOpenSSL              19.0.0
pyparsing              2.4.7
pyrsistent             0.15.5
pyserial               3.4
python-apt             2.0.0
python-dateutil        2.8.1
python-debian          0.1.36ubuntu1
pytorch-lightning      0.7.3
pytz                   2020.1
PyYAML                 5.3.1
requests               2.22.0
requests-oauthlib      1.3.0
requests-unixsocket    0.2.0
resampy                0.2.2
rsa                    4.0
scikit-learn           0.23.1
scipy                  1.4.1
SecretStorage          2.3.1
service-identity       18.1.0
setuptools             45.2.0
simplejson             3.16.0
six                    1.14.0
SoundFile              0.10.3.post1
ssh-import-id          5.10
systemd-python         234
tensorboard            2.2.2
tensorboard-plugin-wit 1.6.0.post3
test-tube              0.7.5
threadpoolctl          2.1.0
torch                  1.4.0
torchvision            0.5.0
tqdm                   4.46.1
Twisted                18.9.0
typing-extensions      3.7.4.2
ubuntu-advantage-tools 20.3
ufw                    0.36
unattended-upgrades    0.1
Unidecode              1.1.1
urllib3                1.25.8
wadllib                1.3.3
Werkzeug               1.0.1
wheel                  0.34.2
zipp                   1.0.0
zope.interface         4.7.1

I am running the code on an RTX 2080 Ti and have tried reinstalling PyTorch with different CUDA binaries, without success.
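"You requested GPUs: [1] / But your machine only has: []" means PyTorch itself reports no CUDA devices, so the problem is more likely in the PyTorch, CUDA, or driver install than in Cotatron. A quick generic check (not part of the repo; the function name is hypothetical) that also runs when torch is missing:

```python
import importlib.util


def cuda_report():
    """Summarize what PyTorch can see, as a dict that is easy to print or paste into an issue."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for a CPU-only wheel
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for name, value in cuda_report().items():
        print(f"{name}: {value}")
```

If `built_with_cuda` is None, the installed wheel is CPU-only; if it is set but `cuda_available` is False, the driver is the more likely culprit. Either case would be consistent with the empty device list above.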

Single Target Speaker

Does the number of speakers seem to affect the quality?

One-shot Conversion

Does this work in a one-shot voice conversion setting?

Pre-trained model yet?

Thank you for sharing this repository.
I'm wondering whether the pre-trained model has been uploaded yet.
Will it be uploaded soon?

How do I use the pretrained model for training Cotatron?

Thanks for sharing the pretrained model - but I can't get it working for training Cotatron.

When I try simply resuming Cotatron training with the full model using the following command:

python cotatron_trainer.py -c config/global/config.yaml config/cota/config.yaml \
                           -g 0 -n my_runname -p pretrained_decoder_libritts_vctk_epoch652_15388cc.ckpt

I get mismatched key errors. This is somewhat expected, since Cotatron is instantiated within the Synthesizer:

Missing key(s) in state_dict: "encoder.embedding.weight", 
Unexpected key(s) in state_dict: "cotatron.encoder.embedding.weight"

I can work around this by using torch.load/torch.save to write out only the Cotatron part of the model:

import torch

# Synthesizer and hparams are constructed the same way as for synthesis with this repo
checkpoint = torch.load('original.ckpt', map_location='cpu')
model = Synthesizer(hparams).cuda()
model.load_state_dict(checkpoint['state_dict'])
model.eval()

# Keep only the Cotatron submodule's weights
torch.save({
    'state_dict': model.cotatron.state_dict(),
}, 'cotatron.ckpt')

But when I try training with this cotatron-specific checkpoint (i.e. 'cotatron.ckpt') pytorch-lightning complains with this error...

KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to "ModelCheckpoint.save_weights_only" being set to True.'

I've tried various other options for extracting and reusing Cotatron-specific parameters from the checkpoint, but it seems that pytorch-lightning doesn't support this.

Were you able to get this to work? If so, could you describe how it's done?

Thanks!
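The missing and unexpected keys differ only by the `cotatron.` prefix, and the `KeyError` comes from asking pytorch-lightning to restore training state (optimizer, epoch) from a file that holds only weights. So one option, sketched here as an assumption rather than a documented workflow of this repo, is to strip the prefix, load the weights into a freshly constructed Cotatron model with `load_state_dict`, and call `trainer.fit` without resuming. Whether `cotatron_trainer.py` has a hook for this is not confirmed; you may need to add the loading step yourself. The helper name is hypothetical.

```python
def extract_submodule_state(state_dict, prefix):
    """Keep only keys under `prefix` and strip it, e.g. 'cotatron.encoder.x' -> 'encoder.x'."""
    if not prefix.endswith("."):
        prefix += "."
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
```

Usage would look like `cotatron_model.load_state_dict(extract_submodule_state(ckpt["state_dict"], "cotatron"))`, where `ckpt` is the full Synthesizer checkpoint loaded with `torch.load`.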

[report] I think this is a bug.

Hi, guys.

Thank you so much for sharing this code. I think I found a minor bug, so I am reporting it.

https://github.com/mindslab-ai/cotatron/blob/38079aa2c95d647ec915ec6e8102ae5653623b78/modules/tts_decoder.py#L64-L65

I think the prenet depth parameter should be hp.depth.prenet, not hp.depth.encoder. Is that right?
Please check it.

Thanks,

Heejo

Recipe for a Korean dataset / Korean dataset model

I'm training Cotatron on the KSS dataset. I've run 15k steps, but the alignment has barely formed.
Is it possible to train on the KSS dataset with only the released code?
Could you share a recipe for training on the KSS dataset?
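The released text front end appears to target English (an assumption based on the LibriTTS/VCTK setup; check the repo's symbol set). If Korean syllables reach the model as raw characters, or are dropped by an English cleaner, alignment may struggle to form. One approach used by several Korean TTS projects is to decompose each Hangul syllable into conjoining jamo, which keeps the symbol inventory small; the jamo would then have to be added to the model's symbol list (not shown). A stdlib-only sketch of the decomposition using the Unicode Hangul syllable arithmetic; the function name is hypothetical.

```python
# Unicode Hangul syllable constants (Unicode Standard, Hangul syllable decomposition).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT       # 11172 precomposed syllables


def to_jamo(text):
    """Decompose precomposed Hangul syllables into conjoining jamo; other characters pass through."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))              # leading consonant
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))  # vowel
            t_index = s_index % T_COUNT
            if t_index:  # index 0 means the syllable has no final consonant
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return "".join(out)
```

For precomposed syllables this produces the same result as `unicodedata.normalize("NFD", text)`.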

Question on use WaveGlow instead of MelGan

Hello,
I want to use WaveGlow, since MelGAN output sounds quite metallic. I see this config:

audio: # WARNING! This cannot be changed unless you're planning to train the MelGAN vocoder by yourself.
  n_mel_channels: 80
  filter_length: 1024
  hop_length: 256
  win_length: 1024
  sampling_rate: 22050
  mel_fmin: 70.0
  mel_fmax: 8000.0

What do I need to change to make this work with a pre-trained WaveGlow model? I tried, but I think there is a problem with mel normalization, since the output sounds very noisy.

I know WaveGlow uses mel_fmin: 0.0; I changed that and retrained, but it still doesn't work.
Thank you
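NVIDIA's Tacotron 2/WaveGlow defaults use the same sampling rate, FFT, hop, and window sizes and 80 mels as the config above, with mel_fmin 0.0 and mel_fmax 8000.0, and they compress mel magnitudes with a natural log after clamping at 1e-5. If Cotatron's mel code uses a different log base (an assumption to check in the repo's audio code, not something confirmed here), the spectrograms must be rescaled before WaveGlow sees them, since log_b(x) = log_a(x) · ln(a) / ln(b). A minimal sketch; the function name is hypothetical.

```python
import math


def rescale_log_base(log_mel, from_base, to_base):
    """Convert a log-mel spectrogram from log_{from_base} to log_{to_base}.

    Multiplies by ln(from_base) / ln(to_base), so it works on floats,
    NumPy arrays, or torch tensors alike.
    """
    return log_mel * (math.log(from_base) / math.log(to_base))
```

For example, `rescale_log_base(mel_log10, 10, math.e)` would turn log10 mels into natural-log mels. A different clamp floor (for example 1e-5 versus some other value) would also need to match, which a base change alone does not fix.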

ConfigAttributeError: Missing key mask_padding

When I run this cell from the Colab notebook,

with torch.no_grad():
    mel_s_t, alignment, residual = model.inference(text_norm, mel_source, target_speaker)

this error occurs:

ConfigAttributeError                      Traceback (most recent call last)
in ()
      1 with torch.no_grad():
----> 2     mel_s_t, alignment, residual = model.inference(text_norm, mel_source, target_speaker)

10 frames
/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py in _get_node(self, key, validate_access, throw_on_missing_value, throw_on_missing_key)
    468         if value is None:
    469             if throw_on_missing_key:
--> 470                 raise ConfigKeyError(f"Missing key {key}")
    471         elif throw_on_missing_value and value._is_missing():
    472             raise MissingMandatoryValue("Missing mandatory value: $KEY")

ConfigAttributeError: Missing key mask_padding
    full_key: train.mask_padding
    object_type=dict

Can you let me know how to fix it?
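The code reads `train.mask_padding` from a config that does not define it. One possible cause (an assumption) is an omegaconf version mismatch: the environment listed in the pytorch-lightning issue above shows omegaconf 2.0.0, while newer omegaconf releases raise on missing keys. Either pin the version from the repo's requirements or add the missing key with a default. A plain-dict sketch of the latter; the helper name is hypothetical and the `mask_padding` value in the usage is a placeholder, so check the repo's config for the intended setting.

```python
def fill_missing_keys(cfg, defaults):
    """Return a copy of cfg with every key from defaults that cfg lacks; existing values win."""
    out = dict(cfg)
    for key, default in defaults.items():
        if key not in out:
            out[key] = default
        elif isinstance(default, dict) and isinstance(out[key], dict):
            out[key] = fill_missing_keys(out[key], default)  # recurse into nested sections
    return out


# Placeholder default; the intended value of mask_padding is not confirmed here.
print(fill_missing_keys({"train": {"lr": 1e-3}}, {"train": {"mask_padding": False}}))
# {'train': {'lr': 0.001, 'mask_padding': False}}
```

With OmegaConf itself, `OmegaConf.merge(OmegaConf.create(defaults), hp)` has the same effect, because later arguments to `merge` override earlier ones.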
