maum-ai / cotatron

Official code for Cotatron @ INTERSPEECH 2020

Home Page: https://mindslab-ai.github.io/cotatron

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 98.97%, Shell 1.03%

Topics: pytorch, speech-synthesis, voice-conversion

cotatron's People

Contributors

seonghoon-woo, seungwonpark, wookladin

cotatron's Issues

re-training n+1 speakers

Hello,
I trained a model on 5 speakers and now need to add a sixth speaker. However, I cannot use the pre-trained network trained on 5 speakers, because loading it fails with an error about the number of embeddings: 6 in one place, but 5 expected.

How can I fix this issue? I don't want to train from scratch, because it takes a long time.
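
One possible workaround, sketched here under assumptions rather than taken from the repository, is to enlarge the speaker embedding table stored in the checkpoint before loading it into the 6-speaker model. The key name `speaker.embedding.weight` is hypothetical, and initialising the new row to the mean of the existing rows is just one reasonable choice, not the authors' method:

```python
import torch


def expand_embedding_rows(weight: torch.Tensor, new_rows: int) -> torch.Tensor:
    """Return a copy of `weight` with `new_rows` extra rows appended.

    Existing rows are kept unchanged. New rows are set to the mean of the
    existing rows (an illustrative choice, not the repository's method).
    """
    if new_rows < 0:
        raise ValueError("new_rows must be non-negative")
    mean_row = weight.mean(dim=0, keepdim=True)
    return torch.cat([weight, mean_row.repeat(new_rows, 1)], dim=0)


def patch_state_dict(state_dict, key, new_rows):
    """Copy `state_dict`, replacing the tensor at `key` with an expanded one."""
    patched = dict(state_dict)
    patched[key] = expand_embedding_rows(state_dict[key], new_rows)
    return patched


# Toy demonstration: a 5-speaker table of 4-dimensional embeddings.
# "speaker.embedding.weight" is a hypothetical key name.
old_state = {"speaker.embedding.weight": torch.randn(5, 4)}
new_state = patch_state_dict(old_state, "speaker.embedding.weight", new_rows=1)
print(new_state["speaker.embedding.weight"].shape)
```

With a real checkpoint, the patched dictionary would be loaded into the new model with `load_state_dict`, and the model would then be fine-tuned so that the new speaker's row is actually learned.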

Training fails: ValueError('value should be one of int, float, str, bool, or torch.Tensor')

I installed the dependencies with:
pip install -r requirements

Then I ran training and got:

Traceback (most recent call last):
  File "cotatron_trainer.py", line 75, in <module>
    main(args)
  File "cotatron_trainer.py", line 54, in main
    trainer.fit(model)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 602, in fit
    self.single_gpu_train(model)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 470, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 748, in run_pretrain_routine
    self.logger.log_hyperparams(ref_model.hparams)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loggers/base.py", line 18, in wrapped_fn
    fn(self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loggers/tensorboard.py", line 113, in log_hyperparams
    exp, ssi, sei = hparams(params, {})
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/tensorboard/summary.py", line 156, in hparams
    raise ValueError('value should be one of int, float, str, bool, or torch.Tensor')
ValueError: value should be one of int, float, str, bool, or torch.Tensor

The same error occurs with a single GPU and with multiple GPUs.
I tried updating tensorboard, but it made no difference.
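
The traceback shows the error being raised while the TensorBoard logger records hyperparameters, which accepts only int, float, str, bool, or torch.Tensor values. If the cause is a nested or non-primitive hyperparameter value (an assumption; the source does not confirm it), one workaround is to flatten the hyperparameters and stringify anything else before they reach the logger. The helper name `flatten_hparams` and the sample keys are hypothetical:

```python
from collections.abc import Mapping

# Types the TensorBoard hparams summary accepts, besides torch.Tensor.
PRIMITIVES = (int, float, str, bool)


def flatten_hparams(params, prefix=""):
    """Flatten a nested mapping into dotted keys with primitive values.

    Values that are not int, float, str, or bool (for example None or a
    list) are converted to their string representation.
    """
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flat.update(flatten_hparams(value, prefix=name + "."))
        elif isinstance(value, PRIMITIVES):
            flat[name] = value
        else:
            flat[name] = str(value)
    return flat


# Hypothetical hyperparameters with a nested section, a None, and a list.
example = {
    "train": {"batch_size": 16, "grad_clip": None},
    "chars": ["a", "b"],
}
for name, value in flatten_hparams(example).items():
    print(name, repr(value))
```

This only changes what is logged; the model itself would keep using the original, unflattened configuration.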

Adaptation for a corpus in Urdu

Dear team,
Thank you so much for this wonderful library.
I am creating a DB in Urdu. Can you please indicate:

  • How many speakers do I need in total? (I have 8, with 2 hours each.)
  • How many hours of audio should each speaker have?
  • For how many epochs should Cotatron and the synthesizer be trained?

Clarification regarding the configuration of the MelGAN model

Dear Team,

Thank you very much for your work on this project.
I am experimenting with your codebase and trying to synthesize the converted speech with a MelGAN vocoder that I trained from scratch on a custom dataset. I am using the official implementation, which is available here:
https://github.com/descriptinc/melgan-neurips
The model is trained with the default parameters, except for the Mel frequencies, which I've set to mel_fmin=70, mel_fmax=8000, as described in the Cotatron paper.

Can you please confirm that this is the same MelGAN configuration that you used for training MelGAN on LibriTTS+VCTK?
If not, could you kindly describe the differences or point me to the correct Git repository?

Many thanks.

pytorch_lightning.utilities.exceptions.MisconfigurationException

I am having issues with the cotatron training; in particular with the pytorch-lightning package. I've found that the code only runs with pytorch-lightning<=0.7.3, yet it is unable to identify my GPU:

  File "cotatron_trainer.py", line 72, in <module>
    main(args)
  File "cotatron_trainer.py", line 36, in main
    trainer = Trainer(
  File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 389, in __init__
    self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
  File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 629, in parse_gpu_ids
    gpus = sanitize_gpu_ids(gpus)
  File "/home/valleballe/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 596, in sanitize_gpu_ids
    raise MisconfigurationException(f"""
pytorch_lightning.utilities.exceptions.MisconfigurationException:
                You requested GPUs: [1]
                But your machine only has: []

I get the same result when setting the "-g" argument to "0".

My environment packages:

---------------------- -------------
absl-py                0.9.0
attrs                  19.3.0
audioread              2.1.8
Automat                0.8.0
blinker                1.4
cachetools             4.1.0
certifi                2019.11.28
cffi                   1.14.0
chardet                3.0.4
Click                  7.0
cloud-init             20.1
colorama               0.4.3
command-not-found      0.3
configobj              5.0.6
constantly             15.1.0
cryptography           2.8
cycler                 0.10.0
dbus-python            1.2.16
decorator              4.4.2
distro                 1.4.0
distro-info            0.23ubuntu1
entrypoints            0.3
future                 0.18.2
google-auth            1.16.0
google-auth-oauthlib   0.4.1
grpcio                 1.29.0
httplib2               0.14.0
hyperlink              19.0.0
idna                   2.8
imageio                2.8.0
importlib-metadata     1.5.0
incremental            16.10.1
inflect                4.1.0
Jinja2                 2.10.1
joblib                 0.15.1
jsonpatch              1.22
jsonpointer            2.0
jsonschema             3.2.0
keyring                18.0.1
kiwisolver             1.2.0
language-selector      0.1
launchpadlib           1.10.13
lazr.restfulclient     0.14.2
lazr.uri               1.0.3
librosa                0.7.2
llvmlite               0.32.1
Markdown               3.2.2
MarkupSafe             1.1.0
matplotlib             3.2.1
more-itertools         4.2.0
netifaces              0.10.4
numba                  0.49.1
numpy                  1.18.5
oauthlib               3.1.0
omegaconf              2.0.0
pandas                 1.0.4
Pillow                 7.1.2
pip                    20.0.2
protobuf               3.12.2
pyasn1                 0.4.2
pyasn1-modules         0.2.1
pycparser              2.20
PyGObject              3.36.0
PyHamcrest             1.9.0
PyJWT                  1.7.1
pymacaroons            0.13.0
PyNaCl                 1.3.0
pyOpenSSL              19.0.0
pyparsing              2.4.7
pyrsistent             0.15.5
pyserial               3.4
python-apt             2.0.0
python-dateutil        2.8.1
python-debian          0.1.36ubuntu1
pytorch-lightning      0.7.3
pytz                   2020.1
PyYAML                 5.3.1
requests               2.22.0
requests-oauthlib      1.3.0
requests-unixsocket    0.2.0
resampy                0.2.2
rsa                    4.0
scikit-learn           0.23.1
scipy                  1.4.1
SecretStorage          2.3.1
service-identity       18.1.0
setuptools             45.2.0
simplejson             3.16.0
six                    1.14.0
SoundFile              0.10.3.post1
ssh-import-id          5.10
systemd-python         234
tensorboard            2.2.2
tensorboard-plugin-wit 1.6.0.post3
test-tube              0.7.5
threadpoolctl          2.1.0
torch                  1.4.0
torchvision            0.5.0
tqdm                   4.46.1
Twisted                18.9.0
typing-extensions      3.7.4.2
ubuntu-advantage-tools 20.3
ufw                    0.36
unattended-upgrades    0.1
Unidecode              1.1.1
urllib3                1.25.8
wadllib                1.3.3
Werkzeug               1.0.1
wheel                  0.34.2
zipp                   1.0.0
zope.interface         4.7.1

I am running the code on an RTX 2080 Ti and have tried updating PyTorch with different CUDA binaries, but without luck.
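
The message "your machine only has: []" suggests that PyTorch itself sees no CUDA devices, independently of pytorch-lightning. A small diagnostic like the following sketch can confirm that before debugging the trainer; the function name `cuda_report` is hypothetical:

```python
def cuda_report():
    """Collect basic facts about the installed PyTorch build and visible GPUs."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


for name, value in cuda_report().items():
    print(f"{name}: {value}")
```

If `cuda_available` is False or `device_count` is 0, the problem most likely lies in the driver, the CUDA build of PyTorch, or environment variables such as `CUDA_VISIBLE_DEVICES`, rather than in the training script.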

Pretrained-model yet?

Thank you for sharing this git.
I'm just wondering if the pre-trained model has yet to be uploaded.
Will it be uploaded soon?

How do I use the pretrained model for training Cotatron?

Thanks for sharing the pretrained model - but I can't get it working for training Cotatron.

When I try simply resuming Cotatron training with the full model using the following command:

python cotatron_trainer.py -c config/global/config.yaml config/cota/config.yaml \
                           -g 0 -n my_runname -p pretrained_decoder_libritts_vctk_epoch652_15388cc.ckpt

I get mismatched key errors. This is somewhat expected, since Cotatron is instantiated inside the Synthesizer:

Missing key(s) in state_dict: "encoder.embedding.weight", 
Unexpected key(s) in state_dict: "cotatron.encoder.embedding.weight

I can work around this by using torch load/save to write out only the Cotatron part of the model:

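import torch

# Synthesizer and hparams come from the cotatron repository and its config.
checkpoint = torch.load('original.ckpt', map_location='cpu')
model = Synthesizer(hparams).cuda()
model.load_state_dict(checkpoint['state_dict'])

model.eval()

# Save only the Cotatron submodule's weights (no optimizer or trainer state).
torch.save({
    'state_dict': model.cotatron.state_dict(),
}, 'cotatron.ckpt')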

But when I try training with this Cotatron-specific checkpoint (i.e. 'cotatron.ckpt'), pytorch-lightning fails with this error:

KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to "ModelCheckpoint.save_weights_only" being set to True.'

I've tried various other options for extracting and reusing Cotatron-specific parameters from the checkpoint, but it seems that pytorch-lightning doesn't support this.

Were you able to get this to work? If so, could you describe how it's done?

Thanks!
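
The key mismatch above ("encoder.embedding.weight" missing, "cotatron.encoder.embedding.weight" unexpected) can also be handled without instantiating the Synthesizer, by keeping only the keys under the "cotatron." prefix and stripping that prefix. The following is a minimal sketch on a plain dictionary; the helper name and all keys except "cotatron.encoder.embedding.weight" are hypothetical, and strings stand in for tensors:

```python
def extract_submodule_state(state_dict, prefix):
    """Keep only the keys under `prefix` and strip the prefix from them."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy state dict; strings stand in for the real weight tensors.
full_state = {
    "cotatron.encoder.embedding.weight": "tensor A",
    "cotatron.decoder.rnn.weight": "tensor B",  # hypothetical key
    "decoder.conv.weight": "tensor C",          # hypothetical key, dropped
}
cotatron_state = extract_submodule_state(full_state, "cotatron.")
print(sorted(cotatron_state))
```

In real use, `full_state` would be `torch.load(path, map_location='cpu')['state_dict']`. Since the KeyError indicates that pytorch-lightning is trying to restore the full training state, it may work better to load the extracted weights into a freshly built model with `load_state_dict` and start a new run, rather than passing a weights-only file as the checkpoint to resume from. This is an untested suggestion, not a confirmed fix.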

Recipe for a Korean dataset / Korean dataset model

I'm training Cotatron on the KSS dataset. I've trained for 15k steps, but I can barely get an alignment.
Is it possible to train on the KSS dataset using only the released code?
Could you share the recipe for training on the KSS dataset?

Question on using WaveGlow instead of MelGAN

Hello,
I want to use WaveGlow, because the MelGAN output sounds quite metallic. I see this in the config:

audio: # WARNING! This cannot be changed unless you're planning to train the MelGAN vocoder by yourself.
  n_mel_channels: 80
  filter_length: 1024
  hop_length: 256
  win_length: 1024
  sampling_rate: 22050
  mel_fmin: 70.0
  mel_fmax: 8000.0

What needs to change for this to work with a pre-trained WaveGlow? I tried using it, but I think there is a problem with mel normalization, since the output is very noisy.

I know WaveGlow uses mel_fmin: 0.0. I changed that and retrained, but it still does not work.
Thank you

ConfigAttributeError: Missing key mask_padding

When I run this cell from the Colab notebook,

with torch.no_grad():
    mel_s_t, alignment, residual = model.inference(text_norm, mel_source, target_speaker)

this error occurs:


ConfigAttributeError Traceback (most recent call last)
in ()
1 with torch.no_grad():
----> 2 mel_s_t, alignment, residual = model.inference(text_norm, mel_source, target_speaker)

10 frames
/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py in _get_node(self, key, validate_access, throw_on_missing_value, throw_on_missing_key)
468 if value is None:
469 if throw_on_missing_key:
--> 470 raise ConfigKeyError(f"Missing key {key}")
471 elif throw_on_missing_value and value._is_missing():
472 raise MissingMandatoryValue("Missing mandatory value: $KEY")

ConfigAttributeError: Missing key mask_padding
full_key: train.mask_padding
object_type=dict

Can you let me know how to fix it?
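
The error says the loaded configuration has no `train.mask_padding` entry. If the configuration in use simply predates that key (an assumption; the source does not confirm the cause), one workaround is to fill in missing keys from a set of defaults before building the model. The sketch below works on plain dictionaries; the helper name is hypothetical, and the default value `False` is a placeholder, not the repository's actual setting:

```python
import copy


def fill_missing(config, defaults):
    """Return a copy of `config` with keys absent from it taken from `defaults`.

    Values already present in `config` always win. Nested dictionaries are
    merged recursively.
    """
    merged = copy.deepcopy(config)
    for key, default in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(default)
        elif isinstance(default, dict) and isinstance(merged[key], dict):
            merged[key] = fill_missing(merged[key], default)
    return merged


# Hypothetical loaded config that lacks train.mask_padding.
loaded = {"train": {"batch_size": 32}}
# Placeholder default; check the repository's config for the real value.
defaults = {"train": {"mask_padding": False, "batch_size": 16}}

print(fill_missing(loaded, defaults))
```

For OmegaConf objects, `OmegaConf.merge(defaults, loaded)` gives a comparable result, with later arguments taking precedence.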
