nii-yamagishilab / multi-speaker-tacotron

VCTK multi-speaker tacotron for ICASSP 2020

License: BSD 3-Clause "New" or "Revised" License

Python 60.87% Shell 39.13%

multi-speaker-tacotron's Issues

problem downloading data from Dropbox

Hello, I tried downloading the data from the Dropbox link you provided (https://www.dropbox.com/sh/rq4lebus0n8tmso/AACldbmKDPRN9YiXrRROjtTSa?dl=0), but I get the "Folder has too many files to download" error. I also tried the command "wget https://www.dropbox.com/sh/rq4lebus0n8tmso/AACldbmKDPRN9YiXrRROjtTSa?dl=1", but I still could not download the data. I would be really grateful if you could provide a Dropbox link that can be downloaded with wget, or a zip archive of the data folder. Thank you very much for the help.

Dataset

Can you provide example data for running your code, including all the txt and wav files?
I cannot run your example in the scripts folder with the path /gs/hs0/tgh-19IAA/ecooper/data/vctk0.91-preprocessed-phone/source, the target-data-root, and --hparam-json-file=/gs/hs0/tgh-19IAA/ecooper/data/vctk_hpf_sv56_preprocess_30db/target/hparams.json. I look forward to your reply.

Generating LDE embeddings.

Is there a way to generate the LDE embeddings for new samples that I have?

inference speed

Hi.
I am exploring the training and inference speed of different multi-speaker TTS models on a single CPU or a single GPU.
I would appreciate any information on this for the current model or for any other multi-speaker TTS models.
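
Inference speed for TTS is commonly reported as a real-time factor (RTF): synthesis time divided by the duration of the generated audio. The following is a minimal sketch of how one might measure it, not code from this repository. The `synthesize` callable, the `fake_synthesize` placeholder, and the 16 kHz sample rate are all assumptions made for illustration.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return (rtf, audio_seconds) for one call to `synthesize`.

    `synthesize` is a hypothetical callable that takes text and returns a
    sequence of audio samples; it stands in for whatever TTS pipeline is
    being benchmarked.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds, audio_seconds


def fake_synthesize(text):
    # Placeholder model (assumption): 0.1 s of silence per character at 16 kHz.
    return [0.0] * (1600 * len(text))


if __name__ == "__main__":
    rtf, seconds = real_time_factor(fake_synthesize, "hello", 16000)
    print(f"audio: {seconds:.2f} s, RTF: {rtf:.6f}")
```

An RTF below 1.0 means audio is produced faster than it plays back. To compare models fairly, the same text, hardware, and vocoder settings would need to be used for each measurement.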

Guide on how to synthesize my own voice

Hi,

Thanks for the great work. I am interested in synthesizing my own voice using your system.

Can you provide a guide on how to do that?

In particular, I can't find the pre-trained model for generating new speaker embeddings (the pytorch-kaldi-neural-speaker-embeddings repo doesn't share the pre-trained model). Would you mind providing the embedding model you used?

Nancy model for parameter initialization

Thank you very much for providing the source code. May I ask when the Nancy model for parameter initialization will be available? Thanks.

When will the code be published?

Dear authors,

The paper is very interesting. Any news on when the code will be published?

Clarification of this repo vs Byeon's repo

It would be great if this repo's README could clear the air regarding this repo and https://github.com/GSByeon/multi-speaker-tacotron-tensorflow (which has only a Korean presentation), since both claim to be THE "Multi-Speaker Tacotron".

Can we get a cloned voice in real time?

Hello,

I have 2 quick questions about what can be done using Tacotron.

1. What is the minimum training time (in minutes) required to get a good result?
2. Can the processing time (after training on the data) be instantaneous? I mean, can we get the cloned voice in real time?

Happy new year, by the way!

Thank you!

My additive attention is not good

The additive attention has always been bad.
According to your paper, this attention helps the forward attention to align, so is it normal that it is always bad?
My forward attention is well aligned.

I use all of VCTK, and the batch_size is 32.
The figure below shows epochs 37, 47, and 48.

Feature generation for given text

Hi,

Great work. I have a question: can you provide a script to extract features for a given text that can be used as input for the predictmel script (provided one has the phonemes from flite)? I couldn't get it to work (I don't normally use TensorFlow). I also wonder how to get from the phonemes to the token IDs. Is that part of the feature extraction script?

Best,
Christian
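
On the phoneme-to-token-ID question, text front ends for Tacotron-style models typically map each symbol to its index in a fixed symbol table. The sketch below illustrates that general idea only. The symbol inventory, the padding and end-of-sequence symbols, and the function name `phonemes_to_ids` are all hypothetical, and this repository's actual table and conventions may differ.

```python
# Hypothetical symbol inventory; the repository's real symbol table,
# padding symbol, and end-of-sequence symbol may all differ.
PAD = "_"
EOS = "~"
PHONEMES = ["aa", "ae", "ah", "hh", "l", "ow", "pau"]
SYMBOLS = [PAD, EOS] + PHONEMES
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def phonemes_to_ids(phonemes):
    """Map phoneme strings to integer IDs and append the EOS ID."""
    unknown = [p for p in phonemes if p not in SYMBOL_TO_ID]
    if unknown:
        raise ValueError(f"unknown phonemes: {unknown}")
    return [SYMBOL_TO_ID[p] for p in phonemes] + [SYMBOL_TO_ID[EOS]]


if __name__ == "__main__":
    # Phonemes as they might come from a front end such as flite.
    print(phonemes_to_ids(["hh", "ah", "l", "ow"]))  # [5, 4, 6, 7, 1]
```

The IDs are only meaningful together with the exact table used at training time, so a pre-trained model would need the same symbol order it was trained with.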
