Comments (12)
I solved it by changing save_wav() in util/audio.py from:
librosa.output.write_wav(path, wav.astype(np.int16), hparams.sample_rate)
to
librosa.output.write_wav(path, wav, hparams.sample_rate)
from gst-tacotron.
@ishandutta2007 Google usually trains models like this on a lot of GPUs, and they used about 200 hours of data to reach their performance, so I think it's hard to reproduce their results. By the way, a friend of mine has started training a model with this repo; I can share it when it's finished.
from gst-tacotron.
@ishandutta2007 Hi, I guess it is caused by the librosa version. You can change how the wav file is written to match your environment.
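For anyone hitting this on a newer librosa: the `librosa.output` module was removed in librosa 0.8.0, so one way to follow the suggestion above is to write the file without librosa at all. Below is a minimal sketch, not the repo's actual code. It assumes mono float audio, `SAMPLE_RATE` is a stand-in for `hparams.sample_rate` (the value 22050 is an assumption), and the peak normalization is a guess at what a Tacotron-style `save_wav` typically does.

```python
import wave

import numpy as np

# Stand-in for hparams.sample_rate; 22050 is an assumed value, not read from the repo.
SAMPLE_RATE = 22050


def save_wav(wav, path, sample_rate=SAMPLE_RATE):
    """Write a mono float waveform to a 16-bit PCM WAV file without librosa.output."""
    wav = np.asarray(wav, dtype=np.float32)
    # Peak-normalize to the int16 range; the 0.01 floor avoids blowing up near-silent audio.
    peak = float(np.max(np.abs(wav))) if wav.size else 0.0
    scaled = wav * (32767 / max(0.01, peak))
    pcm = scaled.astype("<i2")  # little-endian signed 16-bit samples
    with wave.open(path, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```

Usage would look like `save_wav(waveform, "eval.wav")`, where `waveform` is the float array returned by the synthesizer. Because the dtype conversion happens inside the function, the call site does not depend on which dtypes a given librosa version accepts.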
from gst-tacotron.
Thanks a lot @syang1993 for answering. I have been trying to reach you on multiple platforms about this thread, asking for models that people have already built; I'm not sure people look at older threads. It would be great if you could share at least the 200k-step model (the one whose outputs you shared) so we can continue training on top of it.
from gst-tacotron.
Hi, I'm sorry, but I'm now doing an internship at a company and cannot access the pre-trained model (I trained it several months ago while I was a visiting researcher in Singapore). You can train it yourself; it may take about 3 days to reach 200K steps.
from gst-tacotron.
Well, on our GTX 1080 it is taking longer by my estimate (maybe twice that). And it's not only about time: nowadays people in the ML world burn huge amounts of compute hours and money unnecessarily when sharing could save a lot of it.
Can you share your email/LinkedIn/Twitter etc.? You seem to be really deep into speech synthesis, and keeping in touch may be useful for both of us.
from gst-tacotron.
I trained it on a P40, which may be faster. Yes, you are right, sharing can solve a lot. Maybe this is the purpose of GitHub :)
I'm not very familiar with LinkedIn, so I don't know how to share my ID; this is the link: https://www.linkedin.com/in/yang-shan-182987119
from gst-tacotron.
So in China do you use Ushi or Mamai? Let's see if I can connect via them too. :)
from gst-tacotron.
Thanks @syang1993 for connecting. I have started the run on our GTX 1080; it will take a month or so to get 500-600 iterations. We need to get close to Google's performance, or else it is unusable in real-life scenarios. If you have access to more powerful GPUs, it would be a great favour if you could train for more iterations and share the model with the community. So far there is no properly trained Tacotron with style transfer on the internet; this would be the first one.
from gst-tacotron.
No wonder Elon Musk fears Google colonising the world :D
@syang1993 what's the best messenger to keep in touch with you? We shouldn't be discussing things unrelated to this thread, so I will switch over to the models thread for further updates.
Do let me know the best way to reach you. Don't hesitate, even if I need to install WeChat or something. In India we use:
- google hangout/gmail: [email protected]
- whatsapp: +919952917263
- twitter: https://twitter.com/ishandutta2007
- facebook: https://facebook.com/ishandutta2007
- and linkedin already exchanged above.
from gst-tacotron.
@ishandutta2007 We mostly use WeChat in China, and my WeChat ID is ys_think. I also use LinkedIn (not often) and Gmail: [email protected]
from gst-tacotron.
Just hit the same error. @ishandutta2007 how did you get around this?
from gst-tacotron.