Comments (10)
Hi, first of all good luck with the multispeaker implementation; I actually played around with it myself (branch `multispeaker`, very old). Regarding your question: the restriction to mel_len is usually there for batched inference (to remove the padding). For batch size 1 the lengths should match.
from forwardtacotron.
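To illustrate the point about padding, here is a minimal sketch (plain Python lists, hypothetical shapes): in batched inference the mel frames are zero-padded to the longest sequence in the batch, so each alignment is sliced back to its true mel length before durations are read off it. With batch size 1 there is no padding and the slice is a no-op.

```python
def trim_to_mel_len(alignment, mel_len):
    """Drop padded frames: keep only the first mel_len rows (frames)."""
    return alignment[:mel_len]

# Toy batch of alignments, shape (frames x characters), padded to 3 frames.
batch_alignments = [
    [[1, 0], [0, 1], [0, 0]],  # real length 2, last frame is padding
    [[1, 0], [0, 1], [1, 0]],  # real length 3, no padding
]
mel_lens = [2, 3]

trimmed = [trim_to_mel_len(a, n) for a, n in zip(batch_alignments, mel_lens)]
```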
Oh, your code extracts the alignment from tacotron via a forward pass, not via inference. That's why we need to take the alignment matrix only up to mel_len elements, to remove the padding values. When I replaced your tacotron model, I thought the alignment matrix had to be extracted by running inference, and I got the warning "Sum of durations did not match mel length". Now I know why that happens.
Thank you for the explanation.
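For context, a rough sketch of how durations are commonly read off a tacotron attention matrix (frames x characters): each mel frame is assigned to its argmax character, and the per-character frame counts become the durations. Under teacher forcing (the forward pass) the durations sum to mel_len by construction; with free-running inference the decoder may stop early or late, which is where a "Sum of durations did not match mel length" warning comes from. The numbers below are made up for illustration.

```python
def durations_from_alignment(alignment, num_chars):
    """Count, for each character, how many mel frames attend to it most."""
    durations = [0] * num_chars
    for frame in alignment:
        char_idx = max(range(num_chars), key=lambda i: frame[i])
        durations[char_idx] += 1
    return durations

alignment = [  # 5 mel frames attending over 3 characters (toy values)
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
    [0.0, 0.1, 0.9],
]
durs = durations_from_alignment(alignment, num_chars=3)
# sum(durs) equals the number of frames by construction
```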
Can I leave the issue open until I finish the implementation?
Thanks!
Surely, keep me updated!
Hi, I successfully implemented a multispeaker version of ForwardTacotron, and the results sound good. We are still using a pretrained tacotron (or, in my version, Mellotron) to extract the alignment between characters and mel spectrogram, but I see that FastPitch and FastSpeech now use the Montreal Forced Aligner (MFA) for alignment extraction. Have you tried it before? Would it be better or worse than extracting with tacotron? I have tried it in FastPitch, but the results are still poor.
Hi, I tried using MFA for duration extraction before and I found it to be slightly worse; there was also quite some fiddling involved in mapping the phonemes. It wasn't totally bad, though.
Thanks!
One more question: I am currently using graphemes to encode the text, since I have to work with multilingual data. Would using phonemes be better or not? In FastPitch, too, I saw they use both graphemes and phonemes.
In my experience phonemes trump graphemes big time, because the (near-)bijective mapping from phonemes to sounds is easier for the TTS model to learn. The result is much more stable pronunciation and prosody. For multilingual data, simply phonemize your metafile upfront and set use_phonemes=False in the config. You could check out https://github.com/as-ideas/DeepPhonemizer for a stable phonemizer (you might need to train your own phonemizer model for your needs, but it's probably worth it).
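The "phonemize the metafile upfront" step above can be sketched roughly as below, assuming an LJSpeech-style metafile with one `id|text` entry per line. `phonemize` here is a hypothetical stand-in for whatever phonemizer you actually use (e.g. a trained DeepPhonemizer model); the toy word lookup is not real phonetics, just enough to show the rewrite. You would then train on the rewritten file with use_phonemes=False so the text is taken verbatim.

```python
# Hypothetical toy lexicon; a real setup would call a trained phonemizer.
TOY_LEXICON = {'hello': 'h@loU', 'world': 'w3rld'}

def phonemize(text):
    # Stand-in phonemizer: look each word up, pass unknown words through.
    return ' '.join(TOY_LEXICON.get(w.lower(), w) for w in text.split())

def phonemize_metafile(lines):
    """Rewrite 'id|text' lines to 'id|phonemes'."""
    out = []
    for line in lines:
        utt_id, text = line.split('|', 1)
        out.append(f'{utt_id}|{phonemize(text)}')
    return out

meta = ['0001|Hello world']
phonemized = phonemize_metafile(meta)
```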
Thank you for your advice; since I switched from graphemes to phonemes the results are promising. Now I would like to deploy this model to devices such as mobile phones or cameras, but the model is large, so I have to compress it, and after compression the results are not good. Have you done this before? What can I do to deploy to a device while keeping the quality?
Thanks!
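Not from the thread, but since compression came up: a common first step for on-device deployment is post-training 8-bit weight quantization (frameworks offer this built in, e.g. dynamic quantization in PyTorch). The minimal pure-Python sketch below shows the idea: weights are mapped to int8 with a per-tensor scale, storage drops roughly 4x versus float32, and the rounding error introduced here is often where the quality loss comes from.

```python
def quantize(weights, num_bits=8):
    """Map floats to signed integers with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / (2 ** (num_bits - 1) - 1) if max_abs else 1.0
    q = [int(round(w / scale)) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 1.27]
q, scale = quantize(weights)
recon = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
# max_err is bounded by the scale (one quantization step)
```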
Do you think non-autoregressive models like ForwardTacotron or FastSpeech 2 are simply not as good as autoregressive models like Tacotron 2? When I tried both ForwardTacotron and FastSpeech on the same dataset, I found the generated audio is not as good as Tacotron 2's, even though they outperform Tacotron 2 in speed. I am trying to improve your model and FastSpeech 2 to be comparable to Tacotron 2, but it seems very hard to do.
Related Issues (20)
- symbols.py for Arabic letters
- Feature request: model compatible to export into onnx
- Cast error details: Unable to cast [Array] to Tensor HOT 9
- Adding pauses to the input text HOT 2
- preprocess.py issues - RAM usage close to 100% but CPU usage is nonexistant HOT 16
- ValueError not enough values to unpack (expected 2 got 0) HOT 2
- making the system available for use with assistive technologies on windows HOT 1
- Bad Alignment HOT 1
- ValueError: need at least one array to stack train_tacotron.py line 192 HOT 1
- Facing problem at preprocessing
- Need instructions for fine tunning
- Problems with attention for dataset consisting of longer samples
- how to train a dataset using a pre-trained model?
- preprocess.py misuses Espeak backend, resulting in slow performance and memory leak HOT 2
- preprocess.py: list index out of range HOT 5
- Multispeaker and new neural voice creation HOT 12
- Non-Latin alphabets
- Bad Attention!
- Training a model twice using a different dataset