zhenye234 / comospeech

Stars: 170 · Watchers: 11 · Forks: 17 · Size: 2.42 MB

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

License: MIT License

Python 98.70% Cython 1.30%

consistency-models diffusion-models speech text-to-speech tts speech-synthesis

comospeech's People

Contributors

Stargazers

Watchers

Forkers

maxmax2016 ishine shaun95 pgolds splinter21 positivewon eschmidbauer fengshi-cherish amorjnyh pan310 konstantinegoudz jiahong3837 chenchy whitefu mcolletta nzpeng saulocatharino

comospeech's Issues

The paper mentions using CoMoSpeech for the SVS task and briefly describes feature extraction, but I can't find that feature extraction, or the summing with the phoneme features, in the code. Is it planned for a future release?

Thanks!

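Hello, how is Eq.(7) in the paper derived? I don't quite follow it.

Hello, I have read the paper but still don't understand how Eq.(7) is derived. The drift coefficient is f(t)=t and the diffusion coefficient is g(t)=1, so \sigma(t)=t. Combining this with Eq.(4), do we then substitute into the ODE in Eq.(3)? I worked through it myself and still could not reach Eq.(7). Where is my understanding wrong? Thanks!

When will the code be open-sourced?

As the title says.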

pretrained models available

Hello, thank you for sharing this project. I was hoping to test it for inference and fine-tuning, but I could not find any pretrained models. Would it be possible to share an English model?

Is singing voice conversion not available?

Is text-to-speech the only feature available right now?

Generated audio has a distinct electric sound when training on custom Mandarin datasets

Hi, I replaced the phoneme set with zh pinyin phonemes and completed the network for multiple speakers (without feeding spk_emb to GradLogPEstimator2d). After training the teacher model for about 108 epochs on a custom dataset of 17 speakers (6000 wavs per speaker), the generated audio still contains a distinct electric sound. Is that normal?
Moreover, when I changed the network to feed spk_emb to GradLogPEstimator2d, it quickly reconstructed the timbre of the different speakers (in only about 2 epochs), but the audio still contains the distinct electric sound. How can I fix this, or should I just train for more epochs?
Looking forward to your kind reply.
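
For readers unfamiliar with what "feeding spk_emb to GradLogPEstimator2d" could mean in practice, here is a minimal sketch of one possible conditioning scheme. It assumes the speaker embedding has already been projected to the mel dimension, is repeated across time frames, and is stacked as a third input channel next to the encoder output `mu` and the noisy mel `x`. The helper names `broadcast_speaker` and `stack_estimator_input` are hypothetical, plain Python lists stand in for tensors, and nothing here is confirmed to match the code in this repository or the poster's modification.

```python
# Hypothetical sketch (not the repository's code): build the channel stack
# that a score estimator could consume, with an optional speaker channel.
# A "mel" here is a list of n_feats rows, each holding n_frames values.

def broadcast_speaker(spk_emb, n_frames):
    """Repeat each speaker-embedding value across all time frames."""
    return [[value] * n_frames for value in spk_emb]


def stack_estimator_input(x, mu, spk_emb=None):
    """Return channels [mu, x] or [mu, x, s], each n_feats x n_frames."""
    n_feats, n_frames = len(x), len(x[0])
    if len(mu) != n_feats or len(mu[0]) != n_frames:
        raise ValueError("x and mu must have the same shape")
    channels = [mu, x]
    if spk_emb is not None:
        # Assumption: spk_emb was already projected to n_feats values.
        if len(spk_emb) != n_feats:
            raise ValueError("spk_emb must have n_feats values")
        channels.append(broadcast_speaker(spk_emb, n_frames))
    return channels


# Toy example: 2 mel bins, 3 frames.
x = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mu = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
spk = [0.7, -0.7]

single = stack_estimator_input(x, mu)        # two channels, no speaker
multi = stack_estimator_input(x, mu, spk)    # three channels, with speaker
print(len(single), len(multi))
```

Under this scheme the estimator sees the speaker identity at every time-frequency position, which would be consistent with timbre being picked up quickly. It does not by itself explain or address the electric sound described above.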

Hello, when will singing voice synthesis be released?

singing voice synthesis

Will FlashSpeech be open-sourced?

Does singing voice synthesis work on English songs?

Hello, which model is used as the teacher model for singing voice synthesis?
