x-lance / unicats-ctx-vec2wav

112.0 10.0 16.0 1 MB

[AAAI 2024] Code for CTX-vec2wav in UniCATS

Home Page: https://cpdu.github.io/unicats/

Shell 4.29% Python 93.34% Perl 2.37%
speech-synthesis vocoder unicats vocoding semantic-token self-supervised-speech

unicats-ctx-vec2wav's People

Contributors

cantabile-kwok

unicats-ctx-vec2wav's Issues

Possible collaboration on CTX-txt2vec

Hi @cantabile-kwok, I've been chipping away at an unofficial implementation of the UniCATS paper here. Since the second part is out and it sounds like you're working on the txt2vec portion, would it be possible to collaborate on this? My unofficial repo contains some very basic dataset preprocessing and the different configs for establishing the contexts for each utterance.

Please do let me know. Thank you!

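About normalizing the prompt mel spectrogram

Thanks for open-sourcing this! Could you share the cmvn.ark file? 🙏🏻🙏🏻🙏🏻
Right now I'm using the unnormalized mel spectrogram directly as the prompt. The pronunciation is very clear, but the timbre doesn't sound much like the prompt speaker, so I'd like to hear the result with normalization. 🙏🏻🙏🏻🙏🏻
I'd also like to confirm the mel-spectrogram parameters: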

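import librosa
import torch

# logmelspectrogram is assumed to come from the repo's feature-extraction utilities (its import is not shown in the issue)
# prompt_src_wav_file: path to the prompt waveform
prompt_wav, sr = librosa.load(prompt_src_wav_file, sr=16000)
prompt = logmelspectrogram(
    x=prompt_wav.T,
    fs=16000,
    n_mels=80,
    n_fft=1024,
    n_shift=160,
    win_length=465,
    window="hann",
    fmin=80,
    fmax=7600).squeeze()[None, :, :]  # add a leading batch axis
prompt = torch.FloatTensor(prompt)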

If I load the prompt like this and then normalize it, will it match what the model expects?
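For reference, a Kaldi-style global CMVN stats file such as cmvn.ark usually stores a (2, D+1) matrix. Row 0 holds the per-dimension sums, with the frame count in the last column, and row 1 holds the per-dimension sums of squares. The sketch below assumes that layout. It also assumes the stats have already been loaded into a NumPy array, for example with the kaldiio package. It is not taken from this repository, so check how the repo's own code applies cmvn.ark before relying on it. The helper names `apply_global_cmvn` and `accumulate_stats` are hypothetical.

```python
import numpy as np


def apply_global_cmvn(feats: np.ndarray, stats: np.ndarray, eps: float = 1e-20) -> np.ndarray:
    """Normalize (T, D) features with Kaldi-style (2, D+1) accumulated stats.

    Assumed layout: stats[0, :D] = sum of x, stats[0, D] = frame count,
    stats[1, :D] = sum of x**2.
    """
    dim = feats.shape[-1]
    count = stats[0, dim]
    mean = stats[0, :dim] / count
    var = stats[1, :dim] / count - mean ** 2
    std = np.sqrt(np.maximum(var, eps))  # guard against tiny or negative variance
    return (feats - mean) / std


def accumulate_stats(feats: np.ndarray) -> np.ndarray:
    """Build (2, D+1) stats from (T, D) features, mirroring the assumed layout."""
    t, d = feats.shape
    stats = np.zeros((2, d + 1))
    stats[0, :d] = feats.sum(axis=0)
    stats[0, d] = t
    stats[1, :d] = (feats ** 2).sum(axis=0)
    return stats


# Usage with random stand-in data; a real prompt would be the (T, 80) log-mel matrix.
rng = np.random.default_rng(0)
mel = rng.normal(loc=-5.0, scale=2.0, size=(200, 80))
normed = apply_global_cmvn(mel, accumulate_stats(mel))
print(normed.mean(axis=0)[:3], normed.std(axis=0)[:3])  # close to 0 and 1
```

In the issue's code, normalization of this kind would be applied to the squeezed (T, 80) matrix before the `[None, :, :]` batch axis is added. The stats must come from the model's training data rather than from the prompt itself.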

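Extracting ppe features

The results from running the project's ppe extraction script don't match the ppe features provided with the project. Also, did you run any ablation experiments on these three features?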
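Here, ppe presumably refers to pitch, probability of voicing and energy. A common reason re-extracted features disagree with reference features is a mismatch in framing (frame shift, window length, padding) or in the extractor itself.

The hypothetical helpers below are not taken from this repo. One computes frame-level log energy with NumPy; the other compares two feature tracks after trimming them to a common length, which makes off-by-a-few-frames differences easy to spot. The sketch assumes 16 kHz audio with a 160-sample shift, matching the mel settings quoted in the previous issue, and a 400-sample window, which is an assumption. The actual ppe script may use different settings or a different energy definition.

```python
import numpy as np


def frame_log_energy(wav: np.ndarray, win: int = 400, hop: int = 160, eps: float = 1e-10) -> np.ndarray:
    """Per-frame log energy with assumed framing and no padding (a partial last frame is dropped)."""
    n_frames = 1 + max(0, len(wav) - win) // hop
    frames = np.stack([wav[i * hop : i * hop + win] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + eps)


def compare_tracks(a: np.ndarray, b: np.ndarray) -> dict:
    """Trim two (T,) or (T, D) tracks to the shorter length and report their differences."""
    n = min(len(a), len(b))
    diff = np.abs(a[:n] - b[:n])
    return {
        "len_a": len(a),
        "len_b": len(b),
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
    }


# Usage: one second of a 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
wav = 0.5 * np.sin(2 * np.pi * 220 * t)
energy = frame_log_energy(wav)
print(energy.shape)  # (98,)

# Centre-padding the signal (as many extractors do) changes the frame count and alignment.
report = compare_tracks(energy, frame_log_energy(np.pad(wav, 200)))
print(report["len_a"], report["len_b"])  # 98 101
```

A length mismatch like 98 vs 101 frames usually points to a padding or centring difference rather than a bug in the feature itself.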

Inference Speed

Hi @cantabile-kwok,
I have also implemented UniCATS's vec2wav, but that model is too slow, so I am curious about the inference speed of this model. I am interested in integrating CTX-vec2wav with a GPT-based autoregressive (AR) txt2vec to create a fast prompt-based TTS model.

Also, do you plan to release the CTX-txt2vec model anytime soon?
Thanks
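Inference speed for vocoders is usually reported as the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio, so RTF < 1 means faster than real time. A minimal, framework-agnostic way to measure it is sketched below. `synthesize` is a hypothetical stand-in for whatever vec2wav inference call you use. For GPU models you would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the timer.

```python
import time
from typing import Callable

import numpy as np


def real_time_factor(
    synthesize: Callable[[], np.ndarray],
    sample_rate: int,
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Average synthesis time divided by generated audio duration.

    `synthesize` is a hypothetical zero-argument callable returning a 1-D waveform.
    """
    for _ in range(warmup):  # exclude one-off costs such as lazy initialization
        synthesize()
    elapsed, samples = 0.0, 0
    for _ in range(runs):
        start = time.perf_counter()
        wav = synthesize()
        elapsed += time.perf_counter() - start
        samples += len(wav)
    if samples == 0:
        raise ValueError("synthesize() returned no audio")
    return elapsed / (samples / sample_rate)


# Usage with a dummy "vocoder" that returns 2 s of silence at 16 kHz.
rtf = real_time_factor(lambda: np.zeros(32000, dtype=np.float32), sample_rate=16000)
print(f"RTF = {rtf:.6f}")
```

Averaging over several runs after a warm-up keeps the number from being dominated by first-call overhead.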

About vq_codebook

Hello, does vq_codebook also come from vq-wav2vec-kmeans? I noticed that the code loads an array of shape (2, 320, 256), but the quantizer embedding in vq-wav2vec-kmeans has shape (320, 1, 256). Are the two codebooks the same?
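Some background that may help: vq-wav2vec splits each frame's representation into 2 groups and quantizes each group against 320 entries of dimension 256. In fairseq's KmeansVectorQuantizer, enabling `combine_groups` makes the groups share a single codebook of shape (num_vars, 1, var_dim), which matches the observed (320, 1, 256).

If the released checkpoint uses that configuration (an assumption here), a (2, 320, 256) array could simply be the shared codebook repeated per group, and its two slices would then be identical. The NumPy sketch below shows that relationship and how a pair of token indices maps back to a 512-dim vector. Random values stand in for the real checkpoint. This is not confirmed to be how this repository built its vq_codebook.

```python
import numpy as np

NUM_GROUPS, NUM_VARS, VAR_DIM = 2, 320, 256

# Stand-in for the shared k-means codebook: shape (num_vars, 1, var_dim).
rng = np.random.default_rng(0)
shared = rng.normal(size=(NUM_VARS, 1, VAR_DIM)).astype(np.float32)

# Expand to one codebook per group: (1, num_vars, var_dim) -> (num_groups, num_vars, var_dim).
per_group = np.repeat(shared.transpose(1, 0, 2), NUM_GROUPS, axis=0)
print(per_group.shape)  # (2, 320, 256)


def lookup(indices: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Map (T, num_groups) token indices to (T, num_groups * var_dim) vectors."""
    vecs = [codebooks[g][indices[:, g]] for g in range(codebooks.shape[0])]
    return np.concatenate(vecs, axis=-1)


tokens = np.array([[3, 17], [3, 3], [250, 9]])  # hypothetical (T=3, 2) index pairs
emb = lookup(tokens, per_group)
print(emb.shape)  # (3, 512)
```

With a shared codebook, a frame whose two indices are equal (the second row above) yields two identical 256-dim halves. Comparing `codebook[0]` and `codebook[1]` in the repo's file would answer the question directly.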

Using vec2wav for speech-to-speech voice conversion

Hi @cantabile-kwok,

I am curious whether you have tried this model for zero-shot voice conversion.
The idea is very simple:

Source voice speech -> semantic tokens -> vec2wav (with target voice prompt) -> Target voice speech

Semantic tokens can easily be computed from a pretrained model such as HuBERT or vq-wav2vec.
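The "semantic tokens" step of this pipeline is typically done in two parts. First, frame-level features are extracted from a pretrained self-supervised model. Then each frame is assigned to its nearest k-means centroid. The sketch below shows only that nearest-centroid assignment in NumPy, with random arrays standing in for the SSL features and a trained centroid table (both hypothetical here). Which token type the released CTX-vec2wav model expects is a separate question; tokens from a different model or codebook would presumably need a vocoder trained on them.

```python
import numpy as np


def assign_tokens(feats: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Nearest-centroid (k-means) assignment.

    feats: (T, D) frame features; centroids: (K, D). Returns (T,) integer token ids.
    """
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2, shape (T, K).
    d2 = (
        np.sum(feats ** 2, axis=1, keepdims=True)
        - 2.0 * feats @ centroids.T
        + np.sum(centroids ** 2, axis=1)
    )
    return np.argmin(d2, axis=1)


# Usage: hypothetical K=100 centroids for 768-dim features.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(100, 768))
# Fake "features": slightly noisy copies of centroids 5, 5, 42 and 7.
feats = centroids[[5, 5, 42, 7]] + 0.01 * rng.normal(size=(4, 768))
print(assign_tokens(feats, centroids).tolist())  # [5, 5, 42, 7]
```

The expanded distance formula avoids materializing a (T, K, D) array, which matters for long utterances and large codebooks.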
