natspeech,github

NATSpeech: A Non-Autoregressive Text-to-Speech Framework

This repo contains official PyTorch implementation of:

PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021)
Demo page | HuggingFace🤗 Demo
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (DiffSpeech) (AAAI 2022)
Demo page | Project page | HuggingFace🤗 Demo

Key Features

We implement the following features in this framework:

Data processing for non-autoregressive Text-to-Speech using Montreal Forced Aligner.
Convenient and scalable framework for training and inference.
Simple but efficient random-access dataset implementation.

Install Dependencies

## We tested on Linux/Ubuntu 18.04. 
## Install Python 3.6+ first (Anaconda recommended).

export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool

Documents

Citation

If you find this useful for your research, please cite the following papers:

PortaSpeech

@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

DiffSpeech

@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
 }

Acknowledgments

Our codes are influenced by the following repos:

License and Agreement

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

natspeech Goto Github PK

NATSpeech: A Non-Autoregressive Text-to-Speech Framework

Key Features

Install Dependencies

Documents

Citation

Acknowledgments

License and Agreement

natspeech's Projects

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent