Giter Site home page Giter Site logo

adaspeech-adaptive-text-to-speech-for-custom-voice's Introduction

AdaSpeech 1 - PyTorch Implementation

This is a unofficial PyTorch implementation of Microsoft's text-to-speech system AdaSpeech: Adaptive Text to Speech for Custom Voice.

This project is based on ming024's implementation of FastSpeech. Feel free to use/modify the code.

Quickstart

Dependencies

  • Linux
  • Python 3.7+
  • PyTorch 1.10.1 or higher and CUDA

a. Create a conda virtual environment and activate it.

conda create -n adaspeech1 python=3.7
conda activate adaspeech1

b. Install PyTorch and torchvision following the official instructions

c. Clone this repository.

git clone https://github.com/yw0nam/Adaspeech/
cd Adaspeech

d. Install requirments.

pip install -r requirements.txt

Training

Datasets

The supported datasets are

  • LJSpeech: a single-speaker English dataset consists of 13,100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.
  • KSS: a single-speaker korean dataset 12,853 audio clips. it consists of audio files recorded by a professional female voice actress and their aligned text extracted from b
  • Kokoro: a single-speaker japanese dataset. It contains 43,253 short audio clips of a single speaker reading 14 novel books. The audio clips were split and transcripts were aligned automatically by Kokoro-Align.

You can train other datasets including multi-speaker datasets. But, you need to make textgrid yourself.

We take LJSpeech as an example hereafter.

Preprocessing

First, run

python3 prepare_align.py config/LJSpeech/preprocess.yaml

For make .lab and .wav file in raw_data folder

As described in the paper, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Alignments of the supported datasets are provided here. (Note, this is same file of ming024)

You can find other dataset's alignments here

You have to unzip the files in preprocessed_data/LJSpeech/TextGrid/

After that, run the preprocessing script by

python3 preprocess.py config/LJSpeech/preprocess.yaml

Alternately, you can align the corpus by yourself. Download the official MFA package and run

./montreal-forced-aligner/bin/mfa_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt english preprocessed_data/LJSpeech

or

./montreal-forced-aligner/bin/mfa_train_and_align raw_data/LJSpeech/ lexicon/librispeech-lexicon.txt preprocessed_data/LJSpeech

to align the corpus and then run the preprocessing script.

python3 preprocess.py config/LJSpeech/preprocess.yaml

Training

Training the model using below code.

CUDA_VISIBLE_DEVICES=0,1 python3 train_pl.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml

Infernce

Here is the pretrained model. you can inference by

python3 inference.py

I will release inference code using the reference audio and given text soon.

TensorBoard

Use

tensorboard --logdir output/log/LJSpeech/version_0/

to serve TensorBoard on your localhost. The loss curves, synthesized mel-spectrograms and audios are shown.

Note

The conditional layer normalization is implemented in this repository. But, adapting pre-trained models on new datasets haven't implemented.

References

adaspeech-adaptive-text-to-speech-for-custom-voice's People

Contributors

yw0nam avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.