Persian FastPitch

This repository trains FastPitch as a Persian text-to-speech (TTS) model. FastPitch is a TTS model that generates mel-spectrograms from text. It is newer than Tacotron and faster at inference because it predicts the whole mel-spectrogram in parallel instead of frame by frame. This implementation takes NVIDIA's FastPitch and adapts it to the Persian language: we cloned NVIDIA's FastPitch, installed its requirements, and then made the following changes:

Prepare Persian data: a set of audio files and a phoneme sequence for each file. We use phonemes instead of raw text for two reasons: they are written in English (Latin) characters, and they solve the problem that some vowels are not written in Persian script.
Edit fastpitch/data_function.py to fix an error in Google Colab. You can see this issue.
Edit common/text/cleaners.py according to the characters used in the phonemes.
Edit scripts/train.sh and train.py to change the training parameters.
Edit scripts/inference_example.sh to change the inference parameters.
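The cleaners.py change can be illustrated with a minimal sketch. This is not the author's code: the function name `persian_phoneme_cleaners` and the `ALLOWED_SYMBOLS` set are hypothetical, and the set assumes the phoneme transcriptions use lowercase Latin letters plus a few punctuation marks. It lowercases the text, normalizes whitespace, and drops any character outside the assumed inventory.

```python
import re

# Hypothetical symbol inventory (an assumption, not the repository's set):
# adjust it to the characters your phoneme transcriptions actually use.
ALLOWED_SYMBOLS = set("abcdefghijklmnopqrstuvwxyz") | set(" ,.?!'")

_whitespace_re = re.compile(r"\s+")


def persian_phoneme_cleaners(text: str) -> str:
    """Normalize a phoneme transcription before it is mapped to symbol IDs."""
    # Lowercase and turn tabs/newlines/runs of spaces into single spaces first,
    # so word boundaries survive the filtering step below.
    text = _whitespace_re.sub(" ", text.lower())
    # Drop any character outside the assumed inventory.
    text = "".join(ch for ch in text if ch in ALLOWED_SYMBOLS)
    # Filtering can leave double spaces behind; collapse them and trim the ends.
    return _whitespace_re.sub(" ", text).strip()


if __name__ == "__main__":
    print(persian_phoneme_cleaners("  SalAm,\tdonyA!  "))
```

Whatever inventory you choose should mirror the symbol list the model maps to IDs; otherwise characters that the model expects would be silently removed.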

How to use

To use this implementation:

Clone this repository
Install the requirements listed in requirements.txt
Add your data: put the audio files in wavs/, the training and validation phoneme transcriptions in filelists/, and the test phoneme transcriptions in phrases/, following the existing layout of those folders
Run the following command to extract pitch from your audio files; the pitch files are saved to wavs/pitch/:

python prepare_dataset.py \
     --wav-text-filelists filelists/audio_text_train.txt \
                          filelists/audio_text_val.txt \
     --n-workers 16 \
     --batch-size 1 \
     --dataset-path 'wavs/' \
     --extract-pitch \
     --f0-method pyin
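Before running the pitch extraction, it can help to validate the filelists it reads. The helper below is hypothetical and not part of the repository; it assumes each filelist line has the form `audio_path|phonemes`, with the audio path relative to the --dataset-path directory (wavs/ above). It reports malformed lines, missing audio files, and empty transcriptions.

```python
import os
from typing import List


def check_filelist(filelist: str, dataset_path: str) -> List[str]:
    """Return human-readable problems found in an `audio|phonemes` filelist.

    Assumes exactly two '|'-separated fields per line and audio paths
    relative to `dataset_path` (both are assumptions about the data layout).
    """
    problems = []
    with open(filelist, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            parts = line.split("|")
            if len(parts) != 2:
                problems.append(f"{filelist}:{lineno}: expected 'audio|phonemes'")
                continue
            audio, phonemes = parts
            path = os.path.join(dataset_path, audio)
            if not os.path.isfile(path):
                problems.append(f"{filelist}:{lineno}: audio file not found: {path}")
            if not phonemes.strip():
                problems.append(f"{filelist}:{lineno}: empty phoneme sequence")
    return problems


if __name__ == "__main__":
    for name in ("filelists/audio_text_train.txt", "filelists/audio_text_val.txt"):
        for problem in check_filelist(name, "wavs/"):
            print(problem)
```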

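Run the following commands to install the remaining dependencies (NVIDIA Apex and CMUdict). Return to the repository root after installing Apex so that the CMUdict script can be found:

git clone https://github.com/NVIDIA/apex
cd apex; pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./; cd ..
bash scripts/download_cmudict.sh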

Train the model on your data with the following command. The checkpoint files are saved in output/:

bash scripts/train.sh
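The inference step needs the path of a trained checkpoint. A hypothetical helper for picking the most recent one from output/ is sketched below. It assumes checkpoints follow NVIDIA FastPitch's `FastPitch_checkpoint_<epoch>.pt` naming, so check the actual file names in your output/ directory before relying on it.

```python
import os
import re
from typing import Optional

# Assumed naming scheme; verify it against the files train.sh actually writes.
_CKPT_RE = re.compile(r"^FastPitch_checkpoint_(\d+)\.pt$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the highest-epoch checkpoint, or None if there is none."""
    best_epoch, best_name = -1, None
    for name in os.listdir(output_dir):
        match = _CKPT_RE.match(name)
        # Compare epochs numerically: as strings, "100" would sort before "20".
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return os.path.join(output_dir, best_name) if best_name else None


if __name__ == "__main__":
    print(latest_checkpoint("output"))
```

The numeric comparison is the reason for parsing the epoch instead of simply sorting file names.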

Download the WaveGlow vocoder, which converts the generated mel-spectrograms into audio:

bash scripts/download_waveglow.sh

Run the following command to synthesize the test file you put in phrases/ in step 3. The synthesized audio is saved in output/audio_test_file/:

bash scripts/inference_example.sh
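If your test sentences are already in an `audio|phonemes` filelist, their phoneme column can be copied into phrases/ with a small helper. This is a hypothetical sketch: the function name `write_phrases` is illustrative, and it assumes the test file holds one phoneme sequence per line. Compare it with the existing file in phrases/ before using it.

```python
from typing import List


def write_phrases(filelist: str, phrases_path: str) -> List[str]:
    """Copy the phoneme column of an `audio|phonemes` filelist into a phrases file.

    Assumes one utterance per output line (an assumption about the format
    the inference script reads).
    """
    phrases = []
    with open(filelist, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # The phoneme sequence is the last '|'-separated field.
                phrases.append(line.split("|")[-1])
    with open(phrases_path, "w", encoding="utf-8") as f:
        f.writelines(p + "\n" for p in phrases)
    return phrases
```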

adibian / persian_fastpitch