adibian / persian_fastpitch

Training FastPitch for the Persian language as a Persian text-to-speech system

Dockerfile 0.18% Roff 11.72% Python 78.84% Jupyter Notebook 5.27% Shell 4.00%
speech-synthesis fastpitch

persian_fastpitch's Introduction

Persian FastPitch

This project trains FastPitch for the Persian language as a Persian text-to-speech system. FastPitch is a TTS model that generates mel-spectrograms from text; it is newer and faster than Tacotron. This implementation starts from NVIDIA's FastPitch and adapts it to train on Persian. We clone NVIDIA's FastPitch, install its requirements, and then make the following changes:

  1. Prepare the Persian data: a set of audio files and a phoneme sequence for each file. We use phonemes instead of raw text because they can be written with English characters and because written Persian omits some vowels.
  2. Edit fastpitch/data_function.py because of an error in Google Colab (see the related issue).
  3. Edit cleaners.py in common/text/ to match the characters used in the phonemes.
  4. Edit scripts/train.sh and train.py to change the training parameters.
  5. Edit scripts/inference_example.sh to change the inference parameters.
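
As a rough illustration of step 3, a cleaner for phoneme input may only need to normalize whitespace and drop characters that are not in the phoneme inventory. The sketch below is an assumption, not the repository's actual code: the function name `persian_phoneme_cleaners` and the symbol set `PHONEME_SYMBOLS` are hypothetical and should be replaced with the characters your own phoneme transcriptions use.

```python
import re

# Hypothetical phoneme inventory (an assumption for illustration only);
# replace it with the characters used in your own phoneme transcriptions.
PHONEME_SYMBOLS = set("abcdefghijklmnopqrstuvwxyzACSZ?")

_whitespace_re = re.compile(r"\s+")


def persian_phoneme_cleaners(text):
    """Hypothetical cleaner: keep known phoneme symbols and single spaces."""
    # Collapse runs of whitespace into one space and trim the ends.
    text = _whitespace_re.sub(" ", text).strip()
    # Drop any character that is neither a space nor a known phoneme symbol.
    kept = "".join(ch for ch in text if ch == " " or ch in PHONEME_SYMBOLS)
    # Removing characters can leave double spaces, so collapse once more.
    return _whitespace_re.sub(" ", kept).strip()


if __name__ == "__main__":
    print(persian_phoneme_cleaners("  sAlAm ,  donyA!  "))
```

Keeping the inventory in one set makes it easier to keep the cleaner and the model's symbol table in agreement when the phoneme set changes.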

How to use

To use this implementation:

  1. Clone this repository
  2. Install the requirements listed in requirements.txt
  3. Add your data: put the audio files in wavs/, the training and validation phoneme transcriptions in filelists/, and the test phoneme transcriptions in phrases/, following the layout already in the repository
  4. Run the following command to extract pitch from your audio files and save the results to wavs/pitch/:
python prepare_dataset.py \
     --wav-text-filelists filelists/audio_text_train.txt \
                          filelists/audio_text_val.txt \
     --n-workers 16 \
     --batch-size 1 \
     --dataset-path 'wavs/' \
     --extract-pitch \
     --f0-method pyin
  5. Run the following commands to install some dependencies:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
# return to the repository root so that the scripts/ path resolves
cd ..
bash scripts/download_cmudict.sh
  6. Train the model on your data with the following command. The checkpoint files are written to output/:
bash scripts/train.sh
  7. Download WaveGlow, the vocoder that converts mel-spectrograms to audio:
bash scripts/download_waveglow.sh
  8. Run the following command to synthesize the test file that you put in phrases/ in step 3. The synthesized audio is written to output/audio_test_file/:
bash scripts/inference_example.sh
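
Before running the pitch extraction in step 4, it can help to confirm that every filelist entry points at an existing audio file. The sketch below assumes each filelist line has the form `<audio path>|<phoneme sequence>`, with the audio path relative to the dataset path. That format and the helper name `find_missing_audio` are assumptions, so adjust them to match your own filelists.

```python
from pathlib import Path


def find_missing_audio(filelist_path, dataset_path):
    """Return audio paths listed in a filelist that do not exist on disk.

    Assumes one `<audio path>|<phoneme sequence>` entry per line, with the
    audio path relative to dataset_path (an assumption, not a confirmed format).
    """
    missing = []
    with open(filelist_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Everything before the first "|" is treated as the audio path.
            audio_rel = line.split("|", 1)[0]
            if not (Path(dataset_path) / audio_rel).is_file():
                missing.append(audio_rel)
    return missing


if __name__ == "__main__":
    for name in ("filelists/audio_text_train.txt", "filelists/audio_text_val.txt"):
        if Path(name).is_file():
            print(name, find_missing_audio(name, "wavs/"))
```

An empty list for both filelists means that every listed audio file was found under the dataset path.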

persian_fastpitch's Issues

ReadMe

Hi,
Do you have a README that describes this project: what it is, and how it is different?

dataset

Hi!
I have a question about the dataset. Suppose I have several wav files and their corresponding text files (written in Persian). How can I create phoneme_transcriptions for them?
