tacotron-2-persian's Introduction

Tacotron 2 - Persian

Visit this demo page to listen to some audio samples

This repository contains an implementation of a Persian Tacotron model in PyTorch, along with a dataset preprocessor for the Common Voice dataset. To generate better-quality audio, the acoustic features (mel-spectrogram) are fed to a WaveRNN model. I've included the WaveRNN model in the code for inference purposes only (no trainer is included).

The source code in this repository is heavily inspired by, and partially copied (with modifications) from, the following repositories:

Tacotron model: https://github.com/espnet/espnet
WaveRNN and some utils: https://github.com/mozilla/TTS

Model specs:

Encoder: CNN layers with batch norm and a bi-directional LSTM on top.
Decoder: 2 LSTMs for the recurrent part and a post-net on top.
Attention type: GMM v2 with k=25.
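For intuition, the alignment computation of GMM attention can be sketched as follows. This is a minimal NumPy illustration of a V2-style parameterization (softmax mixture weights, softplus step sizes and widths), not the repository's actual implementation; the function name and shapes are illustrative.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def gmm_v2_attention_step(raw_params, mu_prev, num_encoder_steps):
    """One decoder step of GMM attention.

    raw_params: (K, 3) unconstrained outputs of the attention MLP,
                columns = (weight logit, step size, width) per component.
    mu_prev:    (K,) component means from the previous decoder step.
    """
    w = np.exp(raw_params[:, 0] - raw_params[:, 0].max())
    w = w / w.sum()                         # softmax mixture weights
    delta = softplus(raw_params[:, 1])      # non-negative mean shift
    sigma = softplus(raw_params[:, 2])      # positive std-dev
    mu = mu_prev + delta                    # means only move forward (monotonic)
    j = np.arange(num_encoder_steps)        # encoder positions
    # normalized Gaussians evaluated at every encoder position
    comps = (w[:, None] / (sigma[:, None] * np.sqrt(2.0 * np.pi))
             * np.exp(-0.5 * ((j[None, :] - mu[:, None]) / sigma[:, None]) ** 2))
    alpha = comps.sum(axis=0)               # (num_encoder_steps,) alignment
    return alpha, mu
```

Because the means can only increase, the attention is forced to move monotonically through the encoder states, which is what makes this attention type robust for long utterances.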

Datasets

The model is trained on audio files from one of the speakers in Common Voice Persian, which can be downloaded from: https://commonvoice.mozilla.org/en/datasets

Unfortunately, only a few speakers in the dataset have enough utterances for training a Tacotron model, and most of the audio files are low quality and noisy. I found the audio files from one of the speakers more appropriate for training; that speaker's id is hard-coded in the commonvoice_fa preprocessor.

Data preprocessing

After downloading the dataset, first set the DATASET_PATH and OUTPUT_PATH variables in the file scripts/preprocess_commonvoice_fa/preprocess_commonvoice_fa.sh and then run:

sh scripts/preprocess_commonvoice_fa/preprocess_commonvoice_fa.sh

This will extract the features required for training the model and create a meta file that contains the transcripts and the phonemization of each transcript on individual lines, along with other meta info.
Finally, you will need to create two files named metadata_train.txt and metadata_eval.txt out of metadata.txt. First get the number of lines in the transcript file with wc -l metadata.txt; then, if for example there are 10000 lines in the metadata file, you can split it as below:

shuf metadata.txt > metadata_shuf.txt
head -n 9000 metadata_shuf.txt > metadata_train.txt
tail -n 1000 metadata_shuf.txt > metadata_eval.txt
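Equivalently, the shuffle-and-split step can be done in Python; this is an illustrative sketch (not a script shipped with the repository), after which you write the two returned lists to metadata_train.txt and metadata_eval.txt:

```python
import random

def split_metadata(lines, train_frac=0.9, seed=42):
    """Shuffle metadata lines and split them into train/eval portions."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)       # deterministic shuffle
    n_train = int(len(shuffled) * train_frac)   # e.g. 9000 of 10000 lines
    return shuffled[:n_train], shuffled[n_train:]
```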

Experiments

Each experiment consists of three files: train.sh, generate.sh, and config.yml.

Training

All training parameters and model hyperparameters are saved in a YAML config file under the folder created for each experiment. To train the model for Common Voice Persian you can first change the parameters in scripts/tacotron2persian_commonvoice_fa/config.yml and then simply run:

sh scripts/tacotron2persian_commonvoice_fa/train.sh
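For reference, the dataset-related entries in config.yml that you will typically need to edit look like the excerpt below (the values are placeholders; see the shipped config for the full set of keys):

```yaml
datasets:
  commonvoice_fa:
    dataset_path: "PATH TO PRE-PROCESSED DATASET"  # output of the preprocessing step
    max_mel_len: 1000
    train_metafile: "metadata_train.txt"
    eval_metafile: "metadata_eval.txt"
    speakers_list: ["speaker_fa_0"]
```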

Inference

Once the model training is finished, you can generate audio after setting the variables in generate.sh as below:

sh scripts/tacotron2persian_commonvoice_fa/generate.sh
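The variables at the top of generate.sh include the checkpoint id, speaker, language, and input text; the values below are illustrative placeholders, not defaults:

```shell
# Illustrative values only; adjust to your own experiment
CHECKPOINT_ID="100"   # id of the saved Tacotron checkpoint to load
SPEAKER=""            # speaker id (leave empty for a single-speaker model)
LANGUAGE="fa"         # language code
INP_TEXT="سلام"       # text to synthesize
```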

Demo

To only test the model, I've provided a demo script folder that also contains the checkpoints for Tacotron and WaveRNN. Download the demo experiment zip from:
https://drive.google.com/file/d/1wzlBGgROS76W4xOY1UpCZaW5zropyLVL/view?usp=sharing
and unzip it under the ./scripts/ folder. Similar to what is explained in the Inference section, you just need to run:

sh scripts/demo/generate.sh

Example outputs

Below you see the generated mel-spectrogram and attention matrix for the sentence:

صاحب قصر زنی بود با لباس‌هایِ بسیار مجلل و خدم و حشم فراوان که به گرمی از مسافرین پذیرایی کرد.
("The owner of the palace was a woman in very lavish clothes, with many servants and retainers, who warmly welcomed the travelers.")

Obviously, the quality of the generated mel-spectrogram/audio is lower than that of a standard model trained on English or Japanese, which is mainly caused by the quality of the audio in the dataset.

WaveRNN

The WaveRNN checkpoint used for generating the sample audios was trained on LibriTTS at a 22 kHz sampling rate. It can reconstruct waveforms for any female/male voice with acceptable quality. Give it a try on your own voice ;)

tacotron-2-persian's People

Contributors

hamedhemati, shamsnaamir


tacotron-2-persian's Issues

[Request] Make the training process available in the form of a notebook

Greetings Dear Hamed.
I was looking for reliable and stable Persian TTS software, and I found your repository here. Now, I open this issue to make a few requests.

  1. Make the training/inference process available in the form of a Colab notebook
  2. A quick guide on the dataset
  3. A quick guide on how to build an inference program for your TTS

Regards.

About using Tacotron 2 for the Persian language

I found this project via Medium and GitHub, which uses Tacotron 2 for the Persian language, but its code is incomplete and doesn't work, so I tried to combine it with this project and made this Colab page.

As I am a newbie with Tacotron 2, I am asking here to better understand what I can do to make the code work for the Persian language too.

Thanks for your consideration.

About debugging the text-to-speech Tacotron 2 for the Persian language (missing validated.tsv)

I am trying to debug a text-to-speech Tacotron 2 project on GitHub for the Persian language, and I edited its Google Colaboratory page here; in the part testing its demo:


To only test the model, I've provided a demo script folder that also
contains the checkpoints for Tacotron and WaveRNN. Download the demo
experiment zip from:
https://drive.google.com/file/d/1wzlBGgROS76W4xOY1UpCZaW5zropyLVL/view?usp=sharing
and unzip it under ./scripts/ folder. Similar to what explained in the
Inference section, you just need to run:

sh scripts/demo/generate.sh

Example outputs

Below you see the generated mel-spectrogram and attention matrix for the sentence:

Obviously, the quality of generated mel-spectrogram/audio is lower
than those generated from a standard model trained on English or
Japanese languages, which is mainly caused by the quality of the
audios in the dataset.

So I got the error below when running the !sh /content/Tacotron-2-Persian/scripts/preprocess_commonvoice_fa/preprocess_commonvoice_fa.sh part of the Google Colab page:

/content/Tacotron-2-Persian/scripts/preprocess_commonvoice_fa
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/Tacotron-2-Persian/tac2persian/data_preprocessing/preprocess_commonvoice_fa.py", line 111, in <module>
    preprocess(args.dataset_path, args.output_path, target_speakers, config, args.num_workers)
  File "/content/Tacotron-2-Persian/tac2persian/data_preprocessing/preprocess_commonvoice_fa.py", line 49, in preprocess
    with open(os.path.join(dataset_path, "validated.tsv"), "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/Tacotron-2-Persian/scripts/demo/validated.tsv'

Also, the content of the preprocess_commonvoice_fa.sh file is shown below:

#! /bin/bash

# Get full path to the config file automatically
full_path=$0
CONFIG_PATH=$(dirname "$full_path")
echo $CONFIG_PATH


DATASET_PATH="PATH TO DATASET"
OUTPUT_PATH="PATH TO THE OUTPUT DIRECTORY"
NUM_WORKERS=5

python -m tac2persian.data_preprocessing.preprocess_commonvoice_fa --dataset_path="$DATASET_PATH" \
                                                                   --output_path="$OUTPUT_PATH" \

I also asked this on Stack Overflow:

https://stackoverflow.com/questions/69934192/about-debugging-the-text-to-speech-tacotron2-for-persian-language-missed-valida

Thanks.

validate.tsv

I was making the metadata file (the first step), but it asked me about the validate.tsv file.
Where can I find it?
@HamedHemati

Problem in generate.sh

Hello @HamedHemati
When I tried to run the generate.sh file, I faced this error:

/usr/bin/python3: Error while finding module specification for 'TTS.acoustic_model.tacotron2.generate' (ModuleNotFoundError: No module named 'TTS.acoustic_model')

After that, I changed generate.sh as below, to call generate.py:

#! /bin/bash

# get path to config file
full_path=$0
CONFIG_PATH=$(dirname "$full_path")
echo $CONFIG_PATH


CHECKPOINT_ID="100"

SPEAKER=""
LANGUAGE="fa"

INP_TEXT="دلا نزدِ کسی بنشین که او از دل خبر دارد."

#tac2persian.train
#python -m TTS.acoustic_model.
python -m tac2persian.generate --tacotron_config_path="/content/drive/MyDrive/TTS/Tacotron-2-Persian/scripts/tacotron2persian_commonvoice_fa/config.yml" \
                               --wavernn_config_path="/content/drive/MyDrive/TTS/Tacotron-2-Persian/scripts/preprocess_commonvoice_fa/config.yml" \
                               --inp_text="سلام" --tacotron_checkpoint_path="/content/drive/MyDrive/TTS/Tacotron-2-Persian/outputs/commonvoice_fa/checkpoints/checkpoint_0K.pt"
                                                #--checkpoint_id="$CHECKPOINT_ID" \
                                                #--speaker="$SPEAKER" \
                                               # --language="$LANGUAGE" \

But there were some flags, such as tacotron_checkpoint_path and wavernn_checkpoint_path, that I didn't know how to fill in.
After some searching, I changed the file as above. But after running the project there were some missing keys and values in the config.yml files. I've struggled a lot, but I couldn't solve the problem.

I would be very happy if you could help me fix this problem.
Thank you,
regards

Error in processing.

Thank you for your guide for the .tar file.
But I've got another error (the screenshots were attached here).
What should I do?

How to train, set the dataset path, and set ./wav

Please provide a complete explanation of how to train with:
!sh /content/Tacotron-2-Persian/scripts/tacotron2persian_commonvoice_fa/train.sh

And how to set the dataset path in /content/Tacotron-2-Persian/scripts/tacotron2persian_commonvoice_fa/config.yml:

datasets:
  commonvoice_fa:
    dataset_path: "PATH TO PRE-PROCESSED DATASET"
    max_mel_len: 1000
    train_metafile: "metadata_train.txt"
    eval_metafile: "metadata_train.txt"
    speakers_list: ["speaker_fa_0"]

Please help me step by step with training.

Making Dataset

Hi, could you please share the dataset you have created?

Problem with the segment module in the source code

Hi, I can't download Mozilla Common Voice in Persian (all the links are expired), and the segment module is not found by the source code. Can you send me a complete source, please? It is very important for me. My Telegram id is @mohammad_ebrz.

Limiting line in preprocessing

Hello. Thanks for your great work.
I was following your instructions on how to train the model. In data preprocessing, after running:

sh scripts/preprocess/preprocess_commonvoice_fa.sh

I got only 20 lines in metadata.txt. After reading the code, I found that in the file tac2persian/data_preprocessing/preprocess_commonvoice_fa.py:

for itr, line in enumerate(lines[:20]):

you have limited the number of lines, which causes the problem. What is this limit for?
Thank you again.
