utkarsh2299 / fastspeech2_hs

This repository was created as part of the project "Speech Technologies in Indian languages". About Indic TTS for Indian Languages: this project develops text-to-speech (TTS) synthesis systems for Indian languages, improves the quality of synthesis, and builds small-footprint TTS integrated with disability aids and various other applications.

Home Page: https://www.iitm.ac.in/donlab/tts/index.php

License: Creative Commons Attribution 4.0 International

Topics: espnet, fastspeech2, hs, hybrid-segment, indic-languages, tts

fastspeech2_hs's Introduction

Fastspeech2 Model using Hybrid Segmentation (HS)

If you run into any issues, refer to the upstream repository: https://github.com/smtiitm/Fastspeech2_HS

This repository contains Fastspeech2 models for 16 Indian languages (both male and female voices), built using Hybrid Segmentation (HS) for speech synthesis. The models generate mel-spectrograms from text inputs and can be used to synthesize speech.

The repository is large, so we use Git LFS to work around GitHub's file size constraints. Please install the latest Git LFS; the installation commands we currently use are given below.

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install

The language model files are stored with Git LFS, so please run:

git lfs fetch --all
git lfs pull

to get the original files in your directory.
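If the LFS download was skipped or interrupted, large files remain as small text stubs. The following is a minimal sketch for spotting them, relying on the fact that a Git LFS pointer file begins with the line "version https://git-lfs.github.com/spec/v1". The helper names and the "*.pth" search pattern are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` still looks like a Git LFS pointer stub."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched(root, pattern="*.pth"):
    """List files under `root` matching `pattern` that are still pointer stubs."""
    return sorted(
        p for p in Path(root).rglob(pattern) if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Hypothetical usage: scan the current checkout for weights not yet pulled.
    for stub in find_unfetched("."):
        print(f"not downloaded yet: {stub}")
```

Any file reported here should be replaced by the real content after running git lfs pull again.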

Model Files

The model for each language includes the following files:

  • config.yaml: Configuration file for the Fastspeech2 Model.
  • energy_stats.npz: Energy statistics for normalization during synthesis.
  • feats_stats.npz: Features statistics for normalization during synthesis.
  • feats_type: Features type information.
  • pitch_stats.npz: Pitch statistics for normalization during synthesis.
  • model.pth: Pre-trained Fastspeech2 model weights.
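As a quick sanity check before running inference, the sketch below verifies that a model directory contains every file named in the list above. The function name and the example directory path are hypothetical; the actual directory layout of this repository may differ.

```python
from pathlib import Path

# File names taken from the model file list above.
REQUIRED_FILES = (
    "config.yaml",
    "energy_stats.npz",
    "feats_stats.npz",
    "feats_type",
    "pitch_stats.npz",
    "model.pth",
)


def missing_model_files(model_dir):
    """Return the required file names that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # "path/to/model_dir" is a placeholder; point it at one language/gender model.
    missing = missing_model_files("path/to/model_dir")
    if missing:
        print("missing files:", ", ".join(missing))
    else:
        print("all model files present")
```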

Installation

  1. Install Miniconda first, then create a conda environment from the provided environment.yml file:
conda env create -f environment.yml

  2. Activate the conda environment (the environment name is defined inside the environment.yml file):
conda activate tts-hs-hifigan

  3. Install PyTorch separately (choose the specific version based on your requirements):
conda install pytorch cudatoolkit
pip install torchaudio
pip install numpy==1.23.0
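After installation, a small check like the one below can confirm that the packages installed in the steps above are importable from the active environment. This is a sketch using only the standard library; the function name is illustrative and the module list covers only the packages named in these steps.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ("torch", "torchaudio", "numpy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("all required packages found")
```

Run it with the tts-hs-hifigan environment activated; an empty result means all three packages were found.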

Vocoder

To generate WAV files from mel-spectrograms, you can use a vocoder of your choice. One popular option is the HiFi-GAN vocoder (clone its repository and place it in the current working directory). Please refer to the documentation of the vocoder you choose for installation and usage instructions.

(We used the HiFi-GAN vocoder and provide vocoders tuned on Aryan and Dravidian languages.)

Usage

The directory paths are relative. If needed, edit text_preprocess_for_inference.py and inference.py and update the folder and file paths wherever required.

Give the language and gender in lowercase, and put the sample text in quotes. The output argument is optional; if provided, its value is used as the name of the output file.

Use the inference file to synthesize speech from text inputs:

python inference.py --sample_text "Your input text here" --language <language> --gender <gender> --output_file <file_name.wav OR path/to/file_name.wav>

Example:

python inference.py --sample_text "श्रीलंका और पाकिस्तान में खेला जा रहा एशिया कप अब तक का सबसे विवादित टूर्नामेंट होता जा रहा है।" --language hindi --gender male --output_file male_hindi_output.wav

The file will be saved as male_hindi_output.wav in the current working directory. If the --output_file argument is not given, the output is saved as <language>_<gender>_output.wav in the current working directory.
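The argument handling described above can be sketched with argparse as follows. This is an illustrative reconstruction based only on the flags and default file name stated in this section, not the actual contents of inference.py; the function names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with the four flags documented in the Usage section."""
    parser = argparse.ArgumentParser(
        description="Sketch of the command-line interface described above."
    )
    parser.add_argument("--sample_text", required=True, help="text to synthesize")
    parser.add_argument("--language", required=True, help="language name in lowercase")
    parser.add_argument("--gender", required=True, help="speaker gender in lowercase")
    parser.add_argument("--output_file", default=None, help="optional output WAV path")
    return parser


def resolve_output_file(args):
    """Use --output_file if given, else fall back to <language>_<gender>_output.wav."""
    return args.output_file or f"{args.language}_{args.gender}_output.wav"


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print("output will be written to:", resolve_output_file(parsed))
```

With --language hindi --gender male and no --output_file, this resolves to hindi_male_output.wav, matching the default naming rule stated above.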

Citation

If you use this Fastspeech2 Model in your research or work, please consider citing:

"COPYRIGHT 2023, Speech Technology Consortium, Bhashini, MeiTY and by Hema A Murthy & S Umesh, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING and ELECTRICAL ENGINEERING, IIT MADRAS. ALL RIGHTS RESERVED"

This work is licensed under a Creative Commons Attribution 4.0 International License.
