LDM speech synthesis

Speech synthesis using Linear Dynamical Models (LDMs).

The task is broken down into two components:

  1. Preparing the dataset: Given a dataset of audio utterances and their transcriptions, we process the data so that it can be used to train our speech synthesis models. Merlin's front end is a great tool for this preprocessing.
  2. Training the LDMs using the processed data.
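For readers new to LDMs, the following is a minimal sketch of the textbook linear-Gaussian state-space form, x[t+1] = A x[t] + w and y[t] = C x[t] + v, written in plain Python rather than the Matlab used in this repository. The function names, the matrices, and the noise settings are illustrative assumptions and are not taken from the ldm_tts code.

```python
import random


def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def simulate_ldm(A, C, x0, n_steps, state_std=0.0, obs_std=0.0, seed=0):
    """Generate n_steps observations from a generic LDM.

    State update:  x[t+1] = A x[t] + w,  w ~ N(0, state_std^2)
    Observation:   y[t]   = C x[t] + v,  v ~ N(0, obs_std^2)
    All names and parameters here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    x = list(x0)
    observations = []
    for _ in range(n_steps):
        # Emit an observation from the current hidden state.
        y = [y_i + rng.gauss(0.0, obs_std) for y_i in mat_vec(C, x)]
        observations.append(y)
        # Advance the hidden state by one step.
        x = [x_i + rng.gauss(0.0, state_std) for x_i in mat_vec(A, x)]
    return observations
```

With both noise levels left at zero, the function returns the mean trajectory, for example `simulate_ldm([[0.5]], [[2.0]], [1.0], 3)`. In a synthesis setting the observations would play the role of acoustic feature frames, but how the states and features are defined in this project is described in the ldm_tts README, not here.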

Tools required

We used the Nick corpus from the Hurricane Challenge as our dataset. To use this dataset, you first need to accept the license and obtain a password. We use the 'plain' (not Lombard) news sentences (named herald_xxx) and the 'plain' Harvard sentences (hvd_xx) of this corpus. Other open-source datasets can be used here as well. If you want a dataset without any license issues, the CMU ARCTIC SLT dataset would be a good starting point, although it is relatively small. LJ Speech is a considerably larger TTS dataset. To prepare the dataset, the following tools are used:

  1. Festival with the Unilex lexicon: We use Festival as a front end to get phonetic transcriptions from text. Since the speaker in this dataset is an English speaker, we transcribe with the Unilex lexicon rather than the default CMU lexicon. This lexicon is also available only under a license.
  2. HMM Toolkit (HTK) : We use HTK to align the phonetic transcriptions with their position in the audio utterance. You might encounter a bug in the source code that needs to be fixed.
  3. Speech Signal Processing Toolkit (SPTK): We use SPTK for signal processing of the audio data.
  4. WORLD vocoder: We use the WORLD vocoder to create the output acoustic features. For analysis and synthesis of utterances, we use the binaries compiled in Merlin. Compiling the code provided there produces two binaries, analysis and synth, which perform those two tasks respectively.
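As an illustration of what the alignment step produces, here is a small Python sketch that reads HTK-style label lines of the form `start end label`, where times are given in units of 100 ns. The function name and the 5 ms frame shift are assumptions made for this example; check the frame shift and label layout actually used in your own setup.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def parse_htk_labels(text, frame_shift_s=0.005):
    """Parse 'start end label' lines into (label, start_s, end_s, n_frames).

    frame_shift_s is an assumed frame shift (5 ms); it is not taken from
    this repository.
    """
    frame_shift_units = round(frame_shift_s * HTK_UNITS_PER_SECOND)
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[0]), int(fields[1])
        label = fields[2]
        # Integer arithmetic avoids floating-point error in the frame count.
        n_frames = (end - start) // frame_shift_units
        segments.append(
            (label, start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, n_frames)
        )
    return segments
```

For example, `parse_htk_labels("0 500000 sil\n500000 1500000 a\n")` yields one tuple per aligned segment, giving the label, its start and end time in seconds, and the number of acoustic frames it spans.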

The LDMs are trained using MATLAB.

The front-end preprocessing can be done using Merlin, so we do not duplicate it here. The preprocessing steps are explained in this tutorial. The sample_voice directory contains one example file from the Nick voice, processed to extract all the required features, to give you a fair idea of what the features look like. The LDM-TTS code is in the ldm_tts directory. Please refer to the README.md file in each of these two subdirectories.

To evaluate the generated speech, we use the PESQ measure.

Author

Gagandeep Singh
