Giter Site home page Giter Site logo

disentintel's Introduction

Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Here should be an image visible.

Introduction

This is the code repository for the paper Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment, which can be found here, which was accepted for Interspeech2022.

It utalizes an existing architecture, originally desined with Voice Conversion in mind (SpeechSplit). However, for this work it is trained differently from the original implementation and only the encoder outputs are used to extract certain latent speech representations. These can then be used in a final step to determine a (pathological) speaker's intelligibility value based on a reference signal.

Please kindly cite our work if you find this repository useful:

@article{weise2022disentangled,
  title = {Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment},
  author = {Weise, Tobias and Klumpp, Philipp and Maier, Andreas and Noeth, Elmar and Heismann, Bjoern and Schuster, Maria and Yang, Seung Hee},
  journal = {arXiv preprint arXiv:2204.04016}
  url = {https://arxiv.org/abs/2204.04016},
  year = {2022},
}

Requirements

Refer to the requirements.txt file.

Data

In principle any dataset can be used for the (unsupervised in the context of Voice Conversion, i.e. without text transcriptions - however, speaker ID's are required) training of the SpeechSplit architecture (i.e., especially the encoders) of this approach. We created and used a subset of the English Common Voice corpora, in order to enable the encoders to learn general speech features. For more details please refer to the original paper. The custom dataset can be provided upon request if desired.

After chosing a desired dataset for training, some pre-processing steps (make_spect_f0.py and make_metadata.py) are required before starting the actual training process, as described in the original SpeechSplit repository.

For evaluating and predicting subjective intelligibility scores the UASpeech corpora was used.

Training

For training questions please refer to the original SpeechSplit repository - only hyperparameters (specifically padding) changed for the training process of this paper and the used data of course.

Inference

Inferencer class can be used for performing inference with the trained SpeechSplit model, in order to extract the three embeddings. If you want to use the weights that were trained as described in the paper, simply send us a request and we will provide them.

Intelligibility

For intelligibility functions/experiments please have a look at the intelligibility.py file and the functions/comments contained in it, in combination with the descriptions provided in the paper.

disentintel's People

Contributors

tobwei avatar

Stargazers

Helin Wang avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.