Awesome Speech Assessment Models

This repository aims to collect all available speech assessment models that have a public GitHub page, including speech quality and intelligibility prediction models.

Speech Quality Prediction Models

We collect both human-based and objective-based speech quality prediction models.

1. Human-based Speech Quality Prediction

Human-based speech quality prediction generally aims to predict the mean opinion score (MOS) given by several listeners. Several benchmark datasets have been designed for different speech processing applications. For example, in speech enhancement tasks, the speech assessment model aims to predict the quality of enhanced speech, i.e., speech that has been processed to reduce noise.
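As a minimal illustration of how such labels are formed, the sketch below averages per-listener ratings into one MOS per utterance. It assumes a 1 to 5 absolute category rating scale; the utterance IDs, the ratings, and the helper `mos_labels` are hypothetical and not taken from any dataset listed here.

```python
from statistics import mean


def mos_labels(ratings):
    """Average each utterance's listener ratings into a single MOS label.

    `ratings` maps a (hypothetical) utterance ID to the list of scores given
    by individual listeners, assumed to be on a 1-5 rating scale.
    """
    return {utt_id: mean(scores) for utt_id, scores in ratings.items()}


# Hypothetical ratings from four listeners per utterance.
ratings = {
    "utt_001": [4, 5, 4, 3],
    "utt_002": [2, 1, 2, 2],
}
labels = mos_labels(ratings)
print(labels)
```

Keeping the raw per-listener scores alongside the average can be useful: listener-dependent approaches such as LDNet model individual listeners' judgments rather than only the mean.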

A. Task: Voice Conversion / Text-to-Speech

  • Dataset: VCC2018

    Available Models:

    • MOSNet: Deep Learning based Objective Assessment for Voice Conversion [code] [paper]

    • MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network [code] [paper]

    • Utilizing Self-supervised Representations for MOS Prediction [code] [paper]

    • LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [code] [paper]

  • Dataset: Blizzard Challenge

    Available Model:

    • Deep Learning Based Assessment of Synthetic Speech Naturalness [code] [paper]

  • Dataset: VoiceMOS Challenge

    Available Models:

    • Generalization Ability of MOS Prediction Networks [code] [paper]

    • LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [code] [paper]

    • Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]

    • UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 [code] [paper]

B. Task: Speech Enhancement

  • Dataset: TMHINT-QI

    Available Models:

    • Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]

    • InQSS: a speech intelligibility and quality assessment model using a multi-task learning network [code] [paper]

C. Task: Teleconference

  • Dataset: NISQA Corpus

    Available Model:

    • NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets [code] [paper]

  • Dataset: ConferencingSpeech2022 Datasets

    Available Model:

    • ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NOSQA) Challenge for Online Conferencing Applications [code] [paper]

2. Objective-based Speech Quality Prediction

Objective-based speech quality prediction uses signal-processing-based quality metrics, for example the Perceptual Evaluation of Speech Quality (PESQ), as ground-truth labels.
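The labelling pipeline can be sketched as follows: an intrusive metric is computed from a clean reference and its degraded version, and the resulting score becomes the training target for a model that later sees only the degraded signal. A faithful PESQ implementation (ITU-T P.862) is too long to reproduce here, so this sketch swaps in a plain signal-to-noise ratio as a stand-in metric. The functions `snr_db` and `make_objective_labels` and the toy signals are hypothetical.

```python
import math


def snr_db(reference, degraded):
    """Stand-in for PESQ: SNR in dB, treating (degraded - reference) as noise."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((d - x) ** 2 for x, d in zip(reference, degraded))
    if noise_power == 0:
        return math.inf  # identical signals: no measurable noise
    return 10 * math.log10(signal_power / noise_power)


def make_objective_labels(pairs, metric=snr_db):
    """Turn (reference, degraded) pairs into (degraded, score) training examples.

    The reference is needed only to compute the label; a non-intrusive model
    is then trained to predict the score from the degraded signal alone.
    """
    return [(degraded, metric(reference, degraded)) for reference, degraded in pairs]


# Hypothetical toy signals: an alternating waveform and a slightly offset copy.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 1.1, -0.9]
examples = make_objective_labels([(clean, noisy), (clean, clean)])
for signal, score in examples:
    print(signal, round(score, 2))
```

Passing a different `metric` callable is where a real intrusive metric such as PESQ would plug in.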

A. Task: Speech Enhancement

  • Dataset: TIMIT Corpus

    Available Model:

    • Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM [code] [paper]

  • Dataset: WSJ Corpus

    Available Model:

    • Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]

B. Task: Hearing Aids

  • Dataset: TIMIT Corpus

    Available Model:

    • HASA-net: A non-intrusive hearing-aid speech assessment network [code] [paper]

Speech Intelligibility Prediction Models

1. Human-based Speech Intelligibility Prediction

A. Task: Speech Enhancement

  • Dataset: TMHINT-QI

    Available Models:

    • Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]

    • InQSS: a speech intelligibility and quality assessment model using a multi-task learning network [code] [paper]

    • MTI-Net: A Multi-Target Speech Intelligibility Prediction Model [code] [paper]

B. Task: Hearing Aids

  • Dataset: Clarity Challenge

    Available Models:

    • The 1st Clarity Prediction Challenge: A machine learning challenge for hearing aid intelligibility prediction [code] [paper]

    • MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids [code] [paper]

2. Objective-based Speech Intelligibility Prediction

A. Task: Speech Enhancement

  • STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model [code] [paper]

Open Source ASR Models

We have also created a repository showing how to use open-source ASR systems to generate transcripts.

Contributors

dhimasryan