This repository aims to collect all available speech assessment models (with their GitHub pages), including speech quality and intelligibility prediction models.
We collect both human-based and objective-based speech quality prediction models.
Human-based speech quality prediction generally aims to predict the mean opinion score (MOS) given by several listeners. Several benchmark datasets have been designed for different speech processing applications; for example, in speech enhancement tasks, the assessment model aims to predict the quality of enhanced speech after noise reduction has been applied.
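Since the ground-truth label in this setting is simply the mean of the individual listener ratings, preparing MOS labels is straightforward. A minimal sketch in plain Python (the utterance IDs and ratings below are hypothetical; real corpora such as VCC2018 ship listener scores on a 1-5 opinion scale):

```python
# Minimal sketch: deriving per-utterance MOS labels from listener ratings.
# Utterance IDs and scores are hypothetical placeholders.

def mos_labels(ratings):
    """Map each utterance ID to the mean of its listener scores (the MOS)."""
    return {utt: sum(scores) / len(scores) for utt, scores in ratings.items()}

ratings = {
    "utt_001": [4, 5, 4, 3],   # four listeners rated this utterance
    "utt_002": [2, 3, 2, 2],
}
print(mos_labels(ratings))  # {'utt_001': 4.0, 'utt_002': 2.25}
```

A MOS prediction model is then trained to regress these per-utterance means (or, in listener-dependent models such as MBNet and LDNet, the individual listener scores) directly from the waveform.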
- Dataset: VCC2018
  Available Models:
  - MOSNet: Deep Learning based Objective Assessment for Voice Conversion [code] [paper]
  - MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network [code] [paper]
  - Utilizing Self-supervised Representations for MOS Prediction [code] [paper]
  - LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [code] [paper]
- Dataset: Blizzard Challenge
  Available Models:
- Dataset: VoiceMOS Challenge
  Available Models:
  - Generalization Ability of MOS Prediction Networks [code] [paper]
  - LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [code] [paper]
  - Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]
  - UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 [code] [paper]
- Dataset: TMHINT-QI
  Available Models:
- Dataset: NISQA Corpus
  Available Models:
  - NISQA
- Dataset: ConferencingSpeech2022 Datasets
  Available Models:
Objective-based speech quality prediction uses signal-processing-based quality metrics as the ground-truth labels, for example, the Perceptual Evaluation of Speech Quality (PESQ).
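Such labels can be generated automatically by scoring each degraded utterance against its clean reference, with no human listeners involved. The sketch below uses a simple global SNR as a stand-in metric (a real pipeline would call a PESQ implementation instead); the signals here are synthetic placeholders, not real speech:

```python
import numpy as np

def snr_db(reference, degraded):
    """Global SNR (dB) of a degraded signal against its clean reference.
    A stand-in for a perceptual metric like PESQ, which would serve as
    the ground-truth label in objective-based quality prediction."""
    noise = degraded - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                 # 1 s of synthetic "speech" at 16 kHz
noisy = clean + 0.1 * rng.standard_normal(16000)   # additive noise, std 0.1 -> roughly 20 dB SNR
label = snr_db(clean, noisy)                       # objective label for this (clean, noisy) pair
```

A non-intrusive assessment model is then trained to predict this label from the degraded signal alone, so the metric can be estimated at inference time without access to the clean reference.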
- Dataset: TIMIT Corpus
  Available Models:
- Dataset: WSJ Corpus
  Available Models:
  - HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network
- Dataset: TMHINT-QI
  Available Models:
  - Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]
  - InQSS: a speech intelligibility and quality assessment model using a multi-task learning network [code] [paper]
  - MTI-Net: A Multi-Target Speech Intelligibility Prediction Model [code] [paper]
- Dataset: Clarity Challenge
  Available Models:
  - STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model [code] [paper]
We also maintain a repository on how to use ASR systems to generate transcripts.