This repository aims to collect all available speech assessment models (with their GitHub pages), including speech quality and intelligibility prediction models.
We collect both human-based and objective-based speech quality prediction models.
Human-based speech quality prediction generally aims to predict the mean opinion score (MOS) given by several listeners. Several benchmark datasets have been designed for different speech processing applications; for example, in speech enhancement tasks, the assessment model aims to predict the quality of enhanced speech after noise reduction has been applied.
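Since the ground-truth label in this setting is simply the mean of the individual listener ratings, preparing MOS labels is straightforward. A minimal sketch in plain Python (the utterance IDs and ratings below are hypothetical; real corpora such as VCC2018 ship listener scores on a 1-5 opinion scale):

```python
# Minimal sketch: deriving per-utterance MOS labels from listener ratings.
# Utterance IDs and scores are hypothetical placeholders.

def mos_labels(ratings):
    """Map each utterance ID to the mean of its listener scores (the MOS)."""
    return {utt: sum(scores) / len(scores) for utt, scores in ratings.items()}

ratings = {
    "utt_001": [4, 5, 4, 3],   # four listeners rated this utterance
    "utt_002": [2, 3, 2, 2],
}
print(mos_labels(ratings))  # {'utt_001': 4.0, 'utt_002': 2.25}
```

A MOS prediction model is then trained to regress these per-utterance means (or, in listener-dependent models such as MBNet and LDNet, the individual listener scores) directly from the waveform.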
- Dataset: VCC2018
  Available Models:
  - MOSNet: Deep Learning based Objective Assessment for Voice Conversion [code] [paper]
  - MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network [code] [paper]
  - Utilizing Self-supervised Representations for MOS Prediction [code] [paper]
  - LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [code] [paper]
- Dataset: Blizzard Challenge
  Available Models:
- Dataset: VoiceMOS Challenge
  Available Models:
  - Generalization Ability of MOS Prediction Networks [code] [paper]
  - LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [code] [paper]
  - Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]
  - UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 [code] [paper]
- Dataset: TMHINT-QI
  Available Models:
- Dataset: NISQA Corpus
  Available Models:
  - NISQA
- Dataset: ConferencingSpeech2022 Datasets
  Available Models:
Objective-based speech quality prediction uses signal-processing-based quality metrics as the ground-truth labels, for example, the Perceptual Evaluation of Speech Quality (PESQ).
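Such labels can be generated automatically by scoring each degraded utterance against its clean reference, with no human listeners involved. The sketch below uses a simple global SNR as a stand-in metric (a real pipeline would call a PESQ implementation instead); the signals here are synthetic placeholders, not real speech:

```python
import numpy as np

def snr_db(reference, degraded):
    """Global SNR (dB) of a degraded signal against its clean reference.
    A stand-in for a perceptual metric like PESQ, which would serve as
    the ground-truth label in objective-based quality prediction."""
    noise = degraded - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                 # 1 s of synthetic "speech" at 16 kHz
noisy = clean + 0.1 * rng.standard_normal(16000)   # additive noise, std 0.1 -> roughly 20 dB SNR
label = snr_db(clean, noisy)                       # objective label for this (clean, noisy) pair
```

A non-intrusive assessment model is then trained to predict this label from the degraded signal alone, so the metric can be estimated at inference time without access to the clean reference.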
- Dataset: TIMIT Corpus
  Available Models:
- Dataset: WSJ Corpus
  Available Models:
  - HASA-Net: A Non-Intrusive Hearing-Aid Speech Assessment Network
- Dataset: TMHINT-QI
  Available Models:
  - Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [code] [paper]
  - InQSS: a speech intelligibility and quality assessment model using a multi-task learning network [code] [paper]
  - MTI-Net: A Multi-Target Speech Intelligibility Prediction Model [code] [paper]
- Dataset: Clarity Challenge
  Available Models:
  - STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model [code] [paper]
We also maintain a repository on how to use ASR systems to generate transcripts.