WARP-Q Speech Quality Metric

This repository contains the code for running the WARP-Q speech quality metric.

https://github.com/WissamJassim/WARP-Q.git

WARP-Q (Quality Prediction For Generative Neural Speech Codecs) is an objective, full-reference metric for perceived speech quality. It uses a subsequence dynamic time warping (SDTW) algorithm as a similarity measure between a reference (original) and a test (degraded) speech signal to produce a raw quality score. It is designed to predict quality scores for speech signals processed by low bit rate speech coders.

General Description

Speech coding has been shown to achieve good speech quality using either waveform matching or parametric reconstruction. For very low bit rate streams, recently developed generative speech models can reconstruct high quality wideband speech from the bit streams of standard parametric encoders at less than 3 kb/s. Generative codecs produce high quality speech by synthesising it with a deep neural network (DNN) driven by the parametric input.

The problem is that existing objective speech quality models (e.g., ViSQOL, POLQA) cannot be used to accurately evaluate the quality of coded speech from generative models, as they penalise signal differences that are not apparent in subjective listening test results. Motivated by this observation, we propose the WARP-Q metric, which is robust to the small perceptual signal changes introduced by low bit rate neural vocoders. Figure 1 illustrates a block diagram of the proposed algorithm.

Figure 1: Block diagram of the WARP-Q metric

The WARP-Q algorithm consists of four processing stages:

  • Pre-processing: silent non-speech segments from reference and degraded signals are detected and removed using a voice activity detection (VAD) algorithm.

  • Feature extraction: Mel frequency cepstral coefficient (MFCC) representations of the reference and degraded signals are first generated. The obtained MFCC representations are then normalised so that they have the same segmental statistics (zero mean and unit variance) using cepstral mean and variance normalisation (CMVN).

  • Similarity comparison: WARP-Q uses the SDTW algorithm to estimate the similarity between the reference and degraded signals in the MFCC domain. It first divides the normalised MFCCs of the degraded signal into a number of patches. For each degraded patch, the SDTW algorithm then computes the accumulated alignment cost between that patch and the reference MFCC matrix. The computation of the accumulated alignment cost is based on an accumulated alignment cost matrix and its optimal path between the patch and the reference matrix. Figure 2 shows an example of this stage. Further details are available in [1].

Figure 2: SDTW-based accumulated cost and optimal path between two signals. (a) plots of a reference signal and its corresponding coded version from a WaveNet coder at 6 kb/s (obtained from the VAD stage), (b) normalised MFCC matrices of the two signals, (c) plots of the SDTW-based accumulated alignment cost matrix and its optimal path between the MFCC matrix of the reference signal and a patch extracted from the MFCC matrix of the degraded signal. The optimal start and end indices of the alignment path are also shown. The patch corresponds to a short segment (2 s long) from the WaveNet signal (highlighted in green).
  • Subsequence score aggregation: the final quality score is represented by the median of all patch alignment costs.
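The similarity and aggregation stages above can be sketched with a minimal, pure-NumPy subsequence DTW. This is an illustrative sketch only: the actual distance measure, step weights, patch length, and hop used by WARP-Q may differ, and `sdtw_cost` and the toy feature matrices below are assumptions, not the project's implementation.

```python
import numpy as np

def sdtw_cost(X, Y):
    """Accumulated subsequence-DTW alignment cost between a degraded
    patch X and the reference MFCC matrix Y (features in rows, frames
    in columns). The patch may start and end anywhere along the
    reference axis (free boundaries), as in subsequence DTW."""
    n, m = X.shape[1], Y.shape[1]
    # Frame-to-frame cost matrix (Euclidean distance between frames)
    C = np.linalg.norm(X[:, :, None] - Y[:, None, :], axis=0)
    D = np.full((n, m), np.inf)
    D[0, :] = C[0, :]                       # free start anywhere in Y
    for i in range(1, n):
        for j in range(m):
            prev = D[i - 1, j]
            if j > 0:
                prev = min(prev, D[i, j - 1], D[i - 1, j - 1])
            D[i, j] = C[i, j] + prev
    return D[-1, :].min()                   # free end anywhere in Y

# Toy "MFCC" matrices standing in for a reference and a degraded signal
rng = np.random.default_rng(0)
ref = rng.standard_normal((13, 200))        # 13 coefficients, 200 frames
deg = ref + 0.1 * rng.standard_normal(ref.shape)

# Divide the degraded matrix into patches, then aggregate with the median
patch_len = 50
costs = [sdtw_cost(deg[:, s:s + patch_len], ref)
         for s in range(0, deg.shape[1] - patch_len + 1, patch_len)]
score = float(np.median(costs))             # raw WARP-Q-style score
print(score)
```

A patch that occurs verbatim inside the reference aligns with zero cost, which is the property that makes the free-boundary (subsequence) variant suitable here: each short degraded patch is matched against the best-fitting region of the reference rather than against the whole signal.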

An evaluation using waveform-matching, parametric, and generative neural vocoder based codecs, as well as channel and environmental noise, shows that WARP-Q has better correlation and codec quality ranking for novel codecs than traditional metrics, and that it is versatile enough to capture other types of degradation, such as additive noise and transmission channel impairments.

The results show that, although WARP-Q is a simple model built on well-established speech signal processing features and algorithms, it solves the unmet need for a speech quality model that can be applied to generative neural codecs.

Requirements

Run using Python 3.x with the following package dependencies installed in your virtual environment:

- pandas 
- librosa
- numpy 
- pyvad
- skimage (installed as scikit-image)
- speechpy
- soundfile
- scipy (optional)
- seaborn (optional, for plotting only)
- multiprocessing (standard library; optional, for parallel computing mode only)
- joblib (optional, for parallel computing mode only)
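A requirements file matching the list above might look like the following sketch. PyPI package names are assumed here (`skimage` ships as `scikit-image`); `multiprocessing` is part of the Python standard library, so it is omitted.

```text
# requirements.txt (sketch; pin versions as needed)
pandas
librosa
numpy
pyvad
scikit-image
speechpy
soundfile
scipy          # optional
seaborn        # optional, plotting only
joblib         # optional, parallel computing mode only
```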

Run WARPQ_demo.py

Input:

- The main_test function reads a csv file that contains the paths of the audio files. 

- The csv file consists of four columns: 

    - Ref_Wave: reference speech
    - Test_Wave: test speech
    - MOS: subjective score (optional, for plotting only)
    - Codec: type of speech codec (optional, for plotting only)
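For illustration, a minimal input file with these four columns could be generated as below. The file name `audio_paths.csv`, the wav paths, and the codec labels are hypothetical examples, not names fixed by the demo script.

```python
import csv

# Hypothetical example rows illustrating the expected four-column layout.
rows = [
    {"Ref_Wave": "audio/ref/sample01.wav",
     "Test_Wave": "audio/coded/sample01_6kbps.wav",
     "MOS": "4.1", "Codec": "WaveNet"},
    {"Ref_Wave": "audio/ref/sample02.wav",
     "Test_Wave": "audio/coded/sample02_6kbps.wav",
     "MOS": "3.2", "Codec": "Opus"},
]

with open("audio_paths.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["Ref_Wave", "Test_Wave", "MOS", "Codec"])
    writer.writeheader()
    writer.writerows(rows)
```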

Output:

- The code computes the WARP-Q quality scores between Ref_Wave and Test_Wave. 
- It then stores the obtained results in a new column in the same csv file.  

Papers for citation

Design of the WARP-Q algorithm is described in detail in the following paper:

[1] W. A. Jassim, J. Skoglund, M. Chinen, and A. Hines, "WARP-Q: Quality prediction for generative neural speech codecs," accepted for presentation at the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021). Date of acceptance: 30 Jan 2021.
