cocii / tssdnet

This project forked from ghua-ac/end-to-end-synthetic-speech-detection

Time-domain synthetic speech detection net (TSSDNet), with classic ResNet- and Inception-style structures (Res-TSSDNet and Inc-TSSDNet), for end-to-end synthetic speech detection. The models achieve state-of-the-art performance in terms of EER on the ASVspoof 2019 challenge and show promising generalization when tested on ASVspoof 2015.

License: GNU General Public License v3.0

Python 100.00%

End-to-End Synthetic Speech Detection

Important Notice (Oct. 2021)

The results reported in our paper were obtained on Windows. We recently found that running the same repo and dataset on Linux, using the pretrained models, yields different results:

  • Res-TSSDNet ASVspoof2019 eval EER: 1.6590%;
  • Inc-TSSDNet ASVspoof2019 eval EER: 4.0384%.

We have identified issues with the soundfile package on Windows when writing and reading flac files; the same package does not show this problem on Linux. A similar problem has been pointed out here.

About

We present two lightweight neural network models, termed time-domain synthetic speech detection net (TSSDNet), with classic ResNet- and Inception-style structures (Res-TSSDNet and Inc-TSSDNet), for end-to-end synthetic speech detection. They achieve state-of-the-art performance in terms of equal error rate (EER) on the ASVspoof 2019 challenge and also show promising generalization when tested on ASVspoof 2015.

Dataset

  • ASVspoof 2019 LA partition. link
  • ASVspoof 2015. link

  1. The ASVspoof 2019 train set is used for training;
  2. The ASVspoof 2019 dev set is used for model selection;
  3. The ASVspoof 2019 eval set is used for testing;
  4. The ASVspoof 2015 eval set is used for cross-dataset testing.

Model Architecture

Main Results

The two models, with 1.64% and 4.04% eval EER (below), and their training logs are provided in the folder pretrained.

Fixing all hyperparameters, the distribution of the lowest dev (and the corresponding eval) EERs among 100 epochs, trained from scratch (below):

Usage

Data Preparation

ASVspoof15&19_LA_Data_Preparation.py

It generates

  1. equal-duration time domain raw waveform
  2. 2D log power of constant Q transform

from the official ASVspoof 2019 and ASVspoof 2015 datasets, respectively. The CQT calculation is adapted from Li et al., ICASSP 2021.
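As an illustration of what "equal-duration" can mean for raw waveforms, the following is a minimal sketch, not the code in ASVspoof15&19_LA_Data_Preparation.py. It assumes that long utterances are truncated and short ones are tiled (repeated) until they reach the target length. The function name `fix_duration` and the 6-second target are hypothetical; only the 16 kHz sampling rate comes from the ASVspoof data itself.

```python
def fix_duration(samples, target_len):
    """Return a copy of `samples` that is exactly `target_len` samples long."""
    if not samples:
        raise ValueError("cannot pad an empty waveform")
    if len(samples) >= target_len:
        return list(samples[:target_len])          # truncate long utterances
    repeats = -(-target_len // len(samples))       # ceiling division
    return (list(samples) * repeats)[:target_len]  # tile, then trim


# Hypothetical target: 6 s at 16 kHz. The actual script may use another duration.
SAMPLE_RATE = 16000
TARGET_LEN = 6 * SAMPLE_RATE

short = [0.1, -0.2, 0.3]
print(fix_duration(short, 8))  # [0.1, -0.2, 0.3, 0.1, -0.2, 0.3, 0.1, -0.2]
```

Tiling keeps real signal content in the padded region, whereas zero padding would add silence; which of the two the preparation script uses should be checked in the script itself.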

Training

train.py

It supports training using

  1. standard cross-entropy vs weighted cross-entropy
  2. standard train loader vs mixup regularization
  3. 1D raw waveforms vs 2D CQT feature
  4. ASVspoof 2019 training set vs ASVspoof 2015 training set

A training log is generated, and the trained model is saved after every epoch.
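The two loss and loader options listed above can be sketched in plain Python for a single sample. This is a hedged illustration of the general techniques (class-weighted cross-entropy and mixup), not the implementation in train.py. All function names, the class weights, and the Beta parameter `alpha` are assumptions made for the example.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def weighted_ce(logits, label, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -class_weights[label] * log_softmax(logits)[label]


def mixup_pair(x_a, x_b, alpha, rng=random):
    """Blend two waveforms; lam is drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


def mixup_loss(logits, label_a, label_b, lam, class_weights):
    """Mixup loss: the same convex combination applied to the two losses."""
    return (lam * weighted_ce(logits, label_a, class_weights)
            + (1.0 - lam) * weighted_ce(logits, label_b, class_weights))


# Uniform weights reduce weighted_ce to standard cross-entropy.
uniform = [1.0, 1.0]
print(weighted_ce([0.0, 0.0], 0, uniform))  # log(2), about 0.6931
```

Weighting the loss is a common way to counter class imbalance between bona fide and spoofed utterances, and mixup trains on convex combinations of input pairs with the loss combined in the same proportion.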

Testing

test.py

It generates softmax accuracy, ROC curve, and EER.
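For readers unfamiliar with these metrics, here is a minimal pure-Python sketch of how accuracy, ROC operating points, and EER relate. It is not the code in test.py. It assumes that a higher score means "more likely bona fide", and the function names are hypothetical. The EER is approximated at the threshold where the false acceptance rate (FAR) and false rejection rate (FRR) are closest, without interpolation.

```python
def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y
               for p, y in zip(probs, labels))
    return hits / len(labels)


def roc_points(bonafide_scores, spoof_scores):
    """Return (threshold, far, frr) for every distinct score used as threshold."""
    points = []
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # FAR: spoofed utterances accepted; FRR: bona fide utterances rejected.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        points.append((t, far, frr))
    return points


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER: mean of FAR and FRR where the two are closest."""
    _, far, frr = min(roc_points(bonafide_scores, spoof_scores),
                      key=lambda p: abs(p[1] - p[2]))
    return (far + frr) / 2.0


print(equal_error_rate([0.9, 0.4], [0.6, 0.1]))  # 0.5
```

This version is quadratic in the number of scores and is meant only to make the definition concrete; a sorted, vectorized sweep is the usual choice for full evaluation sets.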

Citation Information

G. Hua, A. B. J. Teoh, and H. Zhang, “Towards end-to-end synthetic speech detection,” IEEE Signal Processing Letters, vol. 28, pp. 1265–1269, 2021. arXiv | IEEE Xplore
