This project forked from iprobe-lab/1d-triplet-cnn

PyTorch implementation of the 1D-Triplet-CNN neural network model described in "Fusing MFCC and LPC Features using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals" by A. Chowdhury and A. Ross.

Home Page: https://ieeexplore.ieee.org/document/8839817

License: MIT License

1D-Triplet-CNN

Research Article

Anurag Chowdhury and Arun Ross, "Fusing MFCC and LPC Features using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals," IEEE Transactions on Information Forensics and Security (2019).

1D-Triplet-CNN Model

1D-Triplet-CNN Details

Implementation details and requirements

The model was implemented in PyTorch 1.2.1 using Python 3.6. It may be compatible with other versions of PyTorch and Python, but these have not been tested.

Additional requirements are listed in the ./requirements.txt file.

Usage

Source code and model parameters

The source code of the 1D-Triplet-CNN model can be found in the model subdirectory, and a pre-trained model is available in the trained_models subdirectory.

Dataset

The pre-trained model available in the trained_models subdirectory was trained on a subset of the Fisher speech corpus, obtained from https://catalog.ldc.upenn.edu/LDC2004S13. The training data was also degraded with varying levels of babble noise from the NOISEX-92 dataset.

Training the 1D-Triplet-CNN model

To train a 1D-Triplet-CNN model as described in the research paper, use the 1D-Triplet-CNN implementation in the models subdirectory. The network performs best when trained in a triplet learning framework; see the research paper for details on training the model.

Testing with the pretrained model

Recommended audio specifications

Usually, 2 seconds of speech audio sampled at 8 kHz (8000 Hz) is enough to produce reliable speaker recognition results. Longer audio samples make recognition considerably slower without a meaningful gain in performance, while samples shorter than 1 second suffer a considerable loss in performance.
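A quick way to check a recording against these recommendations is sketched below using only the Python standard library. It is not part of this repository. It assumes the input is an uncompressed PCM WAV file and that the recommended sampling rate is 8000 Hz (8 kHz). The names check_audio and write_silent_wav are hypothetical.

```python
import wave

RECOMMENDED_RATE_HZ = 8000  # assumed reading of the recommended sampling rate
MIN_DURATION_S = 1.0        # shorter clips lose considerable performance
ENOUGH_DURATION_S = 2.0     # usually enough for reliable results


def check_audio(path):
    """Return a list of warnings for a PCM WAV file; empty means it fits."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    warnings = []
    if rate != RECOMMENDED_RATE_HZ:
        warnings.append(f"sampling rate is {rate} Hz, expected {RECOMMENDED_RATE_HZ} Hz")
    if duration < MIN_DURATION_S:
        warnings.append(f"clip is {duration:.2f} s, shorter than {MIN_DURATION_S} s")
    elif duration > ENOUGH_DURATION_S:
        warnings.append(f"clip is {duration:.2f} s; longer clips are slower to process")
    return warnings


def write_silent_wav(path, seconds, rate):
    """Write mono 16-bit silence; only used here to demonstrate check_audio."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
```

For example, after write_silent_wav("clip.wav", 2.0, 8000), calling check_audio("clip.wav") returns an empty list, whereas a 0.5-second clip at 16000 Hz produces two warnings.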

Usage

  1. Satisfy the requirements listed in the ./requirements.txt file.
  2. Run src/extractFeatures.m in MATLAB R2019a (or newer) to extract MFCC-LPC features from the audio files placed in the sample_audio subdirectory and save the corresponding features as individual .mat files in the sample_feature subdirectory.
  3. Run src/test.py in Python 3.6 to evaluate sample audio pairs and generate speaker verification scores.
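The pairwise evaluation in the last step can be pictured with the sketch below. It does not reproduce src/test.py: how that script chooses pairs and computes scores is not documented here. Forming every unordered pair of .mat files and scoring two embeddings by cosine similarity are both assumptions for illustration, and the function names are hypothetical. Loading the .mat files and running the model are deliberately left out.

```python
import math
from itertools import combinations
from pathlib import Path


def list_feature_pairs(feature_dir):
    """Return every unordered pair of .mat feature files in a directory."""
    files = sorted(Path(feature_dir).glob("*.mat"))
    return list(combinations(files, 2))


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embeddings, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / norm
```

Under these assumptions, each pair returned by list_feature_pairs("sample_feature") would be passed through the model to obtain two embeddings, and a higher cosine similarity would indicate that the two recordings are more likely to come from the same speaker.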

Examples

Usage examples may be added in the future.
