
This project is a fork of joaoantoniocn/am-sincnet.


Additive Margin SincNet (AM-SincNet)

AM-SincNet is a new approach to speaker recognition based on the SincNet neural network architecture and the additive margin softmax (AM-Softmax) loss function. It keeps the SincNet architecture, but replaces the standard softmax output with an improved AM-Softmax layer.
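
For intuition, AM-Softmax computes cosine similarities between the L2-normalized embedding and each class weight vector, subtracts a fixed margin m from the target class, and scales the result before the usual cross-entropy. Below is a minimal PyTorch sketch of the idea; the class name and the default values of m and s are illustrative assumptions, not this repository's exact code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AMSoftmaxLoss(nn.Module):
        # Logits are cosine similarities between L2-normalized embeddings
        # and class weights; the target-class logit is reduced by a fixed
        # margin m, then everything is scaled by s before cross-entropy.
        def __init__(self, emb_dim, n_classes, m=0.35, s=30.0):  # m, s: illustrative defaults
            super().__init__()
            self.m, self.s = m, s
            self.W = nn.Parameter(torch.randn(emb_dim, n_classes))

        def forward(self, emb, labels):
            cos = F.normalize(emb, dim=1) @ F.normalize(self.W, dim=0)  # (batch, n_classes)
            margin = F.one_hot(labels, cos.size(1)).float() * self.m
            return F.cross_entropy(self.s * (cos - margin), labels)

Larger margins push same-speaker embeddings closer to their class weight vector, which is what makes the learned features more discriminative than plain softmax.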

This repository provides example code for a speaker recognition experiment on the TIMIT dataset. To run it with other datasets, have a look at the instructions in the original SincNet repository (https://github.com/mravanelli/SincNet).

Thanks to @mravanelli for the original SincNet implementation.

Requirements

The experiment was run in a Linux environment with Python 3.6.

The Python dependencies are listed in requirements.txt.

To install them in a conda environment: conda install --file requirements.txt

To install them in a pip virtual environment: pip install -r requirements.txt

How to Run

To run on the TIMIT dataset, we first have to pre-process the data, removing the silence at the start and end of each sentence and normalizing the audio amplitude (a sketch of these two operations follows the argument list below):

python TIMIT_preparation.py $TIMIT_FOLDER $OUTPUT_FOLDER data_lists/TIMIT_all.scp

where:

  • $TIMIT_FOLDER is the folder of the original TIMIT corpus
  • $OUTPUT_FOLDER is the folder in which the normalized TIMIT will be stored
  • data_lists/TIMIT_all.scp is the list of TIMIT files used for training/testing the speaker id system
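
For intuition, the two pre-processing operations amount to trimming leading/trailing silence and peak-normalizing each waveform. A rough sketch, assuming the soundfile library and a simple amplitude threshold in place of whatever silence detection TIMIT_preparation.py actually uses:

    import numpy as np
    import soundfile as sf  # assumption: any wav I/O library would do

    def prepare_sentence(in_wav, out_wav, threshold=0.005):
        # Locate the first and last samples whose amplitude exceeds a
        # small threshold and drop everything outside that range.
        signal, rate = sf.read(in_wav)
        active = np.where(np.abs(signal) > threshold)[0]
        trimmed = signal[active[0]:active[-1] + 1]
        # Peak-normalize so every sentence has the same maximum amplitude.
        trimmed = trimmed / np.max(np.abs(trimmed))
        sf.write(out_wav, trimmed, rate)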

Then, we can run the experiment itself:

python speaker_id.py --cfg=cfg/$CFG_FILE

where:

  • $CFG_FILE is the name of the configuration file, located in the cfg folder

Several configuration files for the experiments are available. To run the experiment with the traditional SincNet (without the improved AM-Softmax layer), use the SincNet_TIMIT.cfg file; otherwise, use a SincNet_TIMIT_m0XX.cfg file, where XX denotes the margin used by the AM-Softmax layer (for example, m035 would correspond to a margin of 0.35).

Results

When training, have a look at the configuration file: the output paths for the model and the results (res.res) file are defined there.

We have also made some results from our experiments available in the exp folder. A summary of each run is saved in its res.res file.

How to use SincNet with a different dataset?

In this repository, we used the TIMIT dataset as a tutorial to show how SincNet works. With the current version of the code, you can easily use a different corpus. To do so, provide the corpus-specific input files (in wav format) and your own labels, and modify the paths in the *.scp files found in the data_lists folder.

To assign the right label to each sentence, you also have to modify the dictionary "TIMIT_labels.npy". The labels are specified in a Python dictionary that contains sentence ids as keys (e.g., "si1027") and speaker ids as values. Each speaker id is an integer ranging from 0 to N_spks-1. In the TIMIT dataset, the speaker id is easily retrieved from the path (e.g., train/dr1/fcjf0/si1027.wav is the sentence id "si1027" uttered by speaker "fcjf0"). For other datasets, you should build this dictionary of sentence-id/speaker-id pairs in a similar way (see the sketch below).
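
A minimal sketch of building and saving such a dictionary, assuming paths shaped like the TIMIT example above (the wav list and the path parsing are illustrative; adapt them to your corpus layout):

    import numpy as np

    # Illustrative file list; in practice, walk your corpus directory.
    wav_list = ['train/dr1/fcjf0/si1027.wav', 'train/dr1/fcjf0/si1657.wav']

    # Map each speaker name to a contiguous integer id 0..N_spks-1.
    speakers = sorted({p.split('/')[-2] for p in wav_list})
    spk_to_id = {spk: i for i, spk in enumerate(speakers)}

    # Map each sentence id to its speaker's integer id, e.g. {'si1027': 0}.
    labels = {p.split('/')[-1][:-4]: spk_to_id[p.split('/')[-2]]
              for p in wav_list}

    np.save('TIMIT_labels.npy', labels)  # load later with allow_pickle=True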

You should then modify the config file (cfg/SincNet_TIMIT.cfg) according to your new paths. Remember also to change the field "class_lay=462" to match the number of speakers N_spks in your dataset.
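
For reference, the relevant fields might look like the excerpt below. The field names follow the original SincNet config format, but the section layout and paths here are illustrative assumptions; check your actual cfg file:

    [data]
    # Point these at your own .scp lists, label dictionary, and folders.
    tr_lst=data_lists/TIMIT_train.scp
    te_lst=data_lists/TIMIT_test.scp
    lab_dict=TIMIT_labels.npy
    data_folder=/path/to/normalized/corpus/
    output_folder=/path/to/exp/output/

    [class]
    # Set class_lay to N_spks, the number of speakers in your dataset.
    class_lay=462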

Cite us

If you use this code or part of it, please cite us!

@INPROCEEDINGS{8852112,
  author={J. A. {Chagas Nunes} and D. {Macêdo} and C. {Zanchettin}},
  booktitle={2019 International Joint Conference on Neural Networks (IJCNN)},
  title={Additive Margin SincNet for Speaker Recognition},
  year={2019},
  month={July},
  pages={1-5},
  doi={10.1109/IJCNN.2019.8852112},
}

You can also find the paper at IEEE or the preprint at arXiv.
