mainro / deep-speaker

An implementation of Deep Speaker from Baidu

License: MIT License

Makefile 5.42% Python 94.58%

deep-speaker's Issues

How to run

Python version: 3.7

Description

Is it possible to take the Deep Speaker weights from https://github.com/a-nagrani/VGGVox, load them into your code, and run it? If not, how can I train your model and then test it? I've looked through the deep-speaker/deep_speaker/train/ folder, but I still can't figure out how to pass the dataset path for training.

Thanks in advance)

Implement resnet model

The model used in Deep Speaker is a ResNet, so this is the first model that must be tested.

Implement Inception model

Test performance with an Inception model.

Select MFCC library with python bindings

An MFCC library is needed to extract features. It should be implemented in Rust, with Python bindings via cbindgen and cffi. Some crates are available:

We must test them to see whether they produce the same results as the default TensorFlow implementation and allow for more configuration.

Otherwise, a new library can be written, preferably on top of an existing DCT library. One is available here:

https://crates.io/crates/rustdct

If no serious option exists, FFTW can be used as the base of a new Rust MFCC library:

http://fftw.org/
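
For context on why a DCT library is a natural base: the last step of a standard MFCC pipeline is a DCT-II applied to the log mel filterbank energies of each frame. The sketch below shows that step only, in plain Python for readability. The function names and the default of 13 coefficients are illustrative assumptions, the DCT is left unnormalized, and it is not claimed to match the scaling of TensorFlow or of any particular crate.

```python
import math


def dct2(values):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n = len(values)
    return [
        sum(values[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def mfcc_from_log_mel(log_mel_energies, num_coefficients=13):
    """Keep the first DCT-II coefficients of one frame of log mel energies.

    num_coefficients=13 is a common choice, used here as an assumption.
    """
    return dct2(log_mel_energies)[:num_coefficients]
```

A flat spectrum puts all of its energy in coefficient 0, which makes a convenient sanity check when comparing candidate libraries against a reference.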

Implement reactive training code base

The training application must be implemented with cyclotron, so that it is functional and reactive. This should make it easier to experiment with multiple models and settings. It should also allow some performance optimizations in training.

Implement hard triplet loss

Hard triplet loss is needed. TensorFlow's contrib module contains a semi-hard triplet loss; that contrib code can be adapted to implement hard triplet loss.

https://www.tensorflow.org/api_docs/python/tf/contrib/losses/metric_learning/triplet_semihard_loss
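
One common reading of "hard" triplet loss is batch-hard mining: for each anchor in a batch, pick the least similar sample of the same speaker and the most similar sample of a different speaker. The sketch below illustrates that idea in plain Python with cosine similarity. It is an assumption about the intended variant, not the contrib code or this repository's implementation, and the function names and the margin value of 0.1 are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def batch_hard_triplet_loss(embeddings, labels, margin=0.1):
    """Mean over anchors of max(s_an - s_ap + margin, 0) with hardest mining.

    s_ap: lowest similarity to a same-label sample (hardest positive).
    s_an: highest similarity to a different-label sample (hardest negative).
    Anchors without a positive or a negative in the batch are skipped.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [
            cosine_similarity(anchor, other)
            for j, (other, other_label) in enumerate(zip(embeddings, labels))
            if other_label == label and j != i
        ]
        negatives = [
            cosine_similarity(anchor, other)
            for other, other_label in zip(embeddings, labels)
            if other_label != label
        ]
        if not positives or not negatives:
            continue
        losses.append(max(max(negatives) - min(positives) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0
```

A training implementation would express the same selection with tensor operations over a pairwise similarity matrix so that it stays differentiable and runs on the accelerator.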

Implement cosine similarity

Cosine similarity is needed to compute the triplet loss. This is the similarity function used in Deep Speaker.
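
As a minimal sketch of how the two fit together: cosine similarity is the dot product of two embeddings divided by the product of their norms, and the triplet loss penalizes an anchor that is not more similar to its positive than to its negative by at least a margin. The function names, the zero-norm handling, and the margin of 0.1 below are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return cos(a, b) = a.b / (|a| * |b|), or 0.0 if either norm is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Triplet loss on similarities: max(s_an - s_ap + margin, 0)."""
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(s_an - s_ap + margin, 0.0)
```

Because this is a similarity rather than a distance, the loss subtracts the positive term from the negative one, the reverse of the usual Euclidean formulation.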

Implement MobileNet model

Test performance with the MobileNet v2 model.

Implement feature extraction

Implement feature extraction independently of training. Because extraction takes some time, it should run once so that multiple trainings can reuse the features instead of recomputing them each time.

Input: the VoxCeleb2 dataset.
Output: an MFCC binary dataset (MFCC storage format to be defined).
Configuration parameters:

utterance duration (default value: 1 s)
MFCC parameters (exhaustive list to be defined)

Utterances longer than the configured duration are split into several utterances, which increases the size of the dataset.

Notes:
No background sound is added and no volume adjustment is done, because the VoxCeleb2 dataset already contains various backgrounds.
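
The splitting step could be sketched as follows. This assumes audio is already decoded into a sequence of samples, that segments do not overlap, and that a trailing remainder shorter than the configured duration is dropped; none of these choices are specified in the issue, and the function name is hypothetical.

```python
def split_utterance(samples, sample_rate, duration_s=1.0):
    """Split one utterance into consecutive fixed-length segments.

    samples: sequence of audio samples for a single utterance.
    sample_rate: samples per second.
    duration_s: segment length in seconds (1 s is the default from the issue).
    Trailing samples that do not fill a whole segment are dropped (assumption).
    """
    segment_len = int(round(sample_rate * duration_s))
    if segment_len <= 0:
        raise ValueError("sample_rate * duration_s must be at least one sample")
    last_start = len(samples) - segment_len
    return [
        samples[start:start + segment_len]
        for start in range(0, last_start + 1, segment_len)
    ]
```

For example, a 3.5 s utterance with the default duration yields three 1 s segments, each of which would then go through MFCC extraction and keep the speaker label of the source utterance.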
