
deep-speaker's People

Contributors

mainro

deep-speaker's Issues

How to run

  • Python version: 3.7

Description

Is it possible to take the weights for Deep Speaker from https://github.com/a-nagrani/VGGVox, load them into your code, and run it? And if not, how can I train your model and test it afterwards? I've looked through the deep-speaker/deep_speaker/train/ folder, but I still cannot work out how to pass the dataset path for training.

Thanks in advance)

Implement resnet model

The model used in Deep Speaker is a ResNet. This is the first model that must be tested.

Select MFCC library with python bindings

An MFCC library is needed to extract features. It should be implemented in Rust, with Python bindings via cbindgen and cffi. Some crates are available:

We must test them to check whether they produce the same results as the default TensorFlow implementation and whether they allow more configuration.
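One way to run that check is sketched below: compare the MFCC matrix produced by a candidate library against a reference matrix, element by element, within a tolerance. The function names and the tolerance value are assumptions made for illustration; neither comes from the issue, and the matrices are expected to be plain nested lists (frames x coefficients) exported from each implementation.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def max_abs_difference(reference: Matrix, candidate: Matrix) -> float:
    """Return the largest element-wise absolute difference between two MFCC matrices."""
    if len(reference) != len(candidate):
        raise ValueError("frame counts differ")
    worst = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("coefficient counts differ")
        for ref_value, cand_value in zip(ref_row, cand_row):
            worst = max(worst, abs(ref_value - cand_value))
    return worst


def same_result(reference: Matrix, candidate: Matrix, tolerance: float = 1e-4) -> bool:
    """True when every coefficient matches within the (hypothetical) tolerance."""
    return max_abs_difference(reference, candidate) <= tolerance
```

Reporting the maximum difference, rather than only a pass/fail flag, makes it easier to see whether a mismatch comes from floating-point noise or from a different parameter choice such as the window or filter bank.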

Otherwise a new library can be written, preferably on top of an existing DCT library. One is available here:

If no serious option exists, then FFTW can be used as the base of a new MFCC Rust library:

Implement reactive training code base

The training application must be implemented with cyclotron, so that it is functional and reactive. This should make it easier to experiment with multiple models and multiple settings. It should also allow some performance optimizations in training.

Implement feature extraction

Implement feature extraction independently of the training part. Since extraction takes some time, it should be run once so that multiple trainings can reuse the features without recomputing them each time.

Input: the VoxCeleb2 dataset.
Output: a binary MFCC dataset (MFCC storage format to be defined).
Configuration parameters:

  • utterance duration (default value: 1s)
  • MFCC parameters (exhaustive list to be defined)

Utterances longer than the configured duration are split into several utterances. This increases the size of the dataset.
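The splitting step could look like the minimal sketch below. The 1 s default comes from the configuration list above; everything else is an assumption made for illustration: the `ExtractionConfig` and `split_utterance` names, the 16 kHz sample rate, and the choice to drop a trailing remainder shorter than one segment.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ExtractionConfig:
    utterance_duration: float = 1.0  # seconds, default value from the issue
    sample_rate: int = 16000  # assumption: the issue does not state a sample rate


def split_utterance(samples: Sequence[float], config: ExtractionConfig) -> List[Sequence[float]]:
    """Cut one utterance into consecutive, non-overlapping fixed-length segments."""
    segment_length = int(config.utterance_duration * config.sample_rate)
    if segment_length <= 0:
        raise ValueError("utterance duration and sample rate must be positive")
    segments = []
    # Assumption: a trailing remainder shorter than one segment is dropped.
    for start in range(0, len(samples) - segment_length + 1, segment_length):
        segments.append(samples[start:start + segment_length])
    return segments
```

With these assumptions, a 3.5 s recording yields three 1 s utterances and the last 0.5 s is discarded. Padding the remainder or using overlapping windows are alternatives that would increase the dataset further.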

Notes:
No background sound is added and no volume adjustment is done because the voxceleb2 dataset already contains various backgrounds.
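The "extract once, train many times" idea can be sketched as a small cache wrapper. The MFCC storage format is still to be defined in this issue, so `pickle` is used here purely as a placeholder, and `load_or_extract` is a hypothetical helper name, not part of the project.

```python
import pickle
from pathlib import Path
from typing import Callable, List

Features = List[List[float]]


def load_or_extract(cache_path: Path, extract: Callable[[], Features]) -> Features:
    """Return cached features if present, otherwise compute and store them."""
    if cache_path.exists():
        # Features were already extracted by an earlier run: reuse them.
        with cache_path.open("rb") as cache_file:
            return pickle.load(cache_file)
    features = extract()
    # Placeholder storage: pickle stands in for the undefined MFCC binary format.
    with cache_path.open("wb") as cache_file:
        pickle.dump(features, cache_file)
    return features
```

Each training run would call this helper with the same cache path, so the expensive extraction only happens on the first call.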
