deepspeech.torch

A speech recognition implementation built on Torch7 and Baidu's Warp-CTC. It creates a network based on the DeepSpeech2 architecture using the Torch7 library and trains it with the CTC (Connectionist Temporal Classification) loss function.
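For readers new to CTC, the following is a minimal Python sketch (not this project's Lua/Torch7 code) of how a per-frame CTC label path is typically collapsed into a transcript: consecutive repeats are merged, then blank symbols are dropped. The function name `ctc_collapse` and the use of `_` as the blank symbol are assumptions made for illustration.

```python
def ctc_collapse(path, blank="_"):
    """Collapse a per-frame CTC label path into a transcript.

    `path` is a sequence of symbols, one per audio frame.
    `blank` is the CTC blank symbol (hypothetical choice: "_").
    """
    out = []
    prev = None
    for sym in path:
        # Keep a symbol only if it starts a new run and is not the blank.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


if __name__ == "__main__":
    # The blank between the two "l" runs is what keeps the double letter.
    print(ctc_collapse("hh_e_ll_ll_oo"))
```

Because repeats are merged before blanks are removed, a blank between two identical symbols is the only way a path can produce a doubled character.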

Features

  • Trains large models on large datasets via online loading using LMDB, with multi-GPU support.
  • Supports variable-length batches via masking.
  • Supports the AN4 audio database (50 minutes of data) and has been extended to train on the LibriSpeech dataset (1000 hours of data). Custom dataset preparation is explained in the documentation.
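The idea behind masking variable-length batches can be sketched in plain Python as below. This is a generic illustration under stated assumptions, not the project's Torch7 implementation: the helper names `pad_batch` and `masked_mean` and the padding value are hypothetical.

```python
def pad_batch(seqs, pad_value=0.0):
    """Pad sequences to a common length and build a matching 0/1 mask."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    # 1 marks a real time step, 0 marks padding.
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def masked_mean(padded, mask):
    """Average over real time steps only, ignoring padded positions."""
    total = sum(v * m
                for row, mrow in zip(padded, mask)
                for v, m in zip(row, mrow))
    count = sum(sum(mrow) for mrow in mask)
    return total / count


if __name__ == "__main__":
    padded, mask = pad_batch([[1.0, 2.0, 3.0], [4.0]])
    print(padded, mask, masked_mean(padded, mask))
```

Multiplying by the mask keeps padded time steps from contributing to any statistic or loss computed over the batch, so utterances of different lengths can share one tensor.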

Branches

There are currently two branches, Master and Phonemes:

  • Master: This branch trains DeepSpeech2. It also includes an evaluation script that calculates the WER/CER, as well as a prediction script. This branch is useful for understanding how DeepSpeech and CTC work, and it is easy to run after installation. Checking out this branch is highly recommended.
  • Phonemes: This branch is experimental and uses phoneme-based rather than character-based predictions. Full credit goes to CCorfield for his awesome work in porting the code to use phonemes and extending it. I'd also like to thank Shane Walker for his awesome recent conversion to use phonemes.

Performance

These results are based on training on the AN4 training set and testing on the AN4 test set. They will be updated as the architecture and datasets change.

WER CER
14 4.22
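WER (word error rate) and CER (character error rate) are conventionally computed as the edit distance between the reference and the predicted transcript, divided by the reference length. The sketch below shows that standard definition in Python; it is not the project's evaluation script, and the function names are hypothetical. It returns ratios rather than percentages, since the scale of the figures above is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a ratio; the reference must be non-empty."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a ratio; the reference must be non-empty."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("abcd", "abed"))
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions, because the denominator is the reference length only.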

Training graph

Installation/Data Preparation/Documentation

Follow the installation, data preparation and documentation pages found in the wiki to set up and run the code.

Technical documentation can be found here.

Acknowledgements

Many people helped with and contributed to this project and deserve recognition:

  • Soumith Chintala for his support on Torch7 and the many open source projects he has contributed to that made this project possible!
  • Charles Corfield for his work on the Phoneme Dataset and his overall contribution and aid throughout.
  • Will Frey for his thorough communication and aid in the development process.
  • Ding Ling, Yuan Yang and Yan Xia for their significant contributions to online training, multi-GPU support and many other important features.
  • Erich Elsen and the team from Baidu for their contribution of Warp-CTC that made this possible, and the encouraging words and support given throughout the project.
