Giter Site home page Giter Site logo

pytorch-tdnn's Introduction

Pytorch-TDNN

This repository contains an implementation of a Time-Delay Neural Network which is commonly used in Speech Recognition and Neural Language Models.

Description

The model here implements a generalized version of the TDNN based on the model descriptions given in [1],[2].

In the description given by Waibel et al. a TDNN uses the full context specified. For example: if the delay specified is N = 2, the model uses the current frame and frames at delays of 1 and 2.

In the description given by Peddinti et al. the TDNN only looks at the farthest frames from the current frame as specified by the context parameter. The description in section 3.1 of the paper discusses the differences between their implementation and Waibel et al.

The TDNN implemented here allows for the usage of an arbitrary context which shall be demonstrated in the usage code snippet.

Usage

# For the model specified in the Waibel et al. paper, the first layer is as follows:
context = [0,2]
input_dim = 16
output_dim = 8
net = TDNN(context, input_dim, output_dim, full_context=True)

# For the model specified in the Peddinti et al. paper, the second layer is as follows (taking arbitrary I/O dimensions since it's not specified):
context = [-1,2]
input_dim = 16
output_dim = 8
net = TDNN(context, input_dim, output_dim, full_context=False)

# You may also use any arbitrary context like this:
context = [-11,0,5,7,10]
nput_dim = 16
output_dim = 8
net = TDNN(context, input_dim, output_dim, full_context=False)

# The above will convole the kernel with the current frame, 11 frames in the past, 5, 7, and 10 frames in the future.

output = net(input) # this will run a forward pass

Note that no zero-padding is done at any steps. If window-centering, etc are important for your task, make sure the input is correctly zero-padded and the context are correctly specified.

Acknowledgement

The code spun-off from this repository: pytorch_TDNN, which implements the TDNN or Character-Level CNN whose output is fed into the Char-LSTM Neural Language Model[3].

References

  1. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE transactions on acoustics, speech, and signal processing, 37(3), 328-339.
  2. Peddinti, V., Povey, D., & Khudanpur, S. (2015). A time delay neural network architecture for efficient modeling of long temporal contexts. In INTERSPEECH (pp. 3214-3218).
  3. Kim, Y., Jernite, Y., Sontag, D., & Rush, A. M. (2016). Character-aware neural language models. In Thirtieth AAAI Conference on Artificial Intelligence (pp. 2741-2749)..

pytorch-tdnn's People

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.