kztao / awesome-speech-recognition-papers

This project forked from zzw922cn/awesome-speech-recognition-speech-synthesis-papers

automatic speech recognition paper roadmap, including HMM, DNN, RNN, CNN, Seq2Seq, Attention

License: MIT License

awesome-speech-recognition-papers

Introduction

Automatic speech recognition (ASR) has been studied for several decades, and its models have evolved from HMM-GMM systems to today's deep neural networks. This paper roadmap traces that history. It covers papers from traditional models to currently popular ones, including not only acoustic models and complete ASR systems but also many interesting language models.

Paper List

Automatic Speech Recognition

  • An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition(1982), S. E. Levinson et al. [pdf]

  • A Maximum Likelihood Approach to Continuous Speech Recognition(1983), Lalit R. Bahl et al. [pdf]

  • Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition(1998), Andrew K. Halberstadt. [pdf]

  • Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition(1986), Lalit R. Bahl et al. [pdf]

  • Speaker-independent phone recognition using hidden Markov models(1989), Kai-Fu Lee et al. [pdf]

  • Hidden Markov Models for Speech Recognition(1991), B. H. Juang et al. [pdf]

  • A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)(1997), J.G. Fiscus. [pdf]

  • Framewise phoneme classification with bidirectional LSTM and other neural network architectures(2005), Alex Graves et al. [pdf]

  • Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. [pdf]

  • Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al. [pdf]

  • Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. [pdf]

  • Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. [pdf]

  • Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al. [pdf]

  • Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]

  • Improving deep neural networks for LVCSR using rectified linear units and dropout(2013), George E. Dahl et al. [pdf]

  • Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training(2013), Yajie Miao et al. [pdf]

  • Improvements to deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]

  • Machine Learning Paradigms for Speech Recognition: An Overview(2013), Li Deng et al. [pdf]

  • Recent advances in deep learning for speech research at Microsoft(2013), Li Deng et al. [pdf]

  • Speech recognition with deep recurrent neural networks(2013), Alex Graves et al. [pdf]

  • Convolutional deep maxout networks for phone recognition(2014), László Tóth et al. [pdf]

  • Convolutional Neural Networks for Speech Recognition(2014), Ossama Abdel-Hamid et al. [pdf]

  • Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition(2014), László Tóth. [pdf]

  • Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. [pdf]

  • End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. [pdf]

  • First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. [pdf]

  • Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. [pdf]

  • Robust CNN-based speech recognition with Gabor filter kernels(2014), Shuo-Yiin Chang et al. [pdf]

  • Stochastic pooling maxout networks for low-resource speech recognition(2014), Meng Cai et al. [pdf]

  • Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. [pdf]

  • Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. [pdf]

  • Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. [pdf]

  • Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks(2015), Tara N. Sainath et al. [pdf]

  • Deep convolutional neural networks for acoustic modeling in low resource languages(2015), William Chan et al. [pdf]

  • Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition(2015), Chao Weng et al. [pdf]

  • Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al. [pdf]

  • Listen, Attend and Spell(2015), William Chan et al. [pdf]

  • Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. [pdf]

  • Advances in All-Neural Speech Recognition(2016), Geoffrey Zweig et al. [pdf]

  • Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. [pdf]

  • End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. [pdf]

  • Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. [pdf]

  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. [pdf]

  • End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. [pdf]

  • Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. [pdf]

  • Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. [pdf]

  • Latent Sequence Decompositions(2016), William Chan et al. [pdf]

  • Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks(2016), Tara N. Sainath et al. [pdf]

  • Segmental Recurrent Neural Networks for End-to-End Speech Recognition(2016), Liang Lu et al. [pdf]

  • Towards better decoding and language model integration in sequence to sequence models(2016), Jan Chorowski et al. [pdf]

  • Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al. [pdf]

  • Very Deep Convolutional Networks for End-to-End Speech Recognition(2016), Yu Zhang et al. [pdf]

  • Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al. [pdf]

  • Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al. [pdf]

  • WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]

  • An enhanced automatic speech recognition system for Arabic(2017), Mohamed Amine Menacer et al. [pdf]

  • A network of deep neural networks for distant speech recognition(2017), Mirco Ravanelli et al. [pdf]

  • An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems(2017), Hany Ahmed et al. [pdf]

  • Building DNN acoustic models for large vocabulary speech recognition(2017), Andrew L. Maas et al. [pdf]

  • Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al. [pdf]

  • English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al. [pdf]

  • ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA(2017), Song Han et al. [pdf]

  • Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al. [pdf]

  • Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling(2017), Hairong Liu et al. [pdf]

  • Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al. [pdf]

  • Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al. [pdf]

  • Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al. [pdf]

  • Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al. [pdf]

  • Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition(2017), Jaeyoung Kim et al. [pdf]

Speech Synthesis

  • Deep Voice: Real-time Neural Text-to-Speech(2017), Sercan O. Arik et al. [pdf]

  • WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]

Language Modelling

Contact Me

If this repo is helpful to you, please star and fork it to encourage me to keep updating it. Thank you.

For any questions, feel free to email me at [email protected]. If you use WeChat, you can follow me by searching for the WeChat public account ID deeplearningdigest, where I post several articles every week to share my deep learning practice with you. Thanks!
