awesome-speech-recognition-papers

A roadmap of automatic speech recognition papers, including HMM, DNN, RNN, CNN, Seq2Seq and Attention models.

Introduction

Automatic speech recognition has been studied for several decades, and speech recognition models have evolved from HMM-GMM systems to today's deep neural networks. This paper roadmap traces that history. I will cover papers from traditional models to today's popular ones: not only acoustic models and ASR systems, but also many interesting language models.

Paper List

Automatic Speech Recognition

An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition(1983), S. E. Levinson et al. [pdf]
A Maximum Likelihood Approach to Continuous Speech Recognition(1983), Lalit R. Bahl et al. [pdf]
Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition(1998), Andrew K. Halberstadt. [pdf]
Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition(1986), Lalit R. Bahl et al. [pdf]
Speaker-independent phone recognition using hidden Markov models(1989), Kai-Fu Lee et al. [pdf]
Hidden Markov Models for Speech Recognition(1991), B. H. Juang et al. [pdf]
A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)(1997), J.G. Fiscus. [pdf]
Framewise phoneme classification with bidirectional LSTM and other neural network architectures(2005), Alex Graves et al. [pdf]
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. [pdf]
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al. [pdf]
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. [pdf]
Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. [pdf]
Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al. [pdf]
Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
Improving deep neural networks for LVCSR using rectified linear units and dropout(2013), George E. Dahl et al. [pdf]
Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training(2013), Yajie Miao et al. [pdf]
Improvements to deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
Machine Learning Paradigms for Speech Recognition: An Overview(2013), Li Deng et al. [pdf]
Recent advances in deep learning for speech research at Microsoft(2013), Li Deng et al. [pdf]
Speech recognition with deep recurrent neural networks(2013), Alex Graves et al. [pdf]
Convolutional deep maxout networks for phone recognition(2014), László Tóth et al. [pdf]
Convolutional Neural Networks for Speech Recognition(2014), Ossama Abdel-Hamid et al. [pdf]
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition(2014), László Tóth. [pdf]
Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. [pdf]
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. [pdf]
First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. [pdf]
Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. [pdf]
Robust CNN-based speech recognition with Gabor filter kernels(2014), Shuo-Yiin Chang et al. [pdf]
Stochastic pooling maxout networks for low-resource speech recognition(2014), Meng Cai et al. [pdf]
Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. [pdf]
Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. [pdf]
Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. [pdf]
Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks(2015), Tara N. Sainath et al. [pdf]
Deep convolutional neural networks for acoustic modeling in low resource languages(2015), William Chan et al. [pdf]
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition(2015), Chao Weng et al. [pdf]
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al. [pdf]
Listen, Attend and Spell(2015), William Chan et al. [pdf]
Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. [pdf]
Advances in All-Neural Speech Recognition(2016), Geoffrey Zweig et al. [pdf]
Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. [pdf]
End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. [pdf]
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. [pdf]
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. [pdf]
End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. [pdf]
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. [pdf]
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. [pdf]
Latent Sequence Decompositions(2016), William Chan et al. [pdf]
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks(2016), Tara N. Sainath et al. [pdf]
Segmental Recurrent Neural Networks for End-to-End Speech Recognition(2016), Liang Lu et al. [pdf]
Towards better decoding and language model integration in sequence to sequence models(2016), Jan Chorowski et al. [pdf]
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al. [pdf]
Very Deep Convolutional Networks for End-to-End Speech Recognition(2016), Yu Zhang et al. [pdf]
Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al. [pdf]
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al. [pdf]
WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]
An enhanced automatic speech recognition system for Arabic(2017), Mohamed Amine Menacer et al. [pdf]
A network of deep neural networks for distant speech recognition(2017), Mirco Ravanelli et al. [pdf]
An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems(2017), Hany Ahmed et al. [pdf]
Building DNN acoustic models for large vocabulary speech recognition(2017), Andrew L. Maas et al. [pdf]
Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al. [pdf]
English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al. [pdf]
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA(2017), Song Han et al. [pdf]
Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al. [pdf]
Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling(2017), Hairong Liu et al. [pdf]
Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al. [pdf]
Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al. [pdf]
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al. [pdf]
Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al. [pdf]
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition(2017), Jaeyoung Kim et al. [pdf]

Speech Synthesis

WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]
Deep Voice: Real-time Neural Text-to-Speech(2017), Sercan O. Arik et al. [pdf]

Language Modelling

Contact Me

If this repo is helpful to you, please star and fork it to encourage me to keep it updated. Thank you.

For any questions, feel free to email me at [email protected]. If you use WeChat, you can follow my public account by searching for the ID deeplearningdigest; I post several articles every week sharing my deep learning practice. Thanks!

kztao / awesome-speech-recognition-papers