
ASR_Syllable

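======================= Research on CNN-Based Acoustic Models for Speech Recognition ========================

This project summarizes what I learned about DCNN-CTC between my first year and the first semester of my second year of graduate study. It proposes the MCNN-CTC and DenseNet-CTC acoustic models. The final experimental results are shown below: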

1) Thchs30_TrainingResults

Thchs30 training and fine-tuning curves

2) Thchs30_Results

Thchs30 experimental results

3) Stcmds_Results

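Stcmds experimental results

Acoustic Models

1) DCNN-CTC acoustic model

This model is a modified version of speech_model-05, which builds the speech recognition acoustic model with DCNN-CTC. The STcmds experiments use the same model with modifications. The final results are shown in the figures above.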
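
With CTC, the network outputs one label distribution per frame, including a blank label, and the frame-level path is then collapsed into a syllable sequence. The following is a minimal sketch of greedy (best-path) decoding. It assumes the blank has index 0 and uses a made-up toy vocabulary; the decoding code in this repository may differ.

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Collapse a frame-level CTC output into a label sequence.

    frame_probs: list of per-frame probability lists (one value per label).
    blank: index of the CTC blank label (assumed to be 0 in this sketch).
    """
    # Pick the most likely label for every frame (best path).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded = []
    previous = None
    for label in best_path:
        # Merge consecutive repeats, then drop blanks.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical toy vocabulary: index 0 is the blank, the rest are pinyin syllables.
vocab = ["<blank>", "ni3", "hao3"]
frames = [
    [0.10, 0.80, 0.10],  # ni3
    [0.10, 0.70, 0.20],  # ni3 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # hao3
    [0.60, 0.20, 0.20],  # blank
]
syllables = [vocab[i] for i in ctc_greedy_decode(frames)]
print(syllables)  # ['ni3', 'hao3']
```

A blank between two identical labels keeps them separate, which is how CTC can emit the same syllable twice in a row.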

2) MCNN-CTC acoustic model

The experiments for this model use the speech_model_10 script. The final results are shown in figure 2) above. Overall, MCNN-CTC performs better than DCNN-CTC.

3) DenseNet-CTC acoustic model

The experiments for this model are based on DenseNet. On the Thchs30 dataset the final result reaches a CER of roughly 30%. You can try the experiment yourself.
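
For reference, an error rate such as CER is usually computed as the total edit distance between reference and recognized sequences, divided by the total reference length. The sketch below assumes each utterance is a list of tokens (syllables or characters); the function names and the sample data are illustrative and are not taken from this repository's evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,             # deletion
                row[j - 1] + 1,              # insertion
                prev_row[j - 1] + (r != h),  # substitution (0 if tokens match)
            ))
        prev_row = row
    return prev_row[-1]


def error_rate(references, hypotheses):
    """Total edit distance divided by the total number of reference tokens."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_tokens = sum(len(r) for r in references)
    return total_errors / total_tokens


# Made-up example: one syllable is dropped from a four-syllable reference.
refs = [["jin1", "tian1", "tian1", "qi4"]]
hyps = [["jin1", "tian1", "qi4"]]
print(error_rate(refs, hyps))  # 0.25
```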

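4) Attention-CTC acoustic model

This model builds on DCNN-CTC and applies attention at the fully connected layer. The results may improve on DCNN-CTC; see the speech_model_06 script for details. The core of the implementation is shown below: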
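NN(Attention)-CTC:
from keras.layers import Dense, BatchNormalization, Dropout, multiply

# `reshape` is the output tensor of the preceding layers (defined earlier in the script).
dense1 = Dense(units=512, activation='relu', use_bias=True, kernel_initializer='he_normal')(reshape)
# A softmax over the 512 units yields one attention weight per feature.
attention_prob = Dense(units=512, activation='softmax', name='attention_vec')(dense1)
# Reweight the features element-wise by their attention weights.
attention_mul = multiply([dense1, attention_prob])

dense1 = BatchNormalization(epsilon=0.0002)(attention_mul)
dense1 = Dropout(0.3)(dense1)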
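
To make the element-wise attention step concrete, here is a small numeric sketch in plain Python. It assumes the attention scores are already given (in the Keras snippet they come from the `attention_vec` Dense layer); the helper names and the numbers are illustrative only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_gate(features, scores):
    """Scale each feature by its softmax attention weight.

    features: activations of the dense layer (dense1 in the snippet).
    scores: pre-softmax attention scores, assumed given in this sketch.
    """
    weights = softmax(scores)
    gated = [f * w for f, w in zip(features, weights)]
    return gated, weights


features = [1.0, 2.0, 3.0]
scores = [0.0, 0.0, math.log(2.0)]  # the third feature gets twice the weight
gated, weights = attention_gate(features, scores)
print(weights)
print(gated)
```

Because the weights sum to 1, every feature is scaled down, and features with higher scores keep a larger share of their activation.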

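Transfer Learning

Retraining fine-tunes the initial model further and can improve its accuracy. See the train_modelSpeech script for the training code. In this work all network layers are fine-tuned, which improves on the initial model; see figure 1) for the results.

Citation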

W Zhang, M H Zhai, Z L Huang, et al. Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks[C]. https://doi.org/10.1007/978-3-030-27529-7_29

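Reference Project Links

Personal blog: contains my recent study notes
Reference link
ASR_WORD: builds a speech recognition acoustic model using characters as the modeling unit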
