albinjm / finspeech

A Speech Recognition Framework for Banking Interactions using Convolutional Recurrent Dense Neural Networks and Language Models

License: MIT License

Jupyter Notebook 100.00%
automatic-speech-recognition banking convolutional-neural-networks customer-service data-preprocessing deep-learning dense-neural-networks hugging-face language-modeling performance-evaluation

Enhanced Speech Recognition for Banking Dialogues Using CRDNN and Language Models

Overview

In the financial sector, understanding customer needs accurately and efficiently is paramount. This project aims to advance speech recognition for banking dialogues by combining a Convolutional, Recurrent, and Dense Neural Network (CRDNN) with Language Model (LM) support. Using the HarperValleyBank (HVB) corpus, it builds a speech recognition system that transcribes banking interactions, with the aim of supporting customer service automation and analytics in the banking industry.

Goals

The primary goal is to improve the accuracy and performance of automatic speech recognition (ASR) systems tailored to banking dialogues. By fine-tuning a CRDNN model on the HVB dataset and integrating language models for context, the project strives to develop an ASR system that faithfully transcribes spoken banking dialogues, improving customer service experiences and operational efficiency.

Data Source

  • HarperValleyBank (HVB) Corpus: A specialized spoken dialog corpus for banking, featuring annotated banking interactions. For further information, consult the HVB Corpus Documentation.

Approach

  1. CRDNN Model Optimization: Fine-tuning a pre-existing CRDNN model from SpeechBrain with the HVB dataset to better accommodate banking-specific dialogues.
  2. Language Model Application: Enhancing transcription accuracy by incorporating language models that offer banking-specific contextual understanding.
  3. Performance Assessment: Evaluating the optimized CRDNN model, both with and without LM enhancement, using a test segment of the HVB corpus to determine transcription effectiveness.
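The comparison in step 3 can be illustrated with a small, self-contained sketch. It assumes word error rate (WER) as the metric and a simple n-best rescoring scheme for the LM; the README names neither, so both are illustrative choices, and the hypotheses and scores below are made up for the example rather than taken from the HVB corpus or from this project's outputs.

```python
from typing import List, Tuple

# (transcript, acoustic log-probability, LM log-probability); all values hypothetical
Hypothesis = Tuple[str, float, float]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def pick_best(nbest: List[Hypothesis], lm_weight: float) -> str:
    """Return the transcript maximizing acoustic score + lm_weight * LM score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * h[2])[0]


# Hypothetical reference and 2-best list for one utterance
reference = "i would like to check my account balance"
nbest = [
    ("i would like to check my a count balance", -4.1, -21.0),
    ("i would like to check my account balance", -4.3, -12.5),
]

without_lm = pick_best(nbest, lm_weight=0.0)  # acoustic score only
with_lm = pick_best(nbest, lm_weight=0.5)     # acoustic + weighted LM score

print("WER without LM:", word_error_rate(reference, without_lm))
print("WER with LM:   ", word_error_rate(reference, with_lm))
```

In this toy case the LM score tips the choice toward the well-formed transcript, so the WER falls from 0.25 to 0.0. A real evaluation would average over the whole test split, and the LM would more likely be applied inside beam search than as a post-hoc rescoring step.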

Technology Stack

  • SpeechBrain: A comprehensive, PyTorch-powered speech toolkit facilitating pre-trained ASR model access and deployment.
  • Hugging Face: Used for accessing and implementing the SpeechBrain pre-trained CRDNN model.
  • Python Ecosystem: Uses torchaudio, torch, and json, among other libraries, for data handling and model training workflows.
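As a rough sketch of how a pretrained SpeechBrain CRDNN model can be pulled from Hugging Face, the snippet below uses SpeechBrain's `EncoderDecoderASR` interface. The model ID `speechbrain/asr-crdnn-rnnlm-librispeech` is an assumption: it is a public CRDNN + RNN LM checkpoint, but the README does not say which checkpoint the project starts from. The helper names `load_pretrained_asr` and `transcribe`, and the `savedir` path, are likewise hypothetical.

```python
# Assumption: a public SpeechBrain CRDNN + RNN LM checkpoint on Hugging Face;
# the project may start from a different one.
DEFAULT_SOURCE = "speechbrain/asr-crdnn-rnnlm-librispeech"


def load_pretrained_asr(source: str = DEFAULT_SOURCE,
                        savedir: str = "pretrained_models/asr-crdnn"):
    """Download (or reuse a cached copy of) a pretrained SpeechBrain ASR model."""
    # Imported lazily so this module can be loaded without SpeechBrain installed.
    try:
        from speechbrain.inference.ASR import EncoderDecoderASR  # SpeechBrain 1.0+
    except ImportError:
        from speechbrain.pretrained import EncoderDecoderASR  # older releases
    return EncoderDecoderASR.from_hparams(source=source, savedir=savedir)


def transcribe(asr_model, wav_path: str) -> str:
    """Transcribe a single audio file with an already loaded model."""
    return asr_model.transcribe_file(wav_path)


# Example usage (needs network access and an audio file, so it is left commented out):
# asr = load_pretrained_asr()
# print(transcribe(asr, "path/to/call_recording.wav"))
```

Loading the model this way covers inference with the stock checkpoint only; fine-tuning on HVB goes through the project's own train.py and train.yaml.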

Getting Started

Installation

Follow these steps to prepare your environment and obtain all necessary resources:

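# Prepare the HVB dataset
# (the paths below assume a Google Colab session whose working directory is /content)
gdown 1oJh0U3g_bUx6UPX4xix2UHMVHeCE_H1y
gdown 1_OXiLOL2RBsbdCb4WyQsLudYxzJxMDJr
# Extract the archive quietly, then move the data folder to /content/data
unzip -q hvb.zip
mv content/data /content/
# Remove the now-empty extraction directory
rm -r /content/content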

# Retrieve configuration files for training and inference
gdown 1a0EGlsLbXnGn1xwZoSqT0tcdAQ1L2nfd  # train.py
gdown 1yCmjRbxXRxfEN5LXdnE1Zpl8ZOIzdrAO  # train.yaml
gdown 1KHmdcLVFI9ontvGmi5J6vfaropGYuKcr  # inference.yaml

# Install SpeechBrain
pip install speechbrain -q

Refer to the included Python scripts and configuration files for comprehensive training and evaluation instructions.

Acknowledgments

My sincere appreciation goes to the SpeechBrain toolkit creators and the HVB corpus maintainers for making their work publicly available and supporting ongoing advances in speech recognition technology.
