Giter Site home page Giter Site logo

ayushverma135 / whisper-hindi-asr-model-iit-bombay-internship Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 342 KB

The Whisper Hindi ASR (Automatic Speech Recognition) model utilizes the KathBath dataset, a comprehensive collection of speech samples in Hindi. Trained on this dataset, Whisper employs advanced deep learning techniques to accurately transcribe spoken Hindi into text.

License: Eclipse Public License 2.0

Jupyter Notebook 100.00%
dataset deep-learning deep-neural-networks googlecollab machine-learning neural-networks python3 transcribing

whisper-hindi-asr-model-iit-bombay-internship's Introduction

Whisper-Hindi-ASR-model-IIT-Bombay-Internship

The Whisper Hindi ASR (Automatic Speech Recognition) model utilizes the KathBath dataset, a comprehensive collection of speech samples in Hindi. Trained on this dataset, Whisper employs advanced deep learning techniques to accurately transcribe spoken Hindi into text. Its architecture harnesses the power of neural networks to recognize and interpret the nuances of the Hindi language, including regional accents and dialects. Whisper stands out for its efficiency and accuracy, offering a reliable solution for converting spoken Hindi into written text across various applications, from transcription services to voice-enabled interfaces.

The KathBath dataset serves as a crucial foundation for training the Whisper model. This dataset encompasses diverse speech samples sourced from different regions, covering a wide range of topics and conversational styles. The richness and diversity of the data enable Whisper to generalize well across various real-world scenarios, ensuring robust performance in different contexts.

ASR models not only transcribe speech but also calculate the Word Error Rate (WER) by comparing their output to a reference transcription. This metric quantifies accuracy, with lower WER values indicating better performance.

Word Error Rate(WER)

  • Word Error Rate (WER) is a metric used to evaluate the performance of automatic speech recognition (ASR) systems. It measures the disparity between the transcribed output generated by the ASR system and the reference or ground truth transcription.
  • WER is calculated as the ratio of the total number of errors (insertions, deletions, and substitutions) required to align the transcribed text with the reference text, divided by the total number of words in the reference text.
  • WER provides a comprehensive assessment of the accuracy of ASR systems, taking into account both misrecognitions and omissions of words in the transcription. Lower WER values indicate higher accuracy, with a perfect score of 0 indicating an exact match between the transcribed and reference texts.
  • WER is a widely used evaluation metric in the field of speech recognition, providing valuable insights into the performance and quality of ASR models across different languages and applications.

Whisper Model

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

approach

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.

Setup

We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:

  pip install -U openai-whisper

Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

  pip install git+https://github.com/openai/whisper.git 

To update the package to the latest version of this repository, please run:

  pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

  #on Ubuntu or Debian
  sudo apt update && sudo apt install ffmpeg

  # on Arch Linux
  sudo pacman -S ffmpeg

  # on MacOS using Homebrew (https://brew.sh/)
  brew install ffmpeg

  # on Windows using Chocolatey (https://chocolatey.org/)
  choco install ffmpeg

  # on Windows using Scoop (https://scoop.sh/)
  scoop install ffmpeg

whisper-hindi-asr-model-iit-bombay-internship's People

Contributors

ayushverma135 avatar banner-19 avatar

Watchers

 avatar

whisper-hindi-asr-model-iit-bombay-internship's Issues

vistaar

!git clone https://github.com/ai4bharat-indicnlp/vistaar.git
%cd vistaar
!python vistaar_evaluation.py --transcribed_data path/to/transcribed_data.txt --reference_text path/to/reference_text.txt

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.