Giter Site home page Giter Site logo

lqzai / real-time-transcription-simple Goto Github PK

View Code? Open in Web Editor NEW

This project forked from azure-samples/real-time-transcription-simple

0.0 0.0 0.0 7.35 MB

How to build a real-time transcription solution

License: MIT License

JavaScript 73.21% CSS 8.83% HTML 17.96%

real-time-transcription-simple's Introduction

Real Time Transcription - Simple

This repo contains a fully working web-based Real Time Transcription application, powered by Azure Speech to Text. You can deploy it to your Azure subscription and local PC in less than 20 minutes. You can then modify it for your specific needs.

It is part 1 of a series of repos on how to build real-time-transcription applications using Azure Speech to Text.

Once you deploy this application, you will have something like this:

webapprtt.mov

This post is the number 1 post of a series of posts, demonstrating Azure Speech to Text in real-time, in scenarios that are increasingly more complex. This scenario is the simplest of them all: real-time transcription of spoken work in English, using the standard Azure model.

The Technology Used

You will be using:

  • Azure Speech-to-Text
  • React.js with Javascript (with Bootstrap, to make things look good)
  • Devcontainers - You can use this repo on your PC/MAC or on Codespaces in Github. This makes it easy for you to develop without having to install Node, NPM, or the React modules you will need

By following the steps under Installation you wil be able to get started quickly

Architecture

The architecture is shown below:

Getting Started

Prerequisites

This repository is used to build the application on a personal computer or on GitHub Codespaces. It uses Devcontainers.

You need to have the following on your personal computer (PC, Mac or Linux):

NB -> Alternatively, you can run this application entirely in a codespace.

Installation

Azure Services

The first step is to enable the required Azure Cognitive Services. To do that:

  • Click the button bellow:

Deploy to Azure

Once you have the prerequisites met:

  • Clone the repository
  • Open it with Visual Studio Code.

You will get a screen like this:

  • Click Reopen in Container to open the Devcontainer. Node, yarn, etc. are installed on it, and that is all you need to build and run your webapp.

Run Webapp

Add your secrets

This only needs to be done once.

  • Create the .env file. On your Visual Studio code terminal, enter
cd webapp 
cp env-template .env
  • update the file, entering the credentials for your recently created Azure Cognitive Services:
REACT_APP_COG_SERVICE_KEY=<your congnitive service key>
REACT_APP_COG_SERVICE_LOCATION=<your cognitive service region>

Run the app

  • On the terminal enter:
cd webapp
yarn start

You will have a response like this on Visual Studio Code:

  • Click Open in Browser

And, after some time you will see the app on the browser:

  • click start and start speaking English.

It will start transcribing what you say:

  • Click stop to stop transcribing.

  • Click export to download the transcription.

Under the Hood

This solution demonstrates the use of Azure Speech to Text's Continuous Recognition, using the Javascript SDK.

It follows the instructions provided in this Azure Speech to Text - Use continuous recognition

The "magic" happens in this part of the code: /webapp/src/Components/Transcription.js

First, we define a speech configuration with code like this:

const speechConfig = sdk.SpeechConfig.fromSubscription("YourSpeechKey", "YourSpeechRegion");

In order to capture the computer microphone, we use media services. This is done when the website is first rendered, via this code:

const getMedia = async (constraints) => {
  let stream = null
  try {
    stream = await navigator.mediaDevices.getUserMedia(constraints)
    // stream is then passed to the recognizer
  } catch (err) {
    /* handle the error */
    alert(err)
    console.log(err)
  }
}

We define the audio configuration, to read the stream, using the stream defined above:

This is done with code like this:

// configure Azure STT to listen to an audio Stream
const audioConfig = AudioConfig.fromStreamInput(stream)

Then we put it all together in one recognizer:

const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

And create callbacks for when continuous recognition is running:

recognizer.recognizing = (s, e) => {

  // uncomment to debug
  // console.log(`RECOGNIZING: Text=${e.result.text}`)
  setRecognisingText(e.result.text)
  textRef.current.scrollTop = textRef.current.scrollHeight
}

recognizer.recognized = (s, e) => {
  setRecognisingText("")
  if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {

    // uncomment to debug
    // console.log(`RECOGNIZED: Text=${e.result.text}`)

    setRecognisedText((recognisedText) => {
      if (recognisedText === '') {
        return `${e.result.text} `
      }
      else {
        return `${recognisedText}${e.result.text} `
      }
    })
    textRef.current.scrollTop = textRef.current.scrollHeight
  }
  else if (e.result.reason === sdk.ResultReason.NoMatch) {
    console.log("NOMATCH: Speech could not be recognized.")
  }
}

recognizer.canceled = (s, e) => {
  console.log(`CANCELED: Reason=${e.reason}`)

  if (e.reason === sdk.CancellationReason.Error) {
    console.log(`"CANCELED: ErrorCode=${e.errorCode}`)
    console.log(`"CANCELED: ErrorDetails=${e.errorDetails}`)
    console.log("CANCELED: Did you set the speech resource key and region values?")
  }
  recognizer.stopContinuousRecognitionAsync()
}

recognizer.sessionStopped = (s, e) => {
  console.log("\n    Session stopped event.")
  recognizer.stopContinuousRecognitionAsync()
}

Then, whenever we want the recognizer to run, we run:

recognizer.startContinuousRecognitionAsync()

And, to stop it we run:

recognizer.stopContinuousRecognitionAsync()

The entire code of this solution is here: /webapp/src/Components/Transcription.js

Conclusion

This is the quick and simple way to build a working real-time-transcription webapp using React and javascript. This webapp recognizes English only, and simply transcribes what was said. It uses the standard model, so, some industry or subject specific words may be missed.

Next, you will learn how to recognize other languages, and how to improve accuracy by providing extra vocabulary.

real-time-transcription-simple's People

Contributors

rensilvaau avatar microsoftopensource avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.