API Wrapper for the Microsoft Azure Speech Services Speech-to-text REST API 3.1 (Cognitive Services).

Home Page: https://perfectmemory.github.io/azure_stt/

License: MIT License

azure_stt

Installation

Add this line to your application's Gemfile:

gem 'azure_stt'

And then execute:

bundle

Or install it yourself as:

gem install azure_stt

Azure Speech-to-text Subscription key

To use the gem, you need a subscription key, which you can generate from your Azure account.

  • If you don't have an Azure account, you can create one for free on this page.
  • Once logged in to the Azure portal, subscribe to Speech in Microsoft Cognitive Services.
  • Two subscription keys are then available under 'RESOURCE MANAGEMENT > Keys' ('KEY 1' and 'KEY 2').

Usage

Configuration

Two environment variables are used:

  • 'REGION': the region of your subscription

  • 'SUBSCRIPTION_KEY': the API key you can generate on your Azure account.
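
As a minimal sketch, the two variables can be read up front so that a missing value fails early with a clear message. The helper name `azure_credentials` is hypothetical and not part of the gem; the `env` parameter exists only so the lookup can be tried with a plain Hash.

```ruby
# Hypothetical helper, not part of azure_stt: collects the two settings
# described above and fails early if either one is missing or empty.
def azure_credentials(env = ENV)
  region = env['REGION'].to_s.strip
  key = env['SUBSCRIPTION_KEY'].to_s.strip

  missing = []
  missing << 'REGION' if region.empty?
  missing << 'SUBSCRIPTION_KEY' if key.empty?
  unless missing.empty?
    raise ArgumentError, "missing environment variables: #{missing.join(', ')}"
  end

  { region: region, subscription_key: key }
end
```

The returned hash uses the same keyword names that AzureSTT::Session.new accepts, so it could be passed on as `AzureSTT::Session.new(**azure_credentials)`.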

You can look at the file env.sample and change the values. If you do not want to use environment variables, you can configure the values like so:

AzureSTT.configure do |config|
  config.region = 'your_region'
  config.subscription_key = 'your_key'
end

Finally, the class AzureSTT::Session uses the values from the configuration by default, but you can initialize the session with custom values:

session = AzureSTT::Session.new(region: 'your_region', subscription_key: 'your_key')

Start a transcription

require 'azure_stt'

properties = {
  "diarizationEnabled" => false,
  "wordLevelTimestampsEnabled" => false,
  "punctuationMode" => "DictatedAndAutomatic",
  "profanityFilterMode" => "Masked"
}

content_urls = [ 'https://path.com/audio.ogg', 'https://path.com/audio1.ogg']

session = AzureSTT::Session.new

transcription = session.create_transcription(
  content_urls: content_urls,
  properties: properties,
  locale: 'en-US',
  display_name: 'The name of the transcription')

# You can then retrieve the results of your transcription with the id
puts transcription.id
# Outputs 'your_transcription_id'

Get a transcription

require 'azure_stt'

session = AzureSTT::Session.new

transcription = session.get_transcription('your_transcription_id')

# Returns
# #<AzureSTT::Transcription id="d35a802d-70ae-4358-a35d-b5faa0c75457"
# # model="" properties={"diarizationEnabled"=>false,
# # "wordLevelTimestampsEnabled"=>false, "channels"=>[0, 1],
# # "punctuationMode"=>"DictatedAndAutomatic", "profanityFilterMode"=>"Masked",
# # "duration"=>"PT5M18S"}
# # links={"files"=>"https://uscentral.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/d35a802d-70ae-4358-a35d-b5faa0c75457/files"}
# # last_action_date_time=#<Date: 2020-05-31 ((2459366j,0s,0n),+0s,2299161j)> created_date_time=#<Date: 2020-05-31 ((2459366j,0s,0n),+0s,2299161j)>
# # status="Succeeded" locale="en-US" display_name="Transcription name" files=[]>

if transcription.succeeded?
  # You can then access the text, for instance:
  result = transcription.results.first
  puts result.text
end
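
The snippet above reads only the first result. As a sketch, assuming `results` may hold more than one entry (for example when several content URLs were submitted, which this document does not confirm), the texts could be collected like this. The helper `transcription_text` is hypothetical and relies only on the methods already used above: `succeeded?`, `results` and `text`.

```ruby
# Hypothetical helper, not part of azure_stt. It assumes only what the
# snippet above uses: #succeeded?, #results, and #text on each result.
def transcription_text(transcription, separator: "\n")
  # Nothing to read unless the transcription succeeded.
  return nil unless transcription.succeeded?

  transcription.results.map(&:text).join(separator)
end
```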

Delete a transcription

require 'azure_stt'

session = AzureSTT::Session.new

deleted = session.delete_transcription('your_transcription_id')

The API doesn't seem to send a 404 error when the id is unknown; it always sends a 204 response. As a result, Session#delete_transcription returns true even when the transcription did not exist.
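
If the distinction matters, one possible workaround is to check that the id is listed before deleting it. This is only a sketch: `delete_if_listed` is a hypothetical helper, it assumes that `get_transcriptions` returns a collection of objects responding to `id`, and it ignores any pagination the listing may apply. It also costs one extra request per deletion.

```ruby
# Hypothetical helper, not part of azure_stt. Assumes that
# Session#get_transcriptions returns objects responding to #id.
def delete_if_listed(session, id)
  known = session.get_transcriptions.any? { |t| t.id == id }
  # Report false instead of the unconditional true described above.
  return false unless known

  session.delete_transcription(id)
end
```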

Starting a transcription, fetching the results and deleting the transcription

require 'azure_stt'

session = AzureSTT::Session.new

properties = {
  "diarizationEnabled" => false,
  "wordLevelTimestampsEnabled" => false,
  "punctuationMode" => "DictatedAndAutomatic",
  "profanityFilterMode" => "Masked"
}

content_urls = ['https://path.com/audio.ogg']

transcription = session.create_transcription(
  content_urls: content_urls,
  properties: properties,
  locale: 'en-US',
  display_name: 'The name of the transcription')

id = transcription.id

# Poll every 30 seconds until the transcription is finished
until transcription.finished?
  sleep(30)
  transcription = session.get_transcription(id)
end

puts transcription.results.first.text if transcription.succeeded?

session.delete_transcription(id)
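
The loop above polls forever if the transcription never finishes. A minimal sketch of the same polling with a deadline follows. The helper `wait_for_transcription` and its `interval` and `timeout` parameters are hypothetical; it relies only on `get_transcription` and `finished?` as used above.

```ruby
# Hypothetical helper, not part of azure_stt. Polls until the transcription
# reports finished? and raises once the next wait would pass the deadline.
def wait_for_transcription(session, id, interval: 30, timeout: 3600)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    transcription = session.get_transcription(id)
    return transcription if transcription.finished?

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) + interval > deadline
      raise "transcription #{id} did not finish within #{timeout} seconds"
    end

    sleep(interval)
  end
end
```

With a real session this could be called as `transcription = wait_for_transcription(session, id)`, followed by the same `succeeded?` check as above.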

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/PerfectMemory/azure_stt. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Code of Conduct

Everyone interacting in the AzureStt project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

azure_stt's Issues

Details about API errors are not given

Description

When encountering an error with the API, the exception should contain information about the error to help the user understand it.

Reproduction

Any API error will do. For instance, let us give an invalid subscription key.

begin
  session = AzureSTT::Session.new(region: 'eastus', subscription_key: 'invalid')
  session.get_transcriptions
rescue AzureSTT::ServiceError => e
  puts e.message
end

Current behavior

PermissionDenied (401)

Expected behavior

PermissionDenied (401): Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

Notes

We can enable logging of the HTTP requests for debugging purposes with the following statement. See this HTTParty example for details.

AzureSTT::Client.class_eval { logger ::Logger.new($stdout), :debug, :curl }

Re-running the snippet above now prints the body of the response.

D, [2023-04-05T11:47:18.863532 #490911] DEBUG -- : [HTTParty] [2023-04-05 11:47:18 +0200] > GET https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions
[HTTParty] [2023-04-05 11:47:18 +0200] > Headers: 
[HTTParty] [2023-04-05 11:47:18 +0200] > Ocp-Apim-Subscription-Key: invalid
[HTTParty] [2023-04-05 11:47:18 +0200] > Content-Type: application/json
[HTTParty] [2023-04-05 11:47:18 +0200] > 
[HTTParty] [2023-04-05 11:47:18 +0200] < HTTP/1.1 401
[HTTParty] [2023-04-05 11:47:18 +0200] < Content-length: 224
[HTTParty] [2023-04-05 11:47:18 +0200] < Content-type: application/json
[HTTParty] [2023-04-05 11:47:18 +0200] < Apim-request-id: da69ffd0-143f-480d-919f-17caa875031d
[HTTParty] [2023-04-05 11:47:18 +0200] < Date: Wed, 05 Apr 2023 09:47:18 GMT
[HTTParty] [2023-04-05 11:47:18 +0200] < Connection: close
[HTTParty] [2023-04-05 11:47:18 +0200] < 
{"error":{"code":"401","message":"Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource."}}
[HTTParty] [2023-04-05 11:47:18 +0200] < 

We can see that the response contains a useful error message about the nature of the problem. This pattern is present in most (if not all) of the API routes, according to the reference.
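
As a sketch of how the expected message could be built, the following function combines the error name and status with the `error.message` field of the JSON body shown in the log. The function name `detailed_error_message` and its signature are hypothetical; this is not how the gem currently builds its exceptions.

```ruby
require 'json'

# Hypothetical helper, not part of azure_stt: builds the message proposed
# above from an error name, an HTTP status and the raw response body.
def detailed_error_message(name, status, body)
  summary = "#{name} (#{status})"
  parsed = JSON.parse(body.to_s)
  detail = parsed.is_a?(Hash) ? parsed.dig('error', 'message') : nil
  detail.to_s.empty? ? summary : "#{summary}: #{detail}"
rescue JSON::ParserError, TypeError
  # Fall back to the short message when the body is not the expected JSON.
  summary
end
```

When the body is empty or is not JSON, the function returns the short form, so it would not make error reporting worse than the current behavior.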

Use Speech-to-text REST API v3.1

At the moment, the gem uses version 3.0 of the Speech-to-text REST API.

However, according to this page, version 3.0 will be retired.

It is therefore important to release a new version of the gem that uses the new version of the API.
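
Judging from the URLs that appear in this document (`.../speechtotext/v3.0/transcriptions` in the debug log and `.../speechtotext/v3.1/...` in the README output), the version is a single path segment. As a sketch, keeping it in one place would make such an upgrade a one-line change. The helper `speech_to_text_base_url` is hypothetical and does not reflect how the gem actually builds its URLs.

```ruby
# Hypothetical helper, not part of azure_stt: the URL shape is taken from
# the request and link URLs shown elsewhere in this document.
def speech_to_text_base_url(region, version: '3.1')
  "https://#{region}.api.cognitive.microsoft.com/speechtotext/v#{version}"
end
```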
