azure_stt

API Wrapper for the Microsoft Azure Speech Services Speech-to-text REST API 3.1 (Cognitive Services).

Installation

Add this line to your application's Gemfile:

gem 'azure_stt'

And then execute:

bundle

Or install it yourself as:

gem install azure_stt

Azure Speech-to-text Subscription key

To be able to use the gem, you must have a subscription key. You can generate one on your Azure account.

If you don't have an Azure account, you can create one for free on this page.
Once logged in to the Azure portal, subscribe to Speech in Microsoft Cognitive Services.
You will find two subscription keys available in 'RESOURCE MANAGEMENT > Keys' ('KEY 1' and 'KEY 2').

Usage

Configuration

Two environment variables are used:

'REGION': the region of your subscription
'SUBSCRIPTION_KEY': the API key you can generate on your Azure account.

You can look at the file env.sample and change the values. If you do not want to use environment variables, you can configure the values like so:

AzureSTT.configure do |config|
  config.region = 'your_region'
  config.subscription_key = 'your_key'
end

Finally, the class AzureSTT::Session uses the values from the configuration by default, but you can initialize the session with custom values:

session = AzureSTT::Session.new(region: 'your_region', subscription_key: 'your_key')
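
When relying on the environment variables, it can help to fail early with a clear message if one of them is missing. The following is a minimal sketch of that idea; the helper azure_stt_settings is hypothetical and not part of the gem, and only REGION, SUBSCRIPTION_KEY and AzureSTT.configure come from this README.

```ruby
# Hypothetical helper (not part of the gem): read the two settings from an
# environment-like hash and fail with a clear message when one is missing.
def azure_stt_settings(env = ENV)
  missing = %w[REGION SUBSCRIPTION_KEY].select { |name| env[name].to_s.strip.empty? }
  raise ArgumentError, "Missing environment variables: #{missing.join(', ')}" unless missing.empty?

  { region: env['REGION'], subscription_key: env['SUBSCRIPTION_KEY'] }
end

# When the gem is loaded, the settings can be applied through the
# configuration block documented above.
if defined?(AzureSTT)
  settings = azure_stt_settings
  AzureSTT.configure do |config|
    config.region = settings[:region]
    config.subscription_key = settings[:subscription_key]
  end
end
```

The helper accepts any hash-like object, so it can be exercised in tests without touching the real environment.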

Start a transcription

require 'azure_stt'

properties = {
  "diarizationEnabled" => false,
  "wordLevelTimestampsEnabled" => false,
  "punctuationMode" => "DictatedAndAutomatic",
  "profanityFilterMode" => "Masked"
}

content_urls = ['https://path.com/audio.ogg', 'https://path.com/audio1.ogg']

session = AzureSTT::Session.new

transcription = session.create_transcription(
  content_urls: content_urls,
  properties: properties,
  locale: 'en-US',
  display_name: 'The name of the transcription')

# You can then retrieve the results of your transcription with the id
puts transcription.id
# Outputs 'your_transcription_id'

Get a transcription

require 'azure_stt'

session = AzureSTT::Session.new

transcription = session.get_transcription('your_transcription_id')

# Returns
# #<AzureSTT::Transcription id="d35a802d-70ae-4358-a35d-b5faa0c75457"
# # model="" properties={"diarizationEnabled"=>false,
# # "wordLevelTimestampsEnabled"=>false, "channels"=>[0, 1],
# # "punctuationMode"=>"DictatedAndAutomatic", "profanityFilterMode"=>"Masked",
# # "duration"=>"PT5M18S"}
# # links={"files"=>"https://uscentral.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/d35a802d-70ae-4358-a35d-b5faa0c75457/files"}
# # last_action_date_time=#<Date: 2020-05-31 ((2459366j,0s,0n),+0s,2299161j)> created_date_time=#<Date: 2020-05-31 ((2459366j,0s,0n),+0s,2299161j)>
# # status="Succeeded" locale="en-US" display_name="Transcription name" files=[]>

if transcription.succeeded?
  # You can then access the text, for instance:
  result = transcription.results.first
  puts result.text
end
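
The example above reads only the first result. If a transcription has several results (for example when several content URLs were submitted, which is an assumption about how results are grouped), their texts could be concatenated. This is a sketch only: full_text is a hypothetical helper, and the Struct objects are stand-ins so that the code runs without the gem.

```ruby
# Hypothetical helper (not part of the gem): concatenate the text of every
# result instead of reading only the first one. It assumes `results` is
# enumerable and that each result responds to `text`, as in the example above.
def full_text(transcription, separator: "\n")
  transcription.results.map(&:text).join(separator)
end

# Stand-in objects so the sketch runs without the gem or an Azure account.
result_class = Struct.new(:text)
transcription_class = Struct.new(:results)
demo = transcription_class.new([result_class.new('first file'), result_class.new('second file')])

puts full_text(demo)
```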

Delete a transcription

require 'azure_stt'

session = AzureSTT::Session.new

deleted = session.delete_transcription('your_transcription_id')

The API doesn't seem to send a 404 error when the id is unknown; it always sends a 204 response. As a result, Session#delete_transcription returns true even when the transcription didn't exist.

Starting a transcription, fetching the results and deleting the transcription

require 'azure_stt'

session = AzureSTT::Session.new

properties = {
  "diarizationEnabled" => false,
  "wordLevelTimestampsEnabled" => false,
  "punctuationMode" => "DictatedAndAutomatic",
  "profanityFilterMode" => "Masked"
}

content_urls = [ 'https://path.com/audio.ogg' ]

transcription = session.create_transcription(
  content_urls: content_urls,
  properties: properties,
  locale: 'en-US',
  display_name: 'The name of the transcription')

id = transcription.id

# Poll every 30 seconds until the transcription is finished
until transcription.finished?
  sleep(30)
  transcription = session.get_transcription(id)
end

if transcription.succeeded?
  puts transcription.results.first.text
end

session.delete_transcription(id)
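
The polling loop above has no upper bound, so it keeps running for as long as the transcription is not finished. A bounded variant could look like the sketch below. The helper wait_until_finished and its interval and timeout parameters are hypothetical and not part of the gem; it only assumes that the object returned by the block responds to finished?, as AzureSTT::Transcription does in the example above.

```ruby
require 'timeout'

# Hypothetical helper (not part of the gem): call the block repeatedly until
# the object it returns reports finished?, or raise after `timeout` seconds.
def wait_until_finished(interval: 30, timeout: 3600)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    current = yield
    return current if current.finished?

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "transcription not finished after #{timeout}s"
    end

    sleep(interval)
  end
end

# Possible usage with the session and id from the example above:
#   transcription = wait_until_finished(timeout: 1800) { session.get_transcription(id) }
```

A monotonic clock is used for the deadline so that changes to the system time do not shorten or extend the wait.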

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/PerfectMemory/azure_stt. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Code of Conduct

Everyone interacting in the AzureStt project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Details about API errors are not given

Description

When encountering an error with the API, the exception should contain information about the error to help the user understand it.

Reproduction

Any API error will do. For instance, let us pass an invalid subscription key.

begin
  session = AzureSTT::Session.new(region: 'eastus', subscription_key: 'invalid')
  session.get_transcriptions
rescue AzureSTT::ServiceError => e
  puts e.message
end

Current behavior

PermissionDenied (401)

Expected behavior

PermissionDenied (401): Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

Notes

We can enable logging of the HTTP requests for debugging purposes with the following statement. See this HTTParty example for details.

AzureSTT::Client.class_eval { logger ::Logger.new($stdout), :debug, :curl }

Re-running the snippet above now prints the body of the response.

D, [2023-04-05T11:47:18.863532 #490911] DEBUG -- : [HTTParty] [2023-04-05 11:47:18 +0200] > GET https://eastus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions
[HTTParty] [2023-04-05 11:47:18 +0200] > Headers: 
[HTTParty] [2023-04-05 11:47:18 +0200] > Ocp-Apim-Subscription-Key: invalid
[HTTParty] [2023-04-05 11:47:18 +0200] > Content-Type: application/json
[HTTParty] [2023-04-05 11:47:18 +0200] > 
[HTTParty] [2023-04-05 11:47:18 +0200] < HTTP/1.1 401
[HTTParty] [2023-04-05 11:47:18 +0200] < Content-length: 224
[HTTParty] [2023-04-05 11:47:18 +0200] < Content-type: application/json
[HTTParty] [2023-04-05 11:47:18 +0200] < Apim-request-id: da69ffd0-143f-480d-919f-17caa875031d
[HTTParty] [2023-04-05 11:47:18 +0200] < Date: Wed, 05 Apr 2023 09:47:18 GMT
[HTTParty] [2023-04-05 11:47:18 +0200] < Connection: close
[HTTParty] [2023-04-05 11:47:18 +0200] < 
{"error":{"code":"401","message":"Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource."}}
[HTTParty] [2023-04-05 11:47:18 +0200] <

We can see that the response contains a useful error message about the nature of the problem. This pattern is present in most (if not all) of the API routes, according to the reference.
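
As an illustration of the expected behavior, the message could be built by parsing the JSON error body shown in the log. This is only a sketch of the idea, not the gem's implementation: error_message is a hypothetical helper, and the only assumption taken from the log is the {"error": {"message": ...}} shape of the body.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): build an exception message from
# an error name, an HTTP status and the raw response body.
def error_message(name, status, body)
  summary = "#{name} (#{status})"
  detail = begin
    parsed = JSON.parse(body.to_s)
    parsed.is_a?(Hash) ? parsed.dig('error', 'message') : nil
  rescue JSON::ParserError, TypeError
    nil # empty, non-JSON or unexpectedly shaped body: keep the short summary
  end
  detail.to_s.empty? ? summary : "#{summary}: #{detail}"
end

# Body shaped like the one in the log above (message shortened here).
body = '{"error":{"code":"401","message":"Access denied due to invalid subscription key or wrong API endpoint."}}'
puts error_message('PermissionDenied', 401, body)
```

When the body is empty or is not JSON, the helper falls back to the current short form, so existing behavior would be preserved for responses without details.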
