tinkoff / voicekit-examples

Examples on how to use Tinkoff Voicekit

Home Page: https://voicekit.tinkoff.ru/

License: Apache License 2.0

Python 25.40% Shell 2.07% JavaScript 3.08% Go 4.09% Ruby 1.98% C 2.05% Objective-C 3.07% Swift 3.74% C# 41.75% Java 12.77%
speech-recognition speech-synthesis python golang nodejs

voicekit-examples's Introduction

Tinkoff VoiceKit Examples

https://voicekit.tinkoff.ru

Usage

Clone this repo

$ git clone --recursive https://github.com/Tinkoff/voicekit-examples.git
$ cd voicekit-examples

Setup environment

Set VOICEKIT_API_KEY and VOICEKIT_SECRET_KEY environment variables to your API key and secret key to authenticate your requests to VoiceKit:

export VOICEKIT_API_KEY="Your API key"
export VOICEKIT_SECRET_KEY="Your secret key"

You may get a "scope tinkoff.cloud.tts is not supported" error if your API key does not support speech synthesis. Write to us at https://voicekit.tinkoff.ru to enable speech synthesis for your API key.

Language specific instructions

Follow the language-specific instructions in the related folder in the repository root. E.g. for Python scripts, open python/README.md. Here is a list of links to instructions for supported languages:

If you can't find your favorite language here, don't worry: consult the gRPC docs for a list of its supported languages and, when you are ready, dive into the Protobuf definitions inside the apis/ folder.

Note on endpoint format

Use api.tinkoff.ai:443. Unencrypted endpoints (with port 80) are not available.
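
As a rough illustration of this note, the helper below checks an endpoint string before it is handed to a gRPC client. The function name check_endpoint and the choice to default a missing port to 443 are hypothetical and not part of the examples in this repo; only the host api.tinkoff.ai and the ports 443 and 80 come from the note itself.

```python
def check_endpoint(endpoint: str) -> str:
    """Return endpoint as 'host:port', rejecting the unencrypted port 80.

    Hypothetical helper: a missing port defaults to 443 (an assumption).
    """
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        # No colon found: rpartition put the whole string into `port`.
        host, port = endpoint, "443"
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    if port == "80":
        raise ValueError("unencrypted endpoints (port 80) are not available")
    return f"{host}:{port}"


if __name__ == "__main__":
    print(check_endpoint("api.tinkoff.ai:443"))
```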

voicekit-examples's People

Contributors

a-n-d-r-e-w-y, akarimova, captainger, dchebakov, denis-trofimov, dilap54, drwatsno, fsatka, g-e-okopnik, penguin138, standy66

voicekit-examples's Issues

Cannot infer VOICEKIT_API_KEY

Hello!
I get this error:
recognize.py: error: Cannot infer VOICEKIT_API_KEY, pass via --api_key command line parameter or VOICEKIT_API_KEY environment variable
What could be the problem?
I did everything according to the instructions.
I set export VOICEKIT_API_KEY="Your API key"
export VOICEKIT_SECRET_KEY="Your secret key"
and only then ran the command python3 recognize.py -r 16000 -c 1 -e MPEG_AUDIO ../audio/sample_1.mp3
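
One way to narrow this down is to check what the Python process actually sees. The error text suggests that recognize.py falls back to the VOICEKIT_API_KEY environment variable when --api_key is not given, so the sketch below reports which of the two variables are unset or empty. The helper name missing_keys is hypothetical and not part of this repo.

```python
import os

REQUIRED = ("VOICEKIT_API_KEY", "VOICEKIT_SECRET_KEY")


def missing_keys(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Not visible to this process:", ", ".join(missing))
    else:
        print("Both variables are set.")
```

If a variable is reported as missing, a likely cause is that it was exported in a different shell session, or assigned without export, so python3 never inherited it.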

Cannot decode and play back audio_content

Hi, I'm trying to synthesize speech and ran into a problem with decoding the audio.
I tried various options:

  1. Copy the contents of audio_content into a text file
  2. Convert the text file to wav with base64 -d audio.txt > audio.wav
  3. Try to play the audio file or check it with soxi audio.wav
  4. Get the error soxi FAIL formats: can't open input file `audio.wav': WAVE: RIFF header not found

Payload:
{"input":{"text":"проверка"},"audioConfig":{"audioEncoding":"LINEAR16","sampleRateHertz":24000},"voice":{"name":"alyona:flirt"}}

I also wrote via the feedback form on the official site voicekit.tinkoff.ru, but got no reply.
Maybe I'm doing something wrong. The same approach works perfectly for google-tts.
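
The "RIFF header not found" message suggests that the decoded bytes are not a WAV file. One possible explanation, which is an assumption and not confirmed in this thread, is that LINEAR16 audio_content is headerless 16-bit PCM. Under that assumption, a minimal sketch that decodes the base64 text and wraps it in a WAV container with Python's standard wave module could look like this. The function name and the mono channel count are illustrative; the 24000 Hz default comes from the payload above.

```python
import base64
import wave


def pcm_base64_to_wav(b64_text, wav_path, sample_rate=24000, channels=1):
    """Decode base64 text as raw 16-bit PCM and write it as a WAV file.

    Assumes headerless LINEAR16 samples; returns the number of frames written.
    """
    pcm = base64.b64decode(b64_text)
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return len(pcm) // (2 * channels)
```

With the text file from step 1, calling pcm_base64_to_wav(open("audio.txt").read(), "audio.wav") should produce a file that soxi can open, provided the assumption about the payload holds.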

Are the nodejs examples still working?

When I run node synthesize_stream.js -r 48000 -e LINEAR16 "Газета Times, 03 января 2009 года - Канцлер на грани ради второго спасения банков." output_3.wav

I get the following error:

/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:382
        throw Error("no such type: " + path);
        ^
Error: no such type: longrunning.v1.Operation
    at Service.lookupType (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:382:15)
    at Method.resolve (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/method.js:148:45)
    at Service.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/service.js:111:20)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Root.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Root.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/root.js:258:43)
    at Object.loadSync (/srv/voicekit-examples/nodejs/node_modules/@grpc/proto-loader/build/src/index.js:218:16)

I tried running this example on Debian (nodejs12) and on Windows 10 (node14).

Limiting the running time of the script

Hello!

I am using recognize_stream.js.
Recognition stops working after 20 seconds. Why exactly 20 seconds? Is there any limitation?

I am using this command:
node recognize_stream.js -e MPEG_AUDIO -r 22050 -c 1 --interim-results --silence-duration-threshold 10 ../../binaryjs/recordings/Windows-10_1596526954048.mp3

The --silence-duration-threshold 10 parameter has no effect.

stt_long_running_recognize_audio_group does not work

The processing request is sent, but the operation statuses are never updated.

python stt_long_running_recognize_audio_group.py

WatchOperations. Initial state:
[104] ENQUEUED
[105] ENQUEUED
[106] ENQUEUED
[107] ENQUEUED
============================
WatchOperations. Init finished.

After a fairly long time, I get:

Traceback (most recent call last):
  File "stt_long_running_recognize_audio_group.py", line 90, in <module>
    for response in responses:
  File ".../voicekit-examples/.venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File ".../voicekit-examples/.venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1649599453.934501536","description":"Error received from peer ipv4:91.194.226.157:443","file":"src/core/lib/surface/call.cc","file_line":905,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"

annotations.proto was not found

Hello!
I wanted to run the examples, but ran into problems at the setup stage.
After installing requirements.txt, I tried to generate the protobuf code with ./sh/generate_protobuf.sh, but got an error.
What could be the problem?

google/api/annotations.proto: File not found.
apis/stt.proto:6:1:  Import "google/api/annotations.proto" was not found or had errors.

Pods for the iOS app fail to install

Hi! I'm trying to run your iOS example app, but pod install fails with an error:

Could not make proto path relative: ../third_party/googleapis/google/api/annotations.proto: No such file or directory

I would be grateful for help solving this problem!
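
Both this error and the annotations.proto report above point at files under third_party/googleapis. The usage instructions clone the repo with --recursive, so one possible cause, offered here as an assumption and not a confirmed diagnosis, is a checkout made without its submodules. A small sketch that checks for the file named in the error; the function name is illustrative, and the path is taken from the pod install message.

```python
from pathlib import Path

# Path taken from the pod install error message, relative to the repo root.
ANNOTATIONS = Path("third_party/googleapis/google/api/annotations.proto")


def googleapis_present(repo_root="."):
    """Return True if annotations.proto exists under repo_root."""
    return (Path(repo_root) / ANNOTATIONS).is_file()


if __name__ == "__main__":
    if googleapis_present():
        print("annotations.proto found")
    else:
        print("annotations.proto missing; "
              "try: git submodule update --init --recursive")
```

Run it from the repository root. If the file is missing, fetching the submodules and re-running the failing command is a reasonable next step.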

Question: VAD

Hello.

stt.proto mentions Voice Activity Detection (VoiceActivityDetectionConfig).

Could you write a few words about it? That is, what exactly is VAD used for here, and what is its default state? Is it enabled? Is it disabled? Is it worth enabling? What do the parameters mean?

Even a few words would help. Thank you.

warning: Import google

Hey, what could be the problem?

./sh/generate_protobuf.sh
apis/tts.proto:5:1: warning: Import google/protobuf/duration.proto but not used.

PHP example is needed

It would be very useful to have an example of using the API in PHP.
This can save time for developers who are more familiar with this language.

The node.js VoiceKit example does not work

After installing the packages, running any command from the examples produces an error:

C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:411
        throw Error("no such Type or Enum '" + path + "' in " + this);
        ^

Error: no such Type or Enum 'google.rpc.Status' in Type .tinkoff.cloud.longrunning.v1.Operation
    at Type.lookupTypeOrEnum (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:411:15)
    at Field.resolve (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\field.js:268:94)
    at Type.set (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\type.js:177:38)
    at Type.get (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\type.js:155:45)
    at Field.resolve (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\field.js:317:21)
    at Type.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\type.js:304:21)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)

I tried running it on macOS 11.5.2 (Node.js 15.8.0) and Windows 10 (Node.js 16.4.2).

Question: Words list is empty

Hello!
I expected to see a list of words with start and end times ([]*WordInfo in SpeechRecognitionAlternative), but the list turned out to be empty. Am I right that the current API version does not support this?

I used the golang example.

Question: Audio format

I have a file with this audio format:

Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

I used this file as an example with the following RecognitionConfig:

{ streaming_config:
   { config:
      { encoding: 'LINEAR16',
        sample_rate_hertz: 16000,
        language_code: 'ru-RU',
        max_alternatives: 4,
        num_channels: 1 } } }

I receive recognition responses.

Then I converted this audio to another format:

Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Bit Rate       : 128k
Sample Encoding: 16-bit Signed Integer PCM

The AudioEncoding enum in stt.proto includes ALAW, so I set a new RecognitionConfig:

{ streaming_config:
   { config:
      { encoding: 'ALAW',
        sample_rate_hertz: 8000,
        language_code: 'ru-RU',
        max_alternatives: 4,
        num_channels: 1 } } }

But I get no recognition responses at all...

I tried LINEAR16 at 8000 but got the same result...

What am I doing wrong?
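
One thing worth checking is whether the converted file's actual parameters match the config. The second file is described as 16-bit signed integer PCM, which is linear PCM and not A-law, so ALAW may not describe it; this does not explain why LINEAR16 at 8000 also returned nothing. A minimal sketch that reads a WAV header with Python's standard wave module, assuming the audio is stored as a PCM WAV file. The function name is illustrative, and bits_per_sample is an extra key, not a RecognitionConfig field.

```python
import wave


def describe_wav(path):
    """Read a PCM WAV header and return its basic audio parameters.

    The first two keys reuse the config's field names for easy comparison.
    """
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "num_channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }
```

The wave module only reads PCM data and raises wave.Error for other encodings, so a failure here is itself a hint about what the file contains.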

NODEJS example

It would be nice if someone pushed an example for NODEJS here.

Thanks.

Sample rate gets rounded to thousands

I'm trying to recognize an audio file with a sample rate of 22050 Hz. I pass the correct value of the corresponding parameter, --rate 22050, but get an error saying that the sample rate is configured as 22000 Hz, which is not true.

#! /bin/bash

source "./sh/env.sh"
cat $1 | \
    python3 -m recognize_stream --host stt.tinkoff.ru --port 443 \
    --rate 22050 --num_channels 2 --encoding MPEG_AUDIO \
    --chunk_size 8192 --api_key $STT_TEST_API_KEY --secret_key $STT_TEST_SECRET_KEY

Audio header reports sample rate of 22050 hz, but recognition_config.sample_rate_herts = 22000

The same problem appears with 44100 Hz. It seems the sample rate gets rounded to thousands somewhere while being passed to the API.
