tinkoff / voicekit-examples

Examples on how to use Tinkoff Voicekit

License: Apache License 2.0

Python 25.40% Shell 2.07% JavaScript 3.08% Go 4.09% Ruby 1.98% C 2.05% Objective-C 3.07% Swift 3.74% C# 41.75% Java 12.77%

speech-recognition speech-synthesis python golang nodejs

voicekit-examples's Introduction

Tinkoff VoiceKit Examples

https://voicekit.tinkoff.ru

Usage

Clone this repo

$ git clone --recursive https://github.com/Tinkoff/voicekit-examples.git
$ cd voicekit-examples

Setup environment

Set VOICEKIT_API_KEY and VOICEKIT_SECRET_KEY environment variables to your API key and secret key to authenticate your requests to VoiceKit:

export VOICEKIT_API_KEY="Your API key"
export VOICEKIT_SECRET_KEY="Your secret key"

You may get a "scope tinkoff.cloud.tts is not supported" error if your API key does not support speech synthesis. Write to us at https://voicekit.tinkoff.ru to enable speech synthesis for your API key.
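
As an illustration of the setup step, a script can check both variables before making any request. This is a minimal sketch, not code from the repository: the function name load_credentials is hypothetical, and only the two variable names come from the text above.

```python
import os


def load_credentials(env=None):
    """Return (api_key, secret_key) or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    names = ("VOICEKIT_API_KEY", "VOICEKIT_SECRET_KEY")
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["VOICEKIT_API_KEY"], env["VOICEKIT_SECRET_KEY"]
```

Keep in mind that export only affects the current shell session and its child processes, so the variables have to be set in the same terminal that runs the example.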

Language specific instructions

Follow the language-specific instructions in the corresponding folder in the repository root. For example, for Python scripts, open python/README.md.

If you can't find your favorite language here, don't worry: consult the gRPC docs for a list of supported languages and, when you are ready, dive into the Protobuf definitions inside the apis/ folder.

Note on endpoint format

Use api.tinkoff.ai:443. Unencrypted endpoints (with port 80) are not available.
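
The endpoint rule can be sketched in Python as follows. The helper names split_endpoint and open_channel are hypothetical; grpc.secure_channel and grpc.ssl_channel_credentials are regular grpcio calls, and the sketch assumes default TLS credentials are sufficient for this endpoint.

```python
def split_endpoint(endpoint):
    """Split "host:port" and reject the unencrypted port 80."""
    host, sep, port_text = endpoint.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError("expected host:port, got %r" % endpoint)
    port = int(port_text)
    if port == 80:
        raise ValueError("unencrypted endpoints (port 80) are not available")
    return host, port


def open_channel(endpoint="api.tinkoff.ai:443"):
    """Open a TLS-secured gRPC channel (requires the grpcio package)."""
    import grpc  # imported lazily so split_endpoint works without grpcio

    host, port = split_endpoint(endpoint)
    return grpc.secure_channel("%s:%d" % (host, port), grpc.ssl_channel_credentials())
```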

voicekit-examples's Issues

Cannot infer VOICEKIT_API_KEY

Good afternoon!
I get this error:
recognize.py: error: Cannot infer VOICEKIT_API_KEY, pass via --api_key command line parameter or VOICEKIT_API_KEY environment variable
What could be the problem?
I did everything according to the instructions.
I set export VOICEKIT_API_KEY="Your API key"
export VOICEKIT_SECRET_KEY="Your secret key"
and then ran python3 recognize.py -r 16000 -c 1 -e MPEG_AUDIO ../audio/sample_1.mp3

Required Python version

python/README.md states that the required Python version is >= 3.5.

However, python/common.py uses the cached_property decorator, which was only added in Python 3.8.
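
One possible workaround, short of raising the documented minimum version, is a fallback for older interpreters. The sketch below is not the repository's code: the fallback class is a simplified stand-in for functools.cached_property (no locking, no __set_name__ handling), and Demo is a hypothetical class used only to show the behaviour.

```python
try:
    from functools import cached_property  # available from Python 3.8
except ImportError:
    class cached_property:
        """Minimal stand-in: compute once, then store the value on the instance."""

        def __init__(self, func):
            self.func = func
            self.name = func.__name__
            self.__doc__ = func.__doc__

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = self.func(instance)
            # The instance attribute shadows this descriptor on later lookups.
            instance.__dict__[self.name] = value
            return value


class Demo:
    calls = 0

    @cached_property
    def value(self):
        Demo.calls += 1  # counts how often the body really runs
        return 42
```

Reading Demo().value twice on the same instance runs the method body only once on either code path.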

Question: Text-To-Speech (TTS)

Is speech generation (TTS) supported? Are there any examples?

Cannot decode and play audio_content

Hi, I am trying to synthesize speech and ran into a problem with decoding the audio.
I tried several approaches:

I copy the contents of audio_content into a text file.
I convert the text file to WAV with base64 -d audio.txt > audio.wav
I try to play the audio file or check it with soxi audio.wav
I get the error: soxi FAIL formats: can't open input file `audio.wav': WAVE: RIFF header not found

Payload:
{"input":{"text":"проверка"},"audioConfig":{"audioEncoding":"LINEAR16","sampleRateHertz":24000},"voice":{"name":"alyona:flirt"}}

I also wrote via the feedback form on the official site voicekit.tinkoff.ru, but got no reply.
Maybe I am doing something wrong. The same approach works fine with google-tts.
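
The "RIFF header not found" message would be consistent with audio_content holding raw 16-bit PCM samples rather than a complete WAV file. That is an assumption, not something confirmed in this thread. Under that assumption, a minimal sketch that adds the header with Python's standard wave module could look like this (the function name save_pcm_as_wav is hypothetical, and the defaults mirror the payload above):

```python
import base64
import wave


def save_pcm_as_wav(audio_content_b64, path, sample_rate=24000, channels=1):
    """Decode base64 LINEAR16 samples and write them with a RIFF/WAVE header.

    Assumes the decoded bytes are headerless 16-bit PCM.
    """
    pcm = base64.b64decode(audio_content_b64)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # LINEAR16 means 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm)  # number of PCM bytes written
```

If the decoded bytes already start with RIFF, this wrapper is not needed and the problem lies elsewhere.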

Fix install issues in docs and meta.

Describe Clone this repo better.
requirements.txt

Missed dependency in go version

Dependency "github.com/Tinkoff/voicekit-examples/golang/pkg/tinkoff/cloud/longrunning/v1" not found in repo.
file: https://github.com/Tinkoff/voicekit-examples/blob/master/golang/pkg/tinkoff/cloud/stt/v1/stt.pb.go

Do you plan to fix it?

Feature request: RecognitionConfig param partial_results

It would be good to add a new param to RecognitionConfig: partial_results (bool).

If true, then VoiceKit will send partial recognition results and not only the final one.

Are the nodejs examples still working?

When running node synthesize_stream.js -r 48000 -e LINEAR16 "Газета Times, 03 января 2009 года - Канцлер на грани ради второго спасения банков." output_3.wav

I get the following error:

/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:382
        throw Error("no such type: " + path);
        ^
Error: no such type: longrunning.v1.Operation
    at Service.lookupType (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:382:15)
    at Method.resolve (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/method.js:148:45)
    at Service.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/service.js:111:20)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Namespace.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Root.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/namespace.js:307:25)
    at Root.resolveAll (/srv/voicekit-examples/nodejs/node_modules/protobufjs/src/root.js:258:43)
    at Object.loadSync (/srv/voicekit-examples/nodejs/node_modules/@grpc/proto-loader/build/src/index.js:218:16)

I tried running this example on Debian (nodejs12) and on Windows 10 (node14).

STT: confidence and stability not normalized to 0...1

What are the units of measurement for these results?

TTS: Documentation on voice configuration

I was unable to change the gender of the voice in both node.js and C# API examples.
Is there any documentation on available TTS voices and language codes?

Limiting the running time of the script

Hello!

I am using recognize_stream.js.
Recognition stops working after 20 seconds. Why exactly 20 seconds? Is there any limitation?
I am using this command:
node recognize_stream.js -e MPEG_AUDIO -r 22050 -c 1 --interim-results --silence-duration-threshold 10 ../../binaryjs/recordings/Windows-10_1596526954048.mp3

--silence-duration-threshold 10 - this parameter does not work

stt_long_running_recognize_audio_group does not work

The processing request is sent, but the operation statuses are not updated.

python stt_long_running_recognize_audio_group.py

WatchOperations. Initial state:
[104] ENQUEUED
[105] ENQUEUED
[106] ENQUEUED
[107] ENQUEUED
============================
WatchOperations. Init finished.

After a fairly long time, I get:

Traceback (most recent call last):
  File "stt_long_running_recognize_audio_group.py", line 90, in <module>
    for response in responses:
  File ".../voicekit-examples/.venv/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File ".../voicekit-examples/.venv/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Received RST_STREAM with error code 0"
        debug_error_string = "{"created":"@1649599453.934501536","description":"Error received from peer ipv4:91.194.226.157:443","file":"src/core/lib/surface/call.cc","file_line":905,"grpc_message":"Received RST_STREAM with error code 0","grpc_status":13}"

annotations.proto was not found

Good afternoon!
I wanted to run the examples, but ran into problems at the setup stage.
After installing requirements.txt, I tried to generate the protobuf code with ./sh/generate_protobuf.sh, but got an error.
What could be the problem?

google/api/annotations.proto: File not found.
apis/stt.proto:6:1:  Import "google/api/annotations.proto" was not found or had errors.

Pods for the iOS app do not install

Hi! I am trying to run your iOS example app, but pod install fails with an error:

Could not make proto path relative: ../third_party/googleapis/google/api/annotations.proto: No such file or directory

I would appreciate any help with this problem!
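
The README clones the repository with --recursive, so a missing file under third_party/ may mean the nested repositories were not fetched. That is a guess rather than a confirmed diagnosis. A quick local check, sketched in Python with a hypothetical helper name, could be:

```python
from pathlib import Path

# Path taken from the error message above, relative to the repository root.
REQUIRED_PROTOS = ("third_party/googleapis/google/api/annotations.proto",)


def missing_proto_imports(repo_root, required=REQUIRED_PROTOS):
    """Return the required .proto paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

If the returned list is not empty and third_party/googleapis is indeed a git submodule, running git submodule update --init --recursive in the repository root should fetch it.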

Question: VAD

Good day.

stt.proto mentions Voice Activity Detection (VoiceActivityDetectionConfig).

I would like to ask you to write a few words about it: what exactly VAD is used for here and what its default state is. Is it enabled? Is it disabled? Should it be enabled? What do the parameters mean?

Even a few words would help. Thank you.

warning: Import google

Hey, what could be the problem?

./sh/generate_protobuf.sh
apis/tts.proto:5:1: warning: Import google/protobuf/duration.proto but not used.

Php example is needed

It would be very useful to have an example of using the API in PHP.
This can save time for developers who are more familiar with this language.

The node.js VoiceKit example does not work

After installing the packages, running any command from the examples produces an error:

C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:411
        throw Error("no such Type or Enum '" + path + "' in " + this);
        ^

Error: no such Type or Enum 'google.rpc.Status' in Type .tinkoff.cloud.longrunning.v1.Operation
    at Type.lookupTypeOrEnum (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:411:15)
    at Field.resolve (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\field.js:268:94)
    at Type.set (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\type.js:177:38)
    at Type.get (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\type.js:155:45)
    at Field.resolve (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\field.js:317:21)
    at Type.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\type.js:304:21)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)
    at Namespace.resolveAll (C:\Projects\voicekit-examples\nodejs\node_modules\protobufjs\src\namespace.js:308:25)

I tried running it on macOS 11.5.2 (Node.js 15.8.0) and Windows 10 (Node.js 16.4.2).

Question: Words list is empty

Good afternoon!
I expected to see a list of words with start and end times ([]*WordInfo in SpeechRecognitionAlternative), but the list turned out to be empty. Do I understand correctly that the current API version does not offer this?

I used the golang example.

Question: Audio format

With this audio format:

Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

I used this file as an example, with the following RecognitionConfig:

{ streaming_config:
   { config:
      { encoding: 'LINEAR16',
        sample_rate_hertz: 16000,
        language_code: 'ru-RU',
        max_alternatives: 4,
        num_channels: 1 } } }

I receive recognition responses.

Then I converted this audio to another format:

Channels       : 1
Sample Rate    : 8000
Precision      : 16-bit
Bit Rate       : 128k
Sample Encoding: 16-bit Signed Integer PCM

The AudioEncoding enum in stt.proto includes ALAW, so I set a new RecognitionConfig:

{ streaming_config:
   { config:
      { encoding: 'ALAW',
        sample_rate_hertz: 8000,
        language_code: 'ru-RU',
        max_alternatives: 4,
        num_channels: 1 } } }

But I get no recognition responses at all...

I tried LINEAR16 at 8000 Hz, but got the same result...

What am I doing wrong?
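
One thing that can be checked locally: a file reported as "16-bit Signed Integer PCM" holds linear samples, whereas A-law is an 8-bit companded encoding, so ALAW would not describe the converted file. The sketch below derives the config fields from a WAV header with Python's standard wave module. The function name is hypothetical, the field names are copied from the config shown above, and this does not explain why LINEAR16 at 8000 Hz also returned nothing.

```python
import wave


def recognition_config_from_wav(path, language_code="ru-RU"):
    """Build config fields that match the actual contents of a PCM WAV file."""
    # wave.open raises wave.Error for WAV files that are not plain PCM.
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("LINEAR16 expects 16-bit samples")
        return {
            "encoding": "LINEAR16",
            "sample_rate_hertz": wav_file.getframerate(),
            "num_channels": wav_file.getnchannels(),
            "language_code": language_code,
        }
```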

NODEJS example

It would be nice if someone pushed an example for Node.js here.

Thanks.

Sample rate gets rounded to thousands

I'm trying to recognize an audio file with a sample rate of 22050 Hz. I pass the correct value, --rate 22050, but get an error saying that the sample rate is configured as 22000 Hz, which is not true.

#! /bin/bash

source "./sh/env.sh"
cat $1 | \
    python3 -m recognize_stream --host stt.tinkoff.ru --port 443 \
    --rate 22050 --num_channels 2 --encoding MPEG_AUDIO \
    --chunk_size 8192 --api_key $STT_TEST_API_KEY --secret_key $STT_TEST_SECRET_KEY

Audio header reports sample rate of 22050 hz, but recognition_config.sample_rate_herts = 22000

The same problem appears with 44100 Hz. It seems the sample rate gets rounded to thousands somewhere while being passed to the API.

How to turn on the female voice in synthesis?

Good afternoon!

I use text-to-speech via Node.js. The synthesis is always generated with a male voice. Could you please tell me how to switch to a female voice?