spokestack / spokestack-ios

Spokestack: give your iOS app a voice interface!

Home Page: https://spokestack.io

License: Apache License 2.0

Languages: Swift 98.95%, Ruby 0.77%, Objective-C 0.28%

Topics: wakeword, wakeword-activation, asr, voice-activity-detection, text-to-speech, vad, natural-language-understanding, speech-recognition, speech-to-text, speech-synthesis

spokestack-ios's Introduction

Spokestack iOS

Spokestack provides an extensible speech recognition pipeline for the iOS platform. It includes a variety of built-in speech processors for Voice Activity Detection (VAD), wakeword activation, and Automatic Speech Recognition (ASR).

Table of Contents

Features

  • Voice activity detection
  • Wakeword activation with two different implementations
  • Simplified Automated Speech Recognition interface
  • A speech pipeline that seamlessly chains VAD-triggered wakeword detection (using on-device machine learning models) with utterance transcription by the platform's Automated Speech Recognition
  • On-device Natural Language Understanding utterance classifier
  • Simple Text to Speech API

Installation

CocoaPods is a dependency manager for Cocoa projects. For usage and installation instructions, visit their website. To integrate Spokestack into your Xcode project using CocoaPods, specify it in your Podfile:

pod 'Spokestack-iOS'
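For reference, a complete Podfile might look like the following sketch; the platform version and target name here are placeholders, not taken from the project:

```ruby
# Example Podfile — 'MyVoiceApp' and the iOS version are placeholders
platform :ios, '13.0'
use_frameworks!

target 'MyVoiceApp' do
  pod 'Spokestack-iOS'
end
```

After running pod install, open the generated .xcworkspace rather than the .xcodeproj.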

Usage

Spokestack.io hosts extensive usage documentation including tutorials, integrations, and recipe how-tos.

Configure Wakeword-activated Automated Speech Recognition

import Spokestack
// assume that self implements the SpokestackDelegate protocol
let pipeline = try! SpeechPipelineBuilder()
    .addListener(self)
    .useProfile(.appleWakewordAppleSpeech)
    .setProperty("tracing", Trace.Level.PERF)
    .build()
pipeline.start()

This example creates a speech recognition pipeline using a configurable wakeword detector that is triggered by VAD. The wakeword detector in turn activates the native iOS ASR, which returns the resulting utterance to the SpokestackDelegate observer (self in this example).

See SpeechPipeline and SpeechConfiguration for further configuration documentation.

Text to Speech

// assume that self implements the TextToSpeechDelegate protocol
let tts = TextToSpeech(self, configuration: SpeechConfiguration())
tts.speak(TextToSpeechInput("My god, it's full of stars!"))

Natural Language Understanding

let config = SpeechConfiguration()
config.nluVocabularyPath = "vocab.txt"
config.nluModelPath = "nlu.tflite"
config.nluModelMetadataPath = "metadata.json"
// assume that self implements the NLUDelegate protocol
let nlu = try! NLUTensorflow(self, configuration: config)
nlu.classify(utterance: "I can't turn that light in the room on for you, Dave", context: [:])

Troubleshooting

A build error similar to Code Sign error: No unexpired provisioning profiles found that contain any of the keychain's signing certificates will occur if the bundle identifier is not changed from io.Spokestack.SpokestackFrameworkExample, which is tied to the Spokestack organization.

Reference

The SpokestackFrameworkExample project is a reference implementation showing how to use the Spokestack library, along with runnable examples of the VAD, wakeword, ASR, NLU, and TTS components. Each component has a button on the main screen and can be started, stopped, or exercised (prediction or synthesis) as appropriate. The component screens have full debug tracing enabled, so system control logic and debug events will appear in the Xcode console.

Documentation

Getting Started, Cookbooks, and Conceptual Guides

Step-by-step introduction, common usage patterns, and discussion of concepts used by the library, design guides for voice interfaces, and the Android library may all be found on our website.

API Reference

The API reference is available on GitHub.

Deployment

Preconditions

  1. Ensure that git lfs has been installed: https://git-lfs.github.com/. This is used to manage the storage of the large model and metadata files in SpokestackFrameworkExample.
  2. Ensure that CocoaPods has been installed: gem install cocoapods (not via brew).
  3. Ensure that you are registered in CocoaPods: pod trunk register YOUR_EMAIL --description='release YOUR_PODSPEC_VERSION'

Process

  1. Increment the podspec version in Spokestack-iOS.podspec
  2. pod lib lint --use-libraries --allow-warnings, which should pass all checks
  3. git commit -a -m 'YOUR_COMMIT_MESSAGE' && git tag YOUR_PODSPEC_VERSION && git push && git push --tags
  4. pod trunk push --use-libraries --allow-warnings

License


Copyright 2020 Spokestack, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

spokestack-ios's People

Contributors

brentspell, kwylez, noelweichbrodt, roppem9, space-pope


spokestack-ios's Issues

9.0.1 with iOS 12 support: AVAudioEngine errors on pipeline start on iOS 14

Trying to get Spokestack up and running. iOS 14.2, 11" iPad Pro, spokestack-ios installed from the pod, version 9.0.1 per Podfile.lock.

I see these after calling start(), and nothing works.

2020-12-04 18:12:14.650501-0500 booth[6325:5456020] [aurioc] AURemoteIO.cpp:1095:Initialize: failed: -10851 (enable 1, outf< 2 ch,      0 Hz, Float32, non-inter> inf< 2 ch,      0 Hz, Float32, non-inter>)
2020-12-04 18:12:14.651563-0500 booth[6325:5456020] [aurioc] AURemoteIO.cpp:1095:Initialize: failed: -10851 (enable 1, outf< 2 ch,      0 Hz, Float32, non-inter> inf< 2 ch,      0 Hz, Float32, non-inter>)
2020-12-04 18:12:14.652462-0500 booth[6325:5456020] [aurioc] AURemoteIO.cpp:1095:Initialize: failed: -10851 (enable 1, outf< 2 ch,      0 Hz, Float32, non-inter> inf< 2 ch,      0 Hz, Float32, non-inter>)
2020-12-04 18:12:14.652534-0500 booth[6325:5456020] [avae]            AVAEInternal.h:109   [AVAudioEngineGraph.mm:1397:Initialize: (err = AUGraphParser::InitializeActiveNodesInInputChain(ThisGraph, *GetInputNode())): error -10851
2020-12-04 18:12:14.652564-0500 booth[6325:5456020] [avae]          AVAudioEngine.mm:167   Engine@0x283c0f890: could not initialize, error = -10851

AudioController memory build up

I see constant linear memory growth in Xcode's memory gauge.

This fixes it for me: in AudioController.swift, wrap the contents of func recordingCallback in autoreleasepool { }.
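The suggested fix can be sketched in isolation. This is not the actual AudioController code; the signature and the per-buffer work (handleBuffer here) are illustrative stand-ins:

```swift
import Foundation

// Sketch of the fix: run the body of a frequently-invoked audio
// callback inside autoreleasepool so any autoreleased temporaries
// created per buffer are freed on every call instead of piling up.
// The signature and handleBuffer closure are illustrative only.
func recordingCallback(_ buffer: [Float], handleBuffer: ([Float]) -> Float) -> Float {
    return autoreleasepool {
        // all per-buffer autoreleased allocations drain when this block returns
        handleBuffer(buffer)
    }
}
```

Without the pool, autoreleased objects created in each invocation are only released when the enclosing pool drains, and on a long-lived audio thread that may effectively be never, which matches the linear growth described above.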

Pod Install not working

Error:

[!] Error installing Spokestack-iOS
[!] /usr/bin/git clone https://github.com/spokestack/spokestack-ios.git /var/folders/71/9tdn80h96cv0fgznfwlmsc0w0000gn/T/d20200815-21149-5hz3pc --template= --single-branch --depth 1 --branch 12.0.1

Cloning into '/var/folders/71/9tdn80h96cv0fgznfwlmsc0w0000gn/T/d20200815-21149-5hz3pc'...
Note: checking out '48c688301f8ee6966a011fa5cd267acc18a985e9'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

git-lfs filter-process: git-lfs: command not found
fatal: the remote end hung up unexpectedly
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Long pauses (hangs) on pipeline.stop()

Disclaimer: this is 9.0.1 (from May 6 2020) because I support iOS 12.

I'm occasionally seeing long pauses / hangs on calling .stop(), with errors shown like this one:
[aurioc] AURemoteIO.cpp:1639:Stop: AURemoteIO::Stop: error 268451843 calling TerminateOwnIOThread (port 104967)
This seems to happen on iOS 12 and iOS 14 (errors are different) with similar frequency.

The impact is minimized when I call stop off the main thread, i.e. DispatchQueue.global().async { self.pipeline?.stop() }, because at least none of the UI work is waiting for it.

2 questions:

  1. did this issue get solved in later releases somehow - any tips? I have already forked, am happy to develop fixes manually etc if I have a direction to go in
  2. what's the recommended threading model for calling start/stop on the speech pipeline? Currently I'm calling start() from the main thread and stop() off the main thread. Not sure that's wise...

Many thanks, enjoying Spokestack so far (day 3)
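One way to sketch an answer to the threading question above: route every start/stop request through a single serial queue, so calls can originate from any thread but never interleave. PipelineController, onStart, and onStop are illustrative names, not Spokestack API:

```swift
import Dispatch

// Sketch: serialize pipeline start/stop on one serial queue so they
// can never overlap, regardless of the calling thread. The closures
// stand in for pipeline.start() / pipeline.stop().
final class PipelineController {
    private let queue = DispatchQueue(label: "pipeline.control")
    private var isRunning = false
    private let onStart: () -> Void
    private let onStop: () -> Void

    init(onStart: @escaping () -> Void, onStop: @escaping () -> Void) {
        self.onStart = onStart
        self.onStop = onStop
    }

    func start() {
        queue.async {
            guard !self.isRunning else { return }
            self.isRunning = true
            self.onStart()
        }
    }

    func stop() {
        queue.async {
            guard self.isRunning else { return }
            self.isRunning = false
            self.onStop()
        }
    }

    /// Block until all queued control operations have completed.
    func drain() {
        queue.sync {}
    }
}
```

This keeps slow stop() calls off the caller's thread (every operation runs on the control queue) while still guaranteeing start/stop ordering.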

SpeechEventListener not found / Doc discrepancies

Hello,

The Spokestack documents for iOS seem not to be maintained. I found many issues while setting up a base project for a Flutter plugin I am developing, e.g. the documents say to use the profile .tfLiteWakeword when it now exists as .tfLiteWakewordKeyword. Apart from this, the main issue I am currently facing is that a component conforming to the protocol SpeechEventListener needs to be used, but no such protocol exists.

On a side note: I cannot find real documentation for Spokestack and Swift anywhere, just the 'documentation' provided on the website. Where might I find proper documentation of library functions, methods, inheritance, etc.?

Return/Callback values for start/stop

Hello,
In my application, the SpeechPipeline is constantly turned on and off because I have integrated Azure TTS, which must not play during an active pipeline: it causes non-deterministic results (ASR begins interpreting TTS output, and sometimes TTS activates before the pipeline has released its resources). I am aware of didStop and didStart within SpokestackDelegate, but they are a hassle to use because start or stop may be invoked from many places while navigating through the application. That means tracking each caller of the pipeline toggles as well as a global state; currently I do something similar with NotificationCenter, but I feel an easier and cleaner approach would be a callback from the pipeline toggle itself.
It would be very useful for my use-case to be able to call something such as:

// hypothetical async API
let resourceFlushed = await pipeline.close()
switch resourceFlushed {
case .success:
    // invoke TTS here
    let resourceObtained = await pipeline.start()
    switch resourceObtained {
    case .success:
        break
    case .failure(let error):
        // handle failed start
        print(error)
    }
case .failure(let error):
    // handle failed close
    print(error)
}

Pipeline failure due to: Failed to create the interpreter.

Hello,

I'm getting the error Pipeline failure due to: Failed to create the interpreter. when trying to run my application. The app configurations are correct as they work seamlessly on the Android codebase. Can I have an elaboration on this error?

AzureManager init
2021-08-20 13:09:03.865741+0200 Runner[10656:128311] Metal API Validation Enabled
2021-08-20 13:09:04.013476+0200 Runner[10656:128311] [plugin] AddInstanceForFactory: No factory registered for id <CFUUID 0x600003b21000> F8BB1C28-BAE8-11D6-9C31-00039315CD46
2021-08-20 13:09:04.500620+0200 Runner[10656:128311] Initialized TensorFlow Lite runtime.
Pipeline initialized.
2021-08-20 13:09:04.676435+0200 Runner[10656:128695] flutter: Observatory listening on http://127.0.0.1:53834/L4ASPbmCbtU=/
2021-08-20 13:09:04.690747+0200 Runner[10656:128311] Didn't find op for builtin opcode 'FULLY_CONNECTED' version '9'
2021-08-20 13:09:04.691000+0200 Runner[10656:128311] Registration failed.
Pipeline failure due to: Failed to create the interpreter.
2021-08-20 13:09:22.104875+0200 Runner[10656:128608] flutter: ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
2021-08-20 13:09:22.105311+0200 Runner[10656:128608] flutter: │ #0   new FlutterSoundRecorder (package:flutter_sound/public/flutter_sound_recorder.dart:155:13)
2021-08-20 13:09:22.107288+0200 Runner[10656:128608] flutter: │ #1   new ExperimentController (package:elementa/app/modules/experiment/controllers/experiment_controller.dart:29:38)
2021-08-20 13:09:22.107543+0200 Runner[10656:128608] flutter: ├┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
2021-08-20 13:09:22.108070+0200 Runner[10656:128608] flutter: │ 🐛 ctor: FlutterSoundRecorder()
2021-08-20 13:09:22.108586+0200 Runner[10656:128608] flutter: └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
2021-08-20 13:09:22.668322+0200 Runner[10656:129100] [aurioc] AURemoteIO.h:323:entry: Unable to join I/O thread to workgroup ((null)): 2
iniSpokestack
Pipeline started.

It may be worth mentioning that in my Xcode debugger, I see the following error within my custom SpeechProcessor class:

extension NoteProcessor: SpeechProcessor {
    public var context: SpeechContext {
        get {
            // debugger halts here with:
            // ERROR: AURemoteIO::IOThread (45): EXC_BAD_ACCESS (code=2, address=0x70000aa6dff8)
            return self.context
        }
        set {
            self.context = newValue
        }
    }
}

Though I suspect this is simply because the pipeline never initialises, so the context is being sent to a deallocated SpokestackDelegate object?

EDIT:
On further investigation, it seems the error:
2021-08-20 13:09:04.690747+0200 Runner[10656:128311] Didn't find op for builtin opcode 'FULLY_CONNECTED' version '9' 2021-08-20 13:09:04.691000+0200 Runner[10656:128311] Registration failed.
is thrown due to an issue with the TensorFlow models or perhaps the runtime version of TFLite. The docs don't mention explicitly importing the TensorFlowLiteSwift module in the Podfile; I'm assuming Spokestack adds the module itself, so it shouldn't be necessary? I will try adding it and see if that resolves the issue.

Flutter - Incompatible AudioSession category is set

Hello,

I've encountered an error while testing on my iPhone, in contrast to the usual Simulator; it immediately threw:

Pipeline failure due to: Incompatible AudioSession category is set.

I have tried setting the category to all available options such as .record, .ambient etc. but it still fails. I also tried removing the setCategory altogether...still fails. I am using an iPhone 6s with iOS 13.3. All permissions are already given to the application.

The only code related to the audio session is within application(_:didFinishLaunchingWithOptions:). The category setup itself seems fine, and an error is not caught during the do/try.

let audioSession: AVAudioSession = AVAudioSession.sharedInstance()
let sessionCategory: AVAudioSession.Category = .record
let sessionOptions: AVAudioSession.CategoryOptions = [.allowBluetoothA2DP, .allowAirPlay, .defaultToSpeaker]
do {
    try audioSession.setCategory(sessionCategory, mode: .default, options: sessionOptions)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
} catch {
    print("AppDelegate application error when setting AudioSession category")
}

I have reproduced an error AppDelegate application error when setting AudioSession category using the following:

  1. Setup a new Flutter Project via Xcode
  2. Use the setup tutorial provided here
  3. Attempt to run the application via a plugged in iPhone

From this I found out that the error is reproduced only if you instantiate your AudioSession before the line:
GeneratedPluginRegistrant.register(with: self)
Flutter generates this line automatically in order to register the current instance with a 'Plugin Manager'. So, simply instantiate your AudioSession after that line.


Example app confusion

Is there an explanation for what "SpokeStackFrameworkExample" app is supposed to be demonstrating? I see the four options on the initial landing page, and then start/stop recording buttons on each detail page. It asks for microphone access and sometimes speech access, but otherwise nothing seems to happen. There are some debug messages depending on whether I'm running iOS 12 or 13, but it's usually just "didStart" and "didStop".
