keenresearch / keenasr-ios-poc

Proof of concept app that demonstrates use of KeenASR SDK in ObjC. WE ARE HIRING: https://keenresearch.com/careers.html

Objective-C 100.00%
speech speech-recognition voice-recognition voice-commands voice-control speech-to-text

keenasr-ios-poc's Introduction

Note

This proof-of-concept app ships with a trial version of the KeenASR framework, which will exit (crash) the app 15 minutes after the framework has been initialized. If you would like to obtain a version of the framework without this limitation, contact us at [email protected].

By cloning this repository and downloading the trial KeenASR SDK or ASR Bundle, you agree to the KeenASR SDK Trial Licensing Agreement.

For more details about the SDK see: http://keenresearch.com/keenasr-docs

Important:

  • You will need git-lfs to check out the project (if git-lfs is installed, you can simply clone the repository).
  • You must clone the repository; the Zip download WILL NOT WORK because we use git-lfs for large file management. After cloning the repository, you will need to set/change the bundle ID for the app (currently set to com.keenresearch.com.keenasr-ios-poc), as well as the signing settings in the Xcode project settings. These settings are under project build settings, General tab -> Identity.
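
The cloning requirement above can be sketched as a small shell helper. This is a minimal sketch, not an official setup script: the repository URL is an assumption inferred from the repository name, and `clone_with_lfs` and `is_lfs_pointer` are hypothetical helper names. The second function relies on the fact that an un-fetched git-lfs file is a small text stub whose first line starts with `version https://git-lfs.github.com/spec/v1`.

```shell
#!/usr/bin/env bash
# Assumption: the repository lives at this GitHub URL (inferred from its name).
REPO_URL="https://github.com/keenresearch/keenasr-ios-poc.git"

# Clone with git-lfs enabled so large files are downloaded, not left as pointers.
# $1: target directory
clone_with_lfs() {
  git lfs install             # one-time setup of the git-lfs hooks
  git clone "$REPO_URL" "$1"  # LFS objects are fetched during checkout
}

# Return 0 if the file is an un-fetched git-lfs pointer (a small text stub).
# $1: path to the file to inspect
is_lfs_pointer() {
  head -c 200 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}
```

After cloning, something like `is_lfs_pointer KeenASR.framework/KeenASR && git lfs pull` would fetch the real binary if only the pointer was checked out (the framework path here is taken from the linker output reported in the issues and may differ in your checkout).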

KeenASR Proof-of-Concept App

A proof-of-concept app that shows how to run the KeenASR automatic speech recognition framework. For detailed information on all classes and methods, consult the SDK reference documentation. If you are starting with the framework from scratch, check our Quick Start document.

This demo app uses the acoustic models in the keenB2mQT-nnet3chain-en-us directory. Keen Research provides a variety of custom acoustic models to its clients.

Six different demos are provided in this proof of concept app:

  1. Music library voice control: your music library will be loaded and song names and artist names will be used to create a custom decoding graph

  2. Contacts voice control: your contacts will be loaded and first/last name will be used to create a custom decoding graph

  3. Educational Reading Demo: demonstrates ASR use for following along as users read aloud, highlighting words as they are read. The oral reading rate is computed in real time. Additional information related to oral reading fluency will be available in future releases.

  4. Educational Words Demo: demonstrates ASR use for recognizing individual words. A set of ~1000 most common words for children is used to create a decoding graph. The user can say the word itself, or "How do you spell <WORD>" or "Spell <WORD>", and the word will be displayed on the screen.

  5. Command and Control Demo: demonstrates how to use the framework for a simple command-and-control app, for example, robot control.

keenasr-ios-poc's People

Contributors

ognjentodic

keenasr-ios-poc's Issues

Problem with interruption of recording caused by phone call

How do I restart listening after an interruption caused by a phone call has ended?
I tried to:

  1. stop listening when the interruption begins and start it again when the interruption ends
  2. set the audio session to active
  3. call setPreferredSampleRate with 16000, because I saw that the sample rate changed from 16000 to 44000 when the interruption began

This was not successful.
I also tried to stop listening when the interruption begins and reinitialize Kaldi, but that was not successful either.

Integrate with Swift

I use Swift.
After I dropped the framework file into my project and built it, I got this error:

Undefined symbols for architecture arm64:
"std::istream& std::istream::_M_extract(unsigned int&)", referenced from:
void kaldi::ReadBasicType(std::istream&, bool, unsigned int*) in KaldiIOS(build-tree-utils.o)

Is this an issue with the way I set up the project?
Was the framework not built for the arm64 architecture?
Please let me know how to fix the problem.

Xcode 8 build project

Hi, I just cloned the demo project and tried to build and run it on my phone in Xcode 8 beta 4.
I get the following error message:

Undefined symbols for architecture arm64:
"_audio_open", referenced from:
_audio_stream_chunk in KaldiIOS(au_streaming.o)
"_audio_write", referenced from:
_audio_stream_chunk in KaldiIOS(au_streaming.o)
"_audio_close", referenced from:
_audio_stream_chunk in KaldiIOS(au_streaming.o)
"_play_wave", referenced from:
_flite_process_output in KaldiIOS(flite.o)
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Might be related to the other issue?

Trying to build on iOS device...linker error

I get the following error when trying to build on an iOS 12.1.4 device. Any tips?

ld: warning: ignoring file /Users/jeffbonasso/DevCode/AmbiFi/sandbox/keenasr/keenasr-ios-poc/KeenASR.framework/KeenASR, file was built for unsupported file format ( 0x76 0x65 0x72 0x73 0x69 0x6F 0x6E 0x20 0x68 0x74 0x74 0x70 0x73 0x3A 0x2F 0x2F ) which is not the architecture being linked (armv7): /Users/jeffbonasso/DevCode/AmbiFi/sandbox/keenasr/keenasr-ios-poc/KeenASR.framework/KeenASR
Undefined symbols for architecture armv7:
"_OBJC_CLASS_$_KIOSDecodingGraph", referenced from:
objc-class-ref in EduWordsDemoViewController.o
objc-class-ref in MusicDemoViewController.o
objc-class-ref in FileRecognitionDemoViewController.o
objc-class-ref in ContactsDemoViewController.o
objc-class-ref in EduReadingDemoViewController.o
objc-class-ref in CommandAndControlViewController.o
"_OBJC_CLASS_$_KIOSRecognizer", referenced from:
objc-class-ref in ViewController.o
objc-class-ref in EduWordsDemoViewController.o
objc-class-ref in MusicDemoViewController.o
objc-class-ref in FileRecognitionDemoViewController.o
objc-class-ref in ContactsDemoViewController.o
objc-class-ref in EduReadingDemoViewController.o
objc-class-ref in CommandAndControlViewController.o
...
ld: symbol(s) not found for architecture armv7
clang: error: linker command failed with exit code 1 (use -v to see invocation)
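
The byte values quoted in the "unsupported file format" warning above are plain ASCII, and decoding them may help narrow the problem down. The following is a minimal sketch; `decode_hex_bytes` is a hypothetical helper written for this illustration, not part of any SDK tooling.

```shell
#!/usr/bin/env bash
# Convert a list of 0xNN byte values into the ASCII text they encode.
decode_hex_bytes() {
  for byte in "$@"; do
    # The inner printf renders the byte as three octal digits;
    # the outer printf then expands the resulting \NNN escape.
    printf "\\$(printf '%03o' "$byte")"
  done
  printf '\n'
}

# Bytes copied from the linker warning.
decode_hex_bytes 0x76 0x65 0x72 0x73 0x69 0x6F 0x6E 0x20 \
                 0x68 0x74 0x74 0x70 0x73 0x3A 0x2F 0x2F
# prints: version https://
```

A file that begins with `version https://` looks like a git-lfs pointer rather than a compiled binary, which would be consistent with the README's note that the project must be cloned with git-lfs installed. This is an inference from the warning text, not a confirmed diagnosis; if it applies, running `git lfs pull` inside the clone should replace the pointer with the real framework binary.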

missing required architecture i386 and x86_64

Can you please compile the library to support i386 and x86_64?
It does not work with simulators on Intel Macs.
The library seems to support only armv7 and arm64:

$ lipo -info KaldiIOS
Architectures in the fat file: KaldiIOS are: armv7 arm64 
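
The check above can be scripted so that each architecture needed for a build is tested against the `lipo -info` output. This is a minimal sketch; `has_arch` is a hypothetical helper, and the sample text is the output quoted in this issue rather than a fresh run.

```shell
#!/usr/bin/env bash
# Return 0 if the given `lipo -info` output lists the requested architecture.
# $1: full text printed by `lipo -info`, $2: architecture name
has_arch() {
  archs=${1##*: }            # keep only what follows the last ": "
  case " $archs " in
    *" $2 "*) return 0 ;;    # whole-word match, so "arm64" does not match "arm64e"
    *) return 1 ;;
  esac
}

# Output quoted in the issue; on a Mac, use: info=$(lipo -info KaldiIOS)
info="Architectures in the fat file: KaldiIOS are: armv7 arm64 "

for arch in armv7 arm64 i386 x86_64; do
  if has_arch "$info" "$arch"; then
    echo "$arch: present"
  else
    echo "$arch: missing"
  fi
done
```

For the output quoted here, the loop reports armv7 and arm64 as present and i386 and x86_64 as missing, which matches the simulator failure described above.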

Unable to connect to the Dashboard

I am unable to establish a connection to the Dashboard. Below is the code used to initialize the SDK; the "appKey" value has been replaced with <key> before posting here.

Nothing is being sent to the Dashboard, and the logs don't seem to contain anything related to KIOSUploader. The default ATS configuration is being used.

        KIOSRecognizer.setLogLevel(.debug)
        KIOSRecognizer.initWithASRBundle("keenB2mQT-nnet3chain-en-us")
        // Shared instance of KIOSRecognizer
        KIOSRecognizer.sharedInstance()!.createJSONMetadata = true
        KIOSRecognizer.sharedInstance()!.createAudioRecordings = true
        // KIOSUploader
        if KIOSUploader.createDataUploadThread(for: recognizer, usingAppKey: "<key>") == false {
            print("Recognizer is unable to establish connection to Dashboard")
        }

Not getting final results when calling stopListeningAndReturnFinalResult

I'm not getting anything from recognizerFinalResult:forRecognizer: when calling stopListeningAndReturnFinalResult. The recognizer state before calling stop is KIOSRecognizerStateListening.

The only thing I changed was the VAD configuration, as I needed the recognizer to run for a longer period of time:

recognizer.setVADParameter(KIOSVadParameter.timeoutForNoSpeech, toValue: 60)
recognizer.setVADParameter(KIOSVadParameter.timeoutEndSilenceForAnyMatch, toValue: 60)
recognizer.setVADParameter(KIOSVadParameter.timeoutEndSilenceForGoodMatch, toValue: 60)
recognizer.setVADParameter(KIOSVadParameter.timeoutMaxDuration, toValue: 60)

Partial results missing additional info

I've created a simple demo in which the user has to say the alphabet. Recognition is good and precise; however, I'm not receiving any additional information from KIOSWord, only the text that was recognized. Is it possible to get at least the confidence score of each recognized segment/word?

Partial result: overall conf: (null). Num words: 7. WORDS: [[A s:(null), dur: (null), conf: (null), tag: 0] [B s:(null), dur: (null), conf: (null), tag: 0] [C s:(null), dur: (null), conf: (null), tag: 0] [D s:(null), dur: (null), conf: (null), tag: 0] [F s:(null), dur: (null), conf: (null), tag: 0] [E s:(null), dur: (null), conf: (null), tag: 0] [<SPOKEN_NOISE> s:(null), dur: (null), conf: (null), tag: 1] ]
