
React component and hook to initiate a SpeechRecognition session

Home Page: https://untemps.github.io/react-vocal

License: MIT License


react-vocal

A React component and hook to initiate a SpeechRecognition session



Links

🔴 LIVE DEMO 🔴

Disclaimer

The Web Speech API is only supported by a few browsers so far (see caniuse). If the API is not available, the Vocal component won't display anything.

This component intends to catch a speech result as soon as possible, which makes it a good fit for vocal commands or search-field filling. For now it does not support continuous speech (see Roadmap below).
That means either a result is caught and returned, or the timeout is reached and the recognition is discarded.
The stop function returned by the children-as-function mechanism allows you to discard the recognition before the timeout elapses.

Special cases

Some browsers support the SpeechRecognition API but not all of the related APIs.
For example, on iOS 14.5, the SpeechGrammar, SpeechGrammarList, and Permissions APIs are not supported.

Although the lack of SpeechGrammar and SpeechGrammarList is handled by the underlying @untemps/vocal library, you need to deal with Permissions yourself.
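One way to handle that manual check is to feature-detect `navigator.permissions` before querying it. The sketch below is illustrative, not part of the library; the helper names are hypothetical, and the functions take a `navigator`-like object as a parameter so the fallback path is easy to exercise:

```javascript
// Guard against browsers (e.g. iOS Safari 14.5) that lack navigator.permissions.
// Helper names are illustrative; pass the global `navigator` in real code.
const canQueryMicrophone = (nav) =>
	Boolean(nav && nav.permissions && typeof nav.permissions.query === 'function')

const checkMicrophonePermission = async (nav) => {
	if (!canQueryMicrophone(nav)) {
		// Permissions API unavailable: fall back to simply starting the
		// recognition and handling the resulting error instead.
		return 'unknown'
	}
	const status = await nav.permissions.query({ name: 'microphone' })
	return status.state // 'granted' | 'denied' | 'prompt'
}
```

When the function returns 'unknown', the safest strategy is to attempt the recognition anyway and surface any permission error through your onError handler.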

Installation

yarn add @untemps/react-vocal

Usage

Vocal component

Basic usage

import { useState } from 'react'
import Vocal from '@untemps/react-vocal'

const App = () => {
	const [result, setResult] = useState('')

	const _onVocalStart = () => {
		setResult('')
	}

	const _onVocalResult = (result) => {
		setResult(result)
	}

	return (
		<div className="App">
			<span style={{ position: 'relative' }}>
				<Vocal
					onStart={_onVocalStart}
					onResult={_onVocalResult}
					style={{ width: 16, position: 'absolute', right: 10, top: -2 }}
				/>
				<input defaultValue={result} style={{ width: 300, height: 40 }} />
			</span>
		</div>
	)
}

Custom component

By default, Vocal displays an icon with two states:

  • Idle
  • Listening

But you can provide your own component.

  • With a simple React element:
import Vocal from '@untemps/react-vocal'

const App = () => {
	return (
		<Vocal>
			<button>Start</button>
		</Vocal>
	)
}

In this case, an onClick handler is automatically attached to the component to start a recognition session.
Only the first direct descendant of Vocal receives the onClick handler. If you want to use a more complex hierarchy, use the function syntax below.

  • With a function that returns a React element:
import Vocal from '@untemps/react-vocal'

const Play = () => (
	<div
		style={{
			width: 0,
			height: 0,
			marginLeft: 1,
			borderStyle: 'solid',
			borderWidth: '4px 0 4px 8px',
			borderColor: 'transparent transparent transparent black',
		}}
	/>
)

const Stop = () => (
	<div
		style={{
			width: 8,
			height: 8,
			backgroundColor: 'black',
		}}
	/>
)

const App = () => {
	return (
		<Vocal>
			{(start, stop, isStarted) => (
				<button style={{ padding: 5 }} onClick={isStarted ? stop : start}>
					{isStarted ? <Stop /> : <Play />}
				</button>
			)}
		</Vocal>
	)
}

The following parameters are passed to the function:

| Arguments | Type | Description |
| --- | --- | --- |
| start | func | The function used to start the recognition |
| stop | func | The function used to stop the recognition |
| isStarted | bool | A flag that indicates whether the recognition is started or not |

Commands

The Vocal component accepts a commands prop to map special recognition results to callbacks.
That means you can define vocal commands to trigger specific functions.

const App = () => {
	return (
		<Vocal
			commands={{
				// setBorderColor is assumed to be defined in the enclosing scope
				'switch border color': () => setBorderColor('red'),
			}}
		/>
	)
}

The commands object is a key/value map where each key is the command to be caught by the recognition and each value is the callback triggered when the command is detected.

Keys are not case-sensitive.

const commands = {
    submit: () => submitForm(),
    'Change the background color': () => setBackgroundColor('red'), 
    'PLAY MUSIC': play
}

The component relies on a dedicated hook called useCommands to respond to the commands.
The hook performs a fuzzy search to match approximate commands if needed, which compensates for accidental typos or approximate recognition results.
To do so, the hook uses fuse.js, which implements an algorithm to find strings that are approximately equal to a given input. The score threshold that separates an acceptable command-to-callback match from a rejected one can be customized when instantiating the hook.

useCommands(commands, threshold) // threshold is the limit a score must not exceed to be considered a match

See the fuse.js scoring theory for more details.

⚠️ The Vocal component doesn't expose that threshold yet. For now you have to deal with the default value (0.4).


Vocal component API

| Props | Type | Default | Description |
| --- | --- | --- | --- |
| commands | object | null | Callbacks to be triggered when specified commands are detected by the recognition |
| lang | string | 'en-US' | Language understood by the recognition (BCP 47 language tag) |
| grammars | SpeechGrammarList | null | Grammars understood by the recognition (JSpeech Grammar Format) |
| timeout | number | 3000 | Time in ms to wait before discarding the recognition |
| style | object | null | Styles of the root element if className is not specified |
| className | string | null | Class of the root element |
| onStart | func | null | Handler called when the recognition starts |
| onEnd | func | null | Handler called when the recognition ends |
| onSpeechStart | func | null | Handler called when the speech starts |
| onSpeechEnd | func | null | Handler called when the speech ends |
| onResult | func | null | Handler called when a result is recognized |
| onError | func | null | Handler called when an error occurs |
| onNoMatch | func | null | Handler called when no result can be recognized |

useVocal hook

Basic usage

import React, { useState } from 'react'
import { useVocal } from '@untemps/react-vocal'
import Icon from './Icon'

const App = () => {
	const [isListening, setIsListening] = useState(false)
	const [result, setResult] = useState('')

	const [, { start, subscribe }] = useVocal('fr-FR')

	const _onButtonClick = () => {
		setIsListening(true)

		subscribe('speechstart', _onVocalStart)
		subscribe('result', _onVocalResult)
		subscribe('error', _onVocalError)
		start()
	}

	const _onVocalStart = () => {
		setResult('')
	}

	const _onVocalResult = (result) => {
		setIsListening(false)

		setResult(result)
	}

	const _onVocalError = (e) => {
		console.error(e)
	}

	return (
		<div>
			<span style={{ position: 'relative' }}>
				<div
					role="button"
					aria-label="Vocal"
					tabIndex={0}
					style={{ width: 16, position: 'absolute', right: 10, top: 2 }}
					onClick={_onButtonClick}
				>
					<Icon color={isListening ? 'red' : 'blue'} />
				</div>
				<input defaultValue={result} style={{ width: 300, height: 40 }} />
			</span>
		</div>
	)
}

Signature

useVocal(lang, grammars)
Args Type Default Description
lang string 'en-US' Language understood by the recognition BCP 47 language tag
grammars SpeechGrammarList null Grammars understood by the recognition JSpeech Grammar Format

Return value

const [ref, { start, stop, abort, subscribe, unsubscribe, clean }]

| Args | Type | Description |
| --- | --- | --- |
| ref | Ref | React ref to the SpeechRecognitionWrapper instance |
| start | func | Function to start the recognition |
| stop | func | Function to stop the recognition |
| abort | func | Function to abort the recognition |
| subscribe | func | Function to subscribe to recognition events |
| unsubscribe | func | Function to unsubscribe from recognition events |
| clean | func | Function to clean up subscriptions to recognition events |

Browser support flag

Basic usage

import Vocal, { isSupported } from '@untemps/react-vocal'

const App = () => {
	return isSupported ? <Vocal /> : <p>Your browser does not support Web Speech API</p>
}
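The actual implementation of the flag lives inside the package, but this kind of detection typically boils down to checking for the (possibly prefixed) constructor on the global object. The sketch below is an assumption about how such a flag works, not the library's code; it accepts a window-like object so it can run outside a browser:

```javascript
// Illustrative detection helper (hypothetical name): true when the standard
// or the webkit-prefixed SpeechRecognition constructor is available.
const detectSpeechRecognition = (win) =>
	Boolean(win && (win.SpeechRecognition || win.webkitSpeechRecognition))

// In a browser you would call it with the real global:
// const supported = detectSpeechRecognition(window)
```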

Events

| Events | Description |
| --- | --- |
| audioend | Fired when the user agent has finished capturing audio for recognition |
| audiostart | Fired when the user agent has started to capture audio for recognition |
| end | Fired when the recognition service has disconnected |
| error | Fired when a recognition error occurs |
| nomatch | Fired when the recognition service returns a final result with no significant recognition |
| result | Fired when the recognition service returns a result |
| soundend | Fired when any sound (recognisable or not) has stopped being detected |
| soundstart | Fired when any sound (recognisable or not) has been detected |
| speechend | Fired when speech recognized by the recognition service has stopped being detected |
| speechstart | Fired when sound recognized by the recognition service as speech has been detected |
| start | Fired when the recognition service has begun listening to incoming audio |

Notes

The process to grant microphone access permissions is automatically managed by the hook (internally used by the Vocal component).

Development

The component can be served for development purposes on http://localhost:10001/ using:

yarn dev

Contributing

Contributions are warmly welcomed:

  • Fork the repository
  • Create a feature branch (preferred name convention: [feature type]_[imperative verb]-[description of the feature])
  • Develop the feature AND write the tests (or write the tests AND develop the feature)
  • Commit your changes using Angular Git Commit Guidelines
  • Submit a Pull Request

Roadmap

  • Add a connector management to plug external speech-to-text services in
  • Support continuous speech

react-vocal's People

Contributors

dependabot[bot], semantic-release-bot, sidnioulz, untemps, vincentljn


react-vocal's Issues

Handle continuous sessions

TODO: Add specifications


Please pay attention to Sidnioulz's comments below:

Shouldn't this be memoized? What would be the performance impact if I ran, say, 5 commands in a row?

Originally posted by @Sidnioulz in #58 (comment)

Correct me if I'm wrong, but I should be able to completely change the commands in use without suspending the speech recognition session, right?
I'm thinking of assisted typing use cases where I want to provide matching commands based on the current field / table cell.

Originally posted by @Sidnioulz in #58 (comment)

The returned result isn't the most confident one, and isn't complete if interimResults / continuous are used

Current situation

  • The result event returns only the first alternative of the result
  • If the result is split (continuous mode or interimResults mode), only the first bit is sent

Proposed change

  • All result bits are aggregated
  • For each result, the most confident match is used instead of the first (occasionally, the first match is the second-most confident instead of most confident)

I think the performance cost is negligible when continuous and interimResults are false (we just loop over a small ~10 items array), but this makes the wrapper useful in more cases including the one I need to support in my UI library.

Add section in README to explain how to handle partial supports

On iOS 14.5, the SpeechRecognition API is supported but not all related APIs, like SpeechGrammarList or navigator.permissions.
The lack of SpeechGrammarList is handled by @untemps/vocal but navigator.permissions is not.
We should add a section explaining that developers should check for this particular case themselves.

Inconsistency in event signatures between result and other events makes it hard to wrap multiple events together

Current situation

All event signatures but result give the SpeechRecognition event as a first parameter to any installed handler. result, on the other hand, returns a partial result as a first parameter, and the event as a second parameter.

This makes it hard to remember the API, and hard to use a general event handler for all events (like we do in my UI toolkit so we can debounce all events from the wrapper).

Proposed change

I'd like to propose that all events returned by the SpeechRecognitionWrapper have the underlying event as the first parameter, since it's the richest source of information and the most useful for low-level use of the API.

Note that I don't mind that the Vocal component returns the computed result first.

Export isSupported getter

An isSupported flag is used internally to detect whether the Web Speech API is available.
For some use cases, it can be useful to import it directly from the package.

Replace util function by @untemps/utils

The component uses an internal util function to check whether a value is a function.
The @untemps/utils package contains such a function, so we can replace the internal one with the one from the external package.

Make the use of commands optional

Fuse.js increases your bundle size by > 50% judging on its unpacked size and your current package's unpacked size. That's quite big. I think it (and therefore commands) should be optional. You can set it as a peer dependency with a peerDependenciesMeta section to achieve that, should you want to.

Originally posted by @Sidnioulz in #58 (comment)
