Introduction We can envision and consider client-side, server-side



Client-side, Server-side and Third-party Speech Recognition, Synthesis and Translation

wicg commented on July 28, 2024

Client-side, Server-side and Third-party Speech Recognition, Synthesis and Translation


Comments (5)

guest271314 commented on July 28, 2024

One important item to note regarding speech recognition: Chrome/Chromium currently records the user and sends the recording to a server without any notification that clearly indicates the user is being recorded and that their voice is being sent to an external server. It is also unclear whether users' biometrics (their voices) are stored (forever) by the service; see https://bugs.chromium.org/p/chromium/issues/detail?id=816095


AdamSobieski commented on July 28, 2024

@guest271314, thank you. By including these client-side, server-side and third-party scenarios in the design of standard APIs, and by more tightly integrating such standards and APIs with WebRTC, we can: (1) provide users with notifications and permissions covering which client-side, server-side and third-party components and services access their microphones and their text, SSML, hypertext and audio streams, (2) produce efficient call graphs (see also: https://youtu.be/EPBWR_GNY9U?t=2m from 2:00 to 4:12), (3) reduce latency in real-time translation scenarios, (4) improve quality in real-time translation scenarios.


AdamSobieski commented on July 28, 2024

I’m hoping to inspire interest in post-text speech technology (speech-to-X₁ and X₂-to-speech), as well as in round-tripping, where acoustic measures and metrics are used to compare the audio input of a speech-to-X₁-to-X₂-to-speech pipeline with its audio output.

X₁, X₂ could be SSML (1.0, 1.1 or 2.0), hypertext or new formats.

X₁-to-X₂ machine translation is also topical.


AdamSobieski commented on July 28, 2024

In the video Real Time Translation in WebRTC, the speaker indicates (at 7:48) that a major issue he would like to see solved is that users have to pause their speech before speech recognition and translation occur.

Towards reducing latency, we can consider real-time online speech recognition algorithms which, instead of processing natural language sentence-by-sentence and outputting X₁, process natural language lexeme-by-lexeme and produce event streams. In these low-latency approaches, speech recognition components and services process speech audio in real-time and produce event streams. Machine translation components consume those streams and produce event streams of their own, which speech synthesis components consume in turn to produce the resultant speech audio.
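
To make the lexeme-by-lexeme idea concrete, here is a minimal sketch of such a chain of event streams. Everything in it is hypothetical and for illustration only: the `LexemeEvent` shape, the function names, the fixed per-lexeme timing and the toy word-for-word dictionary are assumptions, not part of the Web Speech API or WebRTC, and real machine translation would not work word-for-word.

```typescript
// Hypothetical event shape: one recognized or translated lexeme plus its arrival time.
interface LexemeEvent {
  lexeme: string;
  timeMs: number;
}

type Listener = (event: LexemeEvent) => void;

// Stand-in recognizer (assumption): pushes one event per lexeme to the listener
// as soon as it is "heard", rather than buffering a whole sentence.
function recognize(lexemes: string[], msPerLexeme: number, onEvent: Listener): void {
  lexemes.forEach((lexeme, i) => {
    onEvent({ lexeme: lexeme, timeMs: (i + 1) * msPerLexeme });
  });
}

// Stand-in translator (assumption): a toy word-for-word lookup that forwards each
// translated lexeme immediately. Unknown lexemes pass through unchanged.
function makeTranslator(dictionary: Record<string, string>, onEvent: Listener): Listener {
  return (event) => {
    const hasEntry = Object.prototype.hasOwnProperty.call(dictionary, event.lexeme);
    onEvent({
      lexeme: hasEntry ? dictionary[event.lexeme] : event.lexeme,
      timeMs: event.timeMs,
    });
  };
}

// Stand-in synthesizer (assumption): records what would be spoken and when,
// instead of producing audio.
function makeSynthesizer(spoken: string[]): Listener {
  return (event) => {
    spoken.push(event.timeMs + "ms: " + event.lexeme);
  };
}

// Wire the three stages together: recognizer -> translator -> synthesizer.
function runPipeline(
  lexemes: string[],
  dictionary: Record<string, string>,
  msPerLexeme: number
): string[] {
  const spoken: string[] = [];
  const synthesizer = makeSynthesizer(spoken);
  const translator = makeTranslator(dictionary, synthesizer);
  recognize(lexemes, msPerLexeme, translator);
  return spoken;
}

const result = runPipeline(["hello", "world"], { hello: "hola", world: "mundo" }, 300);
console.log(result);
```

The point of the sketch is that each stage handles a lexeme as soon as it arrives, so the synthesizer can start on the first lexeme without waiting for the speaker to pause at a sentence boundary.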


AdamSobieski commented on July 28, 2024

This issue pertains to Issue 1 in the Web Speech API specification.

Issue 1: The group has discussed whether WebRTC might be used to specify selection of audio sources and remote recognizers. See Interacting with WebRTC, the Web Audio API and other external sources thread on [email protected].

