
Comments (5)

guest271314 commented on July 28, 2024

One important point relevant to speech recognition is that Chrome/Chromium currently records the user and sends the recording to a server, without any notification that clearly tells the user that they are being recorded and that their voice is being sent to an external server. It is also not clear whether the user's biometric data (their voice) is stored (forever) by the service; see https://bugs.chromium.org/p/chromium/issues/detail?id=816095

from speech-api.

AdamSobieski commented on July 28, 2024

@guest271314, thank you. By including these client-side, server-side and third-party scenarios in the design of standard APIs, and by integrating such standards and APIs more tightly with WebRTC, we can: (1) provide users with notifications and permissions with respect to which client-side, server-side and third-party components and services are accessing their microphones and their text, SSML, hypertext and audio streams; (2) produce efficient call graphs (see also: https://youtu.be/EPBWR_GNY9U?t=2m from 2:00 to 4:12); (3) reduce latency for real-time translation scenarios; and (4) improve quality for real-time translation scenarios.


AdamSobieski commented on July 28, 2024

I'm hoping to inspire interest in post-text speech technology (speech-to-X1 and X2-to-speech) as well as interest in round-tripping, where we can use acoustic measures and metrics to compare the audio input to, and the audio output from, speech-to-X1-to-X2-to-speech.

X1, X2 could be SSML (1.0, 1.1 or 2.0), hypertext or new formats.

X1-to-X2 machine translation is also topical.
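As a rough illustration of the round-tripping idea, the sketch below compares an input signal with a round-tripped output signal using one very simple acoustic measure: the mean absolute difference between their per-frame log-energy contours. The thread does not name a specific metric, so this choice, the function names `frame_log_energy` and `round_trip_distance`, and the default frame size (160 samples, i.e. 10 ms at an assumed 16 kHz sample rate) are all assumptions made for illustration. A realistic comparison would likely need time alignment and richer features.

```python
import math
from typing import List, Sequence


def frame_log_energy(samples: Sequence[float], frame_size: int = 160) -> List[float]:
    """Return log10 of the mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        mean_square = sum(s * s for s in frame) / frame_size
        # The small constant avoids log10(0) on silent frames.
        energies.append(math.log10(mean_square + 1e-10))
    return energies


def round_trip_distance(
    original: Sequence[float],
    resynthesized: Sequence[float],
    frame_size: int = 160,
) -> float:
    """Mean absolute difference between the two log-energy contours.

    Hypothetical stand-in for an "acoustic measure"; it assumes the two
    signals are already time-aligned, which real round-tripped speech
    generally is not.
    """
    a = frame_log_energy(original, frame_size)
    b = frame_log_energy(resynthesized, frame_size)
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("signals are shorter than one frame")
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n
```

A distance of zero means the two energy contours match frame for frame; larger values mean the round-tripped audio has drifted further from the input on this one measure.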


AdamSobieski commented on July 28, 2024

In the video "Real Time Translation in WebRTC", the speaker indicates (at 7:48) that a major issue he would like to see solved is that users have to pause their speech before speech recognition and translation occur.

To reduce latency, we can consider real-time online speech recognition algorithms which, instead of processing natural language sentence-by-sentence and outputting X1, process natural language lexeme-by-lexeme and produce event streams. In these low-latency approaches, speech recognition components and services process speech audio in real time and produce event streams; these are consumed by machine translation components, which produce event streams that are in turn consumed by speech synthesis components, which produce the resultant speech audio.
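The chained event streams described above can be sketched with Python generators, where each stage consumes events from the previous stage as soon as they are produced rather than waiting for a full sentence. Everything named here is hypothetical: the `LexemeEvent` shape, the `recognize`, `translate` and `synthesize` stages, and the toy word-for-word dictionary are stand-ins, not part of the Web Speech API or WebRTC. Real machine translation is not word-for-word, so this only illustrates the streaming structure.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class LexemeEvent:
    """One lexeme flowing through the pipeline (hypothetical event shape)."""
    text: str
    index: int


# Toy lookup table standing in for a machine translation component.
TOY_DICTIONARY = {"hello": "bonjour", "world": "monde"}


def recognize(audio_chunks: Iterable[str]) -> Iterator[LexemeEvent]:
    """Stand-in recognizer: each "audio chunk" is assumed to be one word."""
    for index, chunk in enumerate(audio_chunks):
        yield LexemeEvent(text=chunk.lower(), index=index)


def translate(events: Iterable[LexemeEvent]) -> Iterator[LexemeEvent]:
    """Translate each lexeme as it arrives, passing unknown words through."""
    for event in events:
        translated = TOY_DICTIONARY.get(event.text, event.text)
        yield LexemeEvent(text=translated, index=event.index)


def synthesize(events: Iterable[LexemeEvent]) -> Iterator[str]:
    """Stand-in synthesizer: emit a placeholder string per lexeme."""
    for event in events:
        yield f"<audio:{event.text}>"


def pipeline(audio_chunks: Iterable[str]) -> Iterator[str]:
    """Chain the three stages; nothing runs until output is requested."""
    return synthesize(translate(recognize(audio_chunks)))
```

Because every stage is lazy, requesting the first synthesized item pulls only the first chunk from the input, which is the property that lets output begin before the speaker pauses.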


AdamSobieski commented on July 28, 2024

This issue pertains to Issue 1 in the Web Speech API specification.

Issue 1: The group has discussed whether WebRTC might be used to specify selection of audio sources and remote recognizers. See the "Interacting with WebRTC, the Web Audio API and other external sources" thread on [email protected].

