Introduction
This document considers client-side, server-side and third-party speech recognition, synthesis and translation scenarios for a next version of the Web Speech API.
Advancing the State of the Art
Speech Recognition
Beyond speech-to-text, speech recognition includes speech-to-SSML and speech-to-hypertext. With speech-to-SSML and speech-to-hypertext, a higher degree of fidelity is possible when round-tripping speech audio through speech recognition and synthesis components or services.
Speech Synthesis
Beyond text-to-speech, speech synthesis includes SSML-to-speech and hypertext-to-speech.
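To make the SSML-to-speech input concrete, the following is a minimal sketch of how a page might wrap plain text in an SSML 1.1 document before handing it to a synthesizer. The names SsmlOptions, escapeXml and buildSsml are hypothetical and are not part of the Web Speech API; only a small subset of SSML (speak, prosody and break) is shown.

```typescript
// Illustrative options only; a small subset of SSML is covered.
interface SsmlOptions {
  lang: string; // BCP 47 language tag, e.g. "en-US" (assumed to be trusted input)
  rate?: "x-slow" | "slow" | "medium" | "fast" | "x-fast";
  pauseAfterMs?: number; // optional pause appended after the text
}

// Escape the characters that are significant in XML text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap plain text in a minimal SSML 1.1 document.
function buildSsml(text: string, options: SsmlOptions): string {
  let body = escapeXml(text);
  if (options.rate !== undefined) {
    body = `<prosody rate="${options.rate}">${body}</prosody>`;
  }
  if (options.pauseAfterMs !== undefined) {
    body += `<break time="${options.pauseAfterMs}ms"/>`;
  }
  return (
    `<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xml:lang="${options.lang}">${body}</speak>`
  );
}

const exampleSsml = buildSsml("Fish & chips", { lang: "en-GB", rate: "slow" });
console.log(exampleSsml);
```

A hypertext-to-speech path could follow the same pattern, mapping document structure to SSML elements rather than wrapping a single string.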
Translation
Translation scenarios include processing text, SSML, hypertext or audio in a source language into text, SSML, hypertext or audio in a target language.
Desirable features include interoperability of client-side, server-side and third-party translation with WebRTC, so that translations are available as subtitles or audio tracks.
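The source and target combinations described above can be sketched as a small data model. Everything here is an assumption made for illustration: ContentFormat, TranslationRequest, formatPairs and isWellFormed are hypothetical names, and the choice to carry audio as an ArrayBuffer is ours rather than anything the Web Speech API defines.

```typescript
// Hypothetical names; nothing here is defined by the Web Speech API.
const CONTENT_FORMATS = ["text", "ssml", "hypertext", "audio"] as const;
type ContentFormat = (typeof CONTENT_FORMATS)[number];

interface TranslationRequest {
  sourceLang: string; // BCP 47 tag of the input
  targetLang: string; // BCP 47 tag of the desired output
  sourceFormat: ContentFormat;
  targetFormat: ContentFormat;
  content: string | ArrayBuffer; // audio as ArrayBuffer, otherwise a string
}

// Enumerate every source/target format combination named in the text.
function formatPairs(): Array<[ContentFormat, ContentFormat]> {
  const pairs: Array<[ContentFormat, ContentFormat]> = [];
  for (const source of CONTENT_FORMATS) {
    for (const target of CONTENT_FORMATS) {
      pairs.push([source, target]);
    }
  }
  return pairs;
}

// Check that the payload type matches the declared source format.
function isWellFormed(request: TranslationRequest): boolean {
  return request.sourceFormat === "audio"
    ? request.content instanceof ArrayBuffer
    : typeof request.content === "string";
}
```

Four formats on each side give sixteen combinations, which suggests that an API would benefit from treating the formats uniformly rather than defining a separate entry point per pair.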
Multimodal Dialogue Systems
Interesting scenarios include Web-based multimodal dialogue systems that make efficient use of client-side, server-side and third-party speech recognition, synthesis and translation.
Client-side Scenarios
Client-side Speech Recognition
These scenarios are considered in the current version of the Web Speech API.
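For reference, client-side recognition with the current API might look like the sketch below. The helper name startDictation and its callback shape are hypothetical; the constructor is looked up dynamically because some browsers expose it only under a webkit prefix, and the function returns null where the API is unavailable.

```typescript
// Returns the recognizer object, or null where the API is unavailable
// (for example outside a browser).
function startDictation(
  lang: string,
  onTranscript: (transcript: string, isFinal: boolean) => void
): unknown {
  // Some browsers expose the constructor only under a webkit prefix.
  const scope = globalThis as any;
  const Recognition = scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
  if (typeof Recognition !== "function") {
    return null;
  }

  const recognition = new Recognition();
  recognition.lang = lang;
  recognition.interimResults = true; // deliver partial hypotheses as well
  recognition.onresult = (event: any) => {
    // Report only the results that changed since the previous event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onTranscript(result[0].transcript, result.isFinal);
    }
  };
  recognition.start();
  return recognition;
}
```

Note that the callback receives plain-text transcripts only, which is the limitation that the speech-to-SSML and speech-to-hypertext scenarios aim to address.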
Client-side Speech Synthesis
These scenarios are considered in the current version of the Web Speech API.
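Likewise, client-side synthesis with the current API can be sketched as follows. The helper name speakText is hypothetical; speechSynthesis and SpeechSynthesisUtterance are the existing interfaces, accessed dynamically so that the function degrades to returning false where they are absent.

```typescript
// Queue an utterance for synthesis; returns false where the API is unavailable.
function speakText(text: string, lang: string): boolean {
  const scope = globalThis as any;
  if (
    typeof scope.speechSynthesis === "undefined" ||
    typeof scope.SpeechSynthesisUtterance !== "function"
  ) {
    return false;
  }
  const utterance = new scope.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // BCP 47 language tag
  utterance.rate = 1.0; // default speaking rate
  scope.speechSynthesis.speak(utterance);
  return true;
}
```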
Client-side Translation
These scenarios are new to the Web Speech API and involve the client-side translation of text, SSML, hypertext or audio into text, SSML, hypertext or audio.
Server-side Scenarios
Server-side Speech Recognition
These scenarios are new to the Web Speech API and involve one or more audio streams from a client being streamed to a server which performs speech recognition, optionally providing speech recognition results to the client.
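One way to picture this exchange is as a control message followed by audio, with results flowing back only when requested. The wire format below is purely hypothetical: the message shapes, the wantResults flag and the function names are assumptions made for illustration, and the audio itself is assumed to travel separately (for example as binary frames or a WebRTC track).

```typescript
// Hypothetical wire format; no such protocol is defined by the Web Speech API.
interface StartMessage {
  type: "start";
  lang: string;
  wantResults: boolean; // results are optional in the scenario above
}

interface ResultMessage {
  type: "result";
  transcript: string;
  isFinal: boolean;
}

// Build the control message sent before any audio.
function encodeStart(lang: string, wantResults: boolean): string {
  const message: StartMessage = { type: "start", lang, wantResults };
  return JSON.stringify(message);
}

// Parse a server message, returning null for anything that is not a result.
function decodeResult(raw: string): ResultMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) {
    return null;
  }
  const candidate = value as Record<string, unknown>;
  if (
    candidate.type === "result" &&
    typeof candidate.transcript === "string" &&
    typeof candidate.isFinal === "boolean"
  ) {
    return {
      type: "result",
      transcript: candidate.transcript,
      isFinal: candidate.isFinal,
    };
  }
  return null;
}
```

Validating incoming messages before use keeps a malformed or unexpected server reply from reaching application code.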
Server-side Speech Synthesis
These scenarios are new to the Web Speech API and involve a client sending text, SSML or hypertext to a server which performs speech synthesis and streams audio to the client.
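As a rough sketch of the client side of this exchange, the three input kinds could be distinguished by media type on an HTTP request. The SynthesisInput and SynthesisRequest shapes, the buildSynthesisRequest helper and the use of HTTP POST are assumptions for illustration; the media types themselves (text/plain, application/ssml+xml and text/html) are existing registrations.

```typescript
// Hypothetical request model; the transport and shapes are assumptions.
type SynthesisInput =
  | { kind: "text"; body: string }
  | { kind: "ssml"; body: string }
  | { kind: "hypertext"; body: string };

const MEDIA_TYPES = {
  text: "text/plain",
  ssml: "application/ssml+xml",
  hypertext: "text/html",
} as const;

interface SynthesisRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Describe the request a client would send; the server would answer with audio.
function buildSynthesisRequest(input: SynthesisInput, lang: string): SynthesisRequest {
  return {
    method: "POST",
    headers: {
      "Content-Type": `${MEDIA_TYPES[input.kind]}; charset=utf-8`,
      "Content-Language": lang,
      "Accept": "audio/*",
    },
    body: input.body,
  };
}
```

The resulting object could be passed as the options argument of fetch against a server endpoint of the deployment's choosing, with the streamed response body fed to an audio element or the Web Audio API.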
Server-side Translation
These scenarios are new to the Web Speech API and involve a client sending text, SSML, hypertext or audio to a server for translation into text, SSML, hypertext or audio.
Third-party Scenarios
Third-party Speech Recognition
These scenarios are new to the Web Speech API and involve one or more audio streams from a client or server being streamed to a third-party service which performs speech recognition, providing speech recognition results to the client or server.
Third-party Speech Synthesis
These scenarios are new to the Web Speech API and involve a client or server sending text, SSML or hypertext to a third-party service which performs speech synthesis and streams audio to the client or server.
Third-party Translation
These scenarios are new to the Web Speech API and involve a client sending text, SSML, hypertext or audio to a third-party translation service for translation into text, SSML, hypertext or audio.
Hyperlinks
Amazon Web Services
Google Cloud AI
IBM Watson Products and Services
Microsoft Cognitive Services
Real Time Translation in WebRTC