@guest271314 so this wasn't implemented yet? I'm trying to set an attribute on the utterance that tells it that the input is SSML: https://stackoverflow.com/questions/49724626/google-speech-api-or-web-speech-api-support-for-ssml
This looks to me like a chicken-and-egg issue.
If nothing in the spec says whether .text is SSML or not, I don't see how implementations would dare tell the synthesizer that it is SSML when they cannot even be sure whether it is.
The backends are ready: espeak-ng supports it, speech-dispatcher supports it, and it is a one-line change in speech-dispatcher-based browser backends to support it. But if the API does not say when it should be enabled, we cannot dare enable it.
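For context, the "one-line change" amounts to a single setting in the SSIP (Speech Synthesis Interface Protocol) session that browsers open to speech-dispatcher. A minimal sketch of the message sequence a client would send; the command spelling ("SET self SSML_MODE on") is an assumption based on the SSIP documentation, so verify it against the speech-dispatcher sources:

```javascript
// Hypothetical sketch of the SSIP messages a browser backend would send
// to speech-dispatcher to enable SSML parsing before speaking. The exact
// command name is an assumption; check the SSIP protocol documentation.
function buildSsipSpeakSequence(ssml) {
  return [
    "SET self SSML_MODE on", // the "one-line change": flip SSML mode on
    "SPEAK",                 // server acknowledges, then expects the body
    ssml,                    // message body (SSML when the mode is on)
    "."                      // a lone dot terminates the message body
  ];
}

const sequence = buildSsipSpeakSequence("<speak>hello universe</speak>");
console.log(sequence.join("\r\n"));
```

The point is only that the protocol plumbing already exists; the missing piece is the API-level signal telling the backend when to send that first line.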
from speech-api.
If nobody implements the SSML bits, maybe it should just be removed from the spec, rather than trying to clarify this? @andrenatal @gshires, WDYT?
@foolip Not sure what you mean by "nobody implements the SSML bits"?
SSML parsing is definitely utilized by Amazon Alexa and Polly, IBM Bluemix, and Google Actions as a web service (for a fee or with an end-user license agreement).
We should be able to implement the specification without using an external web service or licensing agreement.
@foolip There is an available patch to implement SSML parsing in Chromium by way of speech-dispatcher; see https://bugs.chromium.org/p/chromium/issues/detail?id=88072. Unfortunately, I have not yet been able to access a 64-bit device.
@foolip The portion of the Web Speech API specification that is difficult to navigate is determining whether SSML parsing is supported on the specific platform; see https://bugs.chromium.org/p/chromium/issues/detail?id=88072#c48. We know that neither Chromium nor Firefox actually sets the SSML parsing flag to "on" when initializing SSIP communication with speech-dispatcher. I cannot state the reason for this at Chromium other than this comment:

"Setting SSML to true when passing speech to the Linux speech-dispatcher would help, but we couldn't land that by itself - we'd want to at a minimum try to support that on other platforms, or at least parse SSML and strip out the tags, converting them to plaintext, on platforms without SSML support."

though the capability to do so is available.
This addresses whether the string or document is SSML in the first instance: https://github.com/guest271314/SpeechSynthesisSSMLParser/blob/master/SpeechSynthesisSSMLParser.js#L89.
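For illustration, a string-level pre-check along the same lines (the function name is hypothetical; the linked parser performs the authoritative check by parsing with DOMParser and inspecting the root element, while this sketch only inspects the raw string):

```javascript
// Hypothetical string-level pre-check: does the input look like a complete
// SSML document? The real check should parse the string as XML (e.g. with
// DOMParser) and verify well-formedness; this sketch only strips any XML
// prolog/doctype and tests for a <speak> root element.
function looksLikeSSML(input) {
  if (typeof input !== "string") return false;
  const trimmed = input
    .replace(/^\s*<\?xml[^>]*\?>/, "")  // drop the XML declaration, if any
    .replace(/^\s*<!DOCTYPE[^>]*>/, "") // drop the doctype, if any
    .trim();
  return trimmed.startsWith("<speak") && trimmed.endsWith("</speak>");
}

console.log(looksLikeSSML("<speak>hello universe</speak>")); // true
console.log(looksLikeSSML("hello universe"));                // false
```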
@foolip Further, we could:
- Negate the use of speech-dispatcher altogether and ship espeak-ng with browsers, with the appropriate option set for SSML parsing by default.
- Ideally, build the speech synthesizer from scratch using only the Web Audio API.
So, does any browser engine (Chrome, EdgeHTML, Gecko or WebKit) try to parse the text property as SSML? That's what I mean by being implemented.
@foolip Have not tried Edge or WebKit, which both use different approaches than Chromium and Firefox. Edge has SAPI; macOS does not parse SSML at all, though it has its own form of markup.
For Chromium and Firefox the bridge is speech-dispatcher, which calls a speech synthesis module to parse the text or SSML.
The issue is that neither the Chromium nor the Firefox implementation actually passes the appropriate flag over the SSIP socket to turn on SSML parsing for the speech synthesizer module.
See brailcom/speechd#1 (comment)
@foolip The below code should meet the requirement of the specification using JavaScript:
<!DOCTYPE html>
<html>
<head>
  <title>Parse Text or SSML for 5.2.3 SpeechSynthesisUtterance Attributes text attribute test</title>
  <script>
    // https://w3c.github.io/speech-api/speechapi.html#utterance-attributes
    // "5.2.3 SpeechSynthesisUtterance Attributes
    //  text attribute: This attribute specifies the text to be synthesized and
    //  spoken for this utterance. This may be either plain text or a complete,
    //  well-formed SSML document."
    const text_or_ssml = [
      "hello universe",
      `<?xml version="1.0"?><!DOCTYPE speak PUBLIC "-//W3C//DTD SYNTHESIS 1.0//EN" "http://www.w3.org/TR/speech-synthesis/synthesis.dtd"><speak version="1.1"
        xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
        xml:lang="en-US">hello universe</speak>`,
      new DOMParser().parseFromString(`<?xml version="1.0"?><speak version="1.1"
        xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
        xml:lang="en-US">hello universe</speak>`, "application/xml"),
      (() => {
        const doc_type = document.implementation.createDocumentType("speak", "PUBLIC", `"-//W3C//DTD SYNTHESIS 1.0//EN" "http://www.w3.org/TR/speech-synthesis/synthesis.dtd"`);
        const ssml = document.implementation.createDocument("http://www.w3.org/2001/10/synthesis", "speak", doc_type);
        ssml.documentElement.textContent = "hello universe";
        return ssml;
      })()
    ];
    window.speechSynthesis.cancel();
    window.speechSynthesis.onvoiceschanged = async () => {
      for (let text_ssml of text_or_ssml) {
        await new Promise(resolve => {
          const utterance = new SpeechSynthesisUtterance();
          const parser = new DOMParser();
          let parsed_text_ssml;
          if (text_ssml && typeof text_ssml === "string") {
            parsed_text_ssml = parser.parseFromString(text_ssml, "application/xml");
            if (parsed_text_ssml.querySelector("parsererror") && parsed_text_ssml.documentElement.nodeName !== "speak") {
              console.warn("not a complete, well-formed SSML document.", parsed_text_ssml.querySelector("parsererror").textContent);
            } else {
              text_ssml = parsed_text_ssml;
            }
          }
          if (text_ssml instanceof XMLDocument && text_ssml.documentElement.nodeName === "speak") {
            console.log("complete, well-formed SSML document.", text_ssml.documentElement);
            utterance.text = text_ssml.documentElement.textContent;
          } else {
            console.log("plain text", text_ssml);
            utterance.text = text_ssml;
          }
          utterance.onend = resolve;
          window.speechSynthesis.speak(utterance);
        });
      }
    };
  </script>
</head>
<body>
</body>
</html>
@dtturcotte SSML parsing is not implemented in Chromium by default; see https://bugs.chromium.org/p/chromium/issues/detail?id=88072 and https://bugs.chromium.org/p/chromium/issues/detail?id=806592.
This is the beginning of a client-side implementation using JavaScript: https://github.com/guest271314/SpeechSynthesisSSMLParser. It is also possible to use Native Messaging https://src.chromium.org/viewvc/chrome/trunk/src/chrome/common/extensions/docs/examples/api/nativeMessaging/host/ at Chromium/Chrome to communicate with espeak or espeak-ng on the host and get the result back as a data URL in the app https://github.com/jdiamond/chrome-native-messaging, or use a local server to pipe the stdout of the command to JavaScript https://stackoverflow.com/questions/48219981/how-to-programmatically-send-a-unix-socket-command-to-a-system-server-autospawne.
I added a test for SSML in web-platform-tests/wpt#12568 to see if it's supported anywhere.