I'm not sure if this is possible, but I would be useful if I could get the duration of

Hello there :) Is there a way to compute a value for <code class="no

Support runtimeDuration attributes in SpeechSynthesisUtterance about speech-api HOT 5 OPEN

wicg commented on July 28, 2024

Support runtimeDuration attributes in SpeechSynthesisUtterance

from speech-api.

Comments (5)

kdavis-mozilla commented on July 28, 2024

What do you mean by "runtimeDuration".

Do you mean how long the TTS engine will run to generate the given phrase? Do you mean the maximum time for the TTS engine to run to create the given phrase? Do you mean the actual duration of the utterance after it was generated?

I would guess you mean "the actual duration of the utterance after it was generated". If so, you can just get this from the SpeechSynthesisUtterance events.

from speech-api.

serv commented on July 28, 2024

You are correct. I meant "the actual duration of the utterance after it was generated".

It seems like SpeechSynthesisUtterance events are fired only when the utterance is running "speak".

Would it be possible to get the actual duration of the utterance after it was generated the the time when an utterance is instantiated rather than during it is running "speak"?

The use case I am thinking about it similar to Youtube video showing its runtime duration. A user can see how long a video will last. Similarly, it would be useful to see how long an utterance will run for.

from speech-api.

kdavis-mozilla commented on July 28, 2024

I would guess that it is possible to get as estimate of the duration of an utterance before its audio is generated by, for example, introducing a duration prediction ML model. However, this would impose constraints on the implementation that would make this added feature onerous to implement.

For example, some browsers use the voices native to the OS and these voices, generally, do not have such an associated "duration prediction" ML model. So this would require that browser providers for each OS and for each voice create a "duration prediction" ML model and have these "duration prediction" ML models retrained for each new OS version and each new voice. While possible, I don't think the gained functionality justifies the effort.

I'd be interested in others views on this too.

from speech-api.

marcoscaceres commented on July 28, 2024

Agree that this sounds like a "nice to have", but not particularly critical thing to have in the spec.

from speech-api.

giuseppeg commented on July 28, 2024

Hello there :)

Is there a way to compute a value for rate so that an utterance (text) is read in n ms? I am hacking on a subtitles reader script and need to read sentences as fast as they are spoken in the video.

The spec is vague about rate:

1 is the default rate supported by the speech synthesis engine or specific voice (which should correspond to a normal speaking rate)

How can I quantify "normal speaking rate"?

from speech-api.

Support runtimeDuration attributes in SpeechSynthesisUtterance about speech-api HOT 5 OPEN

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent