Timing of SpeechSynthesis state changes not defined about speech-api HOT 6 OPEN

wicg commented on July 28, 2024

Timing of SpeechSynthesis state changes not defined

from speech-api.

Comments (6)

foolip commented on July 28, 2024

I needed this to write a test for https://bugs.chromium.org/p/chromium/issues/detail?id=679043.

GLRoylance commented on July 28, 2024

What issues do you have? A transitory pending? Whether speaking goes false at the end of each utterance and goes true at each new utterance? Whether the engine can remove a second utterance from the queue and thus start "speaking" it before it has finished speaking the first?

The spec has the engine initialized to pending=false, speaking=false, and paused=false. Those attributes do not seem to have events directly associated with them. The important events happen to utterances. Testing pending, speaking, and paused during an utterance event is begging for a race error.

When the user does .speak(utterance), logically that puts the utterance in the queue, so pending=true.

The engine removes an utterance from the queue. If the engine removes the last utterance from the queue, then pending=false.

The engine starts processing the utterance and building audio to play. When it has a buffer, it starts sending audio to the speakers.

There's a window from when an utterance is pulled off the queue to when the audio starts coming out of the speakers. Supposedly, anytime during that window the engine can issue the start utterance event and make speaking=true. Ideally, speaking=noise coming out of the speakers, but I do not see that as a strict requirement of the spec. The spec isn't clear about the order of the start utterance event and the change to the speaking attribute, but I'm not sure it needs to be.

At some point, the engine finishes processing the utterance and posts the last audio block, but the engine cannot return the utterance just yet. The engine must wait for the utterance's last audio block to finish playing. Then the engine can issue the utterance end event and release the utterance.

A reasonable engine will pull the next utterance off the queue before the audio from the previous utterance has finished playing. Pending may go false even though the first utterance has not issued an end event. The current spec implies that processing may start on the next utterance, but the start utterance event will not happen until after the previous utterance has issued its end event. (The current description does not allow overlapped utterances / box model, so sequential, ordered, events are implied.)
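The queue behavior described above can be sketched as a toy model. This is only an illustration of the description in this comment, not a real engine: the class `ModelEngine` and its methods `dequeue` and `audioFinished` are hypothetical names, and the choice to let speaking drop to false between utterances is an assumption, since the spec does not settle it.

```javascript
// Toy model of the queue behavior described above; not a real speech engine.
// All names here are hypothetical.
class ModelEngine {
  constructor() {
    this.queue = [];        // utterances waiting to be processed
    this.pending = false;
    this.speaking = false;
    this.current = null;    // utterance whose audio is playing
    this.prefetched = null; // utterance dequeued early, not yet started
    this.log = [];          // utterance events in the order they fire
  }

  speak(name) {
    this.queue.push(name);
    this.pending = true;
  }

  // The engine pulls the next utterance off the queue, possibly while the
  // previous one is still playing.
  dequeue() {
    if (this.queue.length === 0 || this.prefetched !== null) return;
    this.prefetched = this.queue.shift();
    this.pending = this.queue.length > 0;
    if (this.current === null) this.start();
  }

  start() {
    this.current = this.prefetched;
    this.prefetched = null;
    this.speaking = true;
    this.log.push(`start:${this.current}`);
  }

  // The last audio block of the current utterance has finished playing.
  audioFinished() {
    if (this.current === null) return;
    this.log.push(`end:${this.current}`);
    this.current = null;
    this.speaking = false; // assumption: speaking drops between utterances
    if (this.prefetched !== null) this.start();
  }
}

const engine = new ModelEngine();
engine.speak('first');
engine.speak('second');
engine.dequeue(); // 'first' starts
engine.dequeue(); // 'second' is prefetched while 'first' still plays
const pendingWhileFirstPlays = engine.pending; // queue is empty by now
engine.audioFinished(); // end of 'first', then start of 'second'
engine.audioFinished(); // end of 'second'
console.log(engine.log.join(' '));
```

In this model, pending is already false while the first utterance is still playing, yet the start event for the second utterance is only logged after the end event for the first.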

If the user commands pause, then the engine should set paused=true and pause the audio system. It must then figure out which sample the audio system paused on so it can determine the current utterance. It may have to issue an end event for the previous utterance, a start event for the current utterance, and a pause event for the current utterance. There may also be mark and boundary events that need to be issued in their proper order.

If the .pause() hits after the only utterance has finished speaking, then there is no utterance for a pause event, so no pause event is issued.

I don't think the spec covered this (the pause event is "Fired when and if this utterance is paused mid-utterance."), but imagine the speech system has been (1) paused when it just finished utterance 1 but before it has pulled the next utterance off the queue or (2) paused with no utterances in the queue or speaking, and then an utterance is added with .speak(). That means there's no utterance pause event. The engine should pull the next utterance off the queue (when and if it arrives), issue an utterance start event, and immediately issue an utterance pause event. ("Mid-utterance" should include at the start of the utterance (sample 0).)

If the user commands resume, then paused=false and the engine resumes the audio and issues the utterance resume event.

There's a subtle question about ordering the transitions of the pending, speaking, and paused attributes with respect to the utterance events, but I don't think a program should ever depend on those timings because they can change asynchronously. The program might be processing an utterance resume event when a subsequent pause has been executed; the utterance processing must proceed no matter the current state of the speech engine.

foolip commented on July 28, 2024

It's just that the spec doesn't say exactly when state is manipulated and events are fired. Compare to https://html.spec.whatwg.org/multipage/media.html#dom-media-pause, which has an algorithm that synchronously sets the paused attribute and says, effectively, "queue a task to fire a simple event named pause".

Web Speech might say:

Return and run the following steps in parallel:
1. Wait until [some condition is true]
2. Queue a task to run the following steps
  1. Set [some state corresponding to what the paused attribute uses]
  2. Fire a simple event named "pause" at [some target]
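The difference between the two orderings can be sketched with a toy model. This is not browser code: the explicit `tasks` array stands in for the event loop, the "wait until some condition is true" step is skipped, and `makeMediaLike` and `makeSpeechLike` are hypothetical names used only for illustration.

```javascript
// Toy model: an explicit task queue stands in for the browser event loop.
const tasks = [];
const queueTask = (fn) => tasks.push(fn);
const runTasks = () => { while (tasks.length > 0) tasks.shift()(); };

// Media element style: pause() flips the state synchronously and only the
// event is queued.
function makeMediaLike() {
  const obj = { paused: false, seen: [] };
  obj.pause = () => {
    obj.paused = true;
    queueTask(() => obj.seen.push(`pause event, paused=${obj.paused}`));
  };
  return obj;
}

// Ordering from the steps above: the state flips inside the queued task,
// right before the event is fired.
function makeSpeechLike() {
  const obj = { paused: false, seen: [] };
  obj.pause = () => {
    queueTask(() => {
      obj.paused = true;
      obj.seen.push(`pause event, paused=${obj.paused}`);
    });
  };
  return obj;
}

const media = makeMediaLike();
const speech = makeSpeechLike();
media.pause();
speech.pause();
const mediaPausedAfterCall = media.paused;   // already true
const speechPausedAfterCall = speech.paused; // still false
runTasks();
console.log({ mediaPausedAfterCall, speechPausedAfterCall });
console.log(media.seen, speech.seen);
```

Both models report paused=true inside the event handler. They differ only in what script observes between the call to pause() and the event.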

foolip commented on July 28, 2024

In other words, unlike media elements, it looks like Web Speech changes the script-readable state right before events are fired. This is actually better, I think. Nonetheless, the spec doesn't say so in enough detail to write tests asserting as much.

GLRoylance commented on July 28, 2024

I'm still having trouble with your desires. The state transitions of the speech engine (pending, speaking, paused) do not have to be ordered with respect to the state of the utterances (start, marks, paused, resumed, end). Furthermore, the code handling an utterance event should not be looking at the speech engine state.

The Web Speech spec is not firing events at the speech engine (except for onvoiceschanged which is async to everything else). The events are fired at utterances.

You can .speak(uttLasting5seconds), field an onstart event for that utterance, wait 1 second, and command .pause() (from outside the event handler). You don't know what the state of the speech engine is after the call, but you should see an onpause event for the utterance. You can then issue a .resume() and expect to see an onresume event followed by an onend event.
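A minimal sketch of observing only the utterance events, without reading any speech engine state, might look like this. The helper name `recordUtteranceEvents` is hypothetical, and a plain `EventTarget` stands in for the utterance so that the sketch runs outside a browser; in a page the argument would be the SpeechSynthesisUtterance passed to .speak().

```javascript
// Records utterance events in the order they fire. It never reads
// speechSynthesis.pending, .speaking, or .paused.
function recordUtteranceEvents(utter) {
  const log = [];
  for (const type of ['start', 'pause', 'resume', 'end']) {
    utter.addEventListener(type, () => log.push(type));
  }
  return log;
}

// Stand-in utterance (assumption: a bare EventTarget is enough for the demo).
const utter = new EventTarget();
const log = recordUtteranceEvents(utter);

// Simulate the sequence described above: start, pause, resume, end.
for (const type of ['start', 'pause', 'resume', 'end']) {
  utter.dispatchEvent(new Event(type));
}
console.log(log.join(' > '));
```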

You cannot rely on this behavior:

    utter.onpause = t.step_func(() => {
      utter.onpause = null;
      assert_true(speechSynthesis.paused, 'paused state at pause event');
      speechSynthesis.resume();
      // paused state changes async, right before the resume event
      assert_true(speechSynthesis.paused, 'paused state after resume()');
      utter.onresume = t.step_func_done(() => {
        assert_false(speechSynthesis.paused, 'paused state at resume event');
      });
    });

It confuses many issues. Why can't .resume() be instantaneous?

foolip commented on July 28, 2024

I don't have a strong opinion about what the best behavior is, I'm just pointing out that the spec in fact doesn't say what the behavior should be. "paused state changes async, right before the resume event" was just matching what I observed browsers to do.
