Is it possible to use this in real-time communications? Compared with plain Azure, it's slower.

Faster audio output/processing about aspeak HOT 9 CLOSED

kxxt commented on May 27, 2024

Faster audio output/processing

from aspeak.

Comments (9)

kxxt commented on May 27, 2024

Outputting to the default speaker should be as fast as the demo on the trial page. If you are outputting to an audio file, it's slow.

Funktionar commented on May 27, 2024

I'm outputting to speakers and it's slower.

Funktionar commented on May 27, 2024

Should I switch to stream mode?

kxxt commented on May 27, 2024

How slow is it? I didn't experience significant delays compared with the demo.

Funktionar commented on May 27, 2024

third as slow

kxxt commented on May 27, 2024

I just did a profile:

python -m cProfile -m aspeak -t

         2860752 function calls (2854737 primitive calls) in 34.520 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   34.021   34.021 __main__.py:1(<module>)
        2    0.000    0.000    0.399    0.199 auth.py:1(<module>)
        1    0.000    0.000    0.879    0.879 auth.py:10(_get_auth_token)
        1    0.000    0.000   32.147   32.147 functional.py:11(pure_text_to_speech)
        1    0.000    0.000   33.983   33.983 main.py:122(main)
        1    0.000    0.000    0.948    0.948 main.py:18(read_file)
        1    0.000    0.000    0.000    0.000 main.py:25(preprocess_text)
        1    0.000    0.000   32.147   32.147 main.py:46(speech_function_selector)
        1    0.000    0.000   33.095   33.095 main.py:69(main_text)
        1    0.290    0.290   32.146   32.146 provider.py:36(text_to_speech)
      2/1    0.000    0.000    0.498    0.498 runpy.py:103(_get_module_details)
        1    0.000    0.000   34.520   34.520 runpy.py:199(run_module)
        1    0.000    0.000   34.022   34.022 runpy.py:63(_run_code)
        1    0.000    0.000   31.820   31.820 speech.py:1565(speak_text)
        1    0.000    0.000   29.846   29.846 speech_py_impl.py:6148(speak_text)
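
For anyone who wants to produce a similar listing without scanning the full output, here is a minimal sketch that sorts by cumulative time using the standard library's pstats module. The functions slow_step, workload and profile_report are hypothetical stand-ins, not aspeak code. From the command line, cProfile's `-s cumulative` option gives the same ordering.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    """Stand-in for the expensive call (hypothetical; not aspeak code)."""
    time.sleep(0.05)


def workload():
    """Stand-in for the program being profiled."""
    slow_step()


def profile_report(func, limit=5):
    """Profile func and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Write the report into a string instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

The cumtime column is the one to read here: it includes time spent in callees, which is why nested calls in a listing like the one above show nearly the same figure.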

34.520s is the total time.
34.021s is the time spent in __main__.py.
33.983s is spent in main() in main.py.
32.146s is spent in the text_to_speech function in provider.py.
0.948s is spent reading from stdin.
0.879s is spent getting the auth token.
31.820s is spent by Azure's speech package doing the actual speech synthesis work, which is out of my control.

So the room for optimization is 33.983 - 31.820 - 0.948 - 0.879 = 0.336s.
Actually there is almost nothing to optimize, except:

aspeak/src/aspeak/api/provider.py

Line 40 in e3b1b44

    
           return speechsdk.SpeechSynthesizer(speech_config=cfg, audio_config=output).speak_text(text)

We could cache the synthesizer here if you are always using the same parameters for text_to_speech.

kxxt commented on May 27, 2024

I can provide an API with a cached SpeechSynthesizer in the next version, but I've been very busy recently, so don't expect it to arrive very soon.

You could do it yourself by building your own version of SpeechServiceProvider, provided you always call text_to_speech/pure_text_to_speech with the same set of parameters:

1. Create and store your SpeechConfig and AudioOutputConfig in it.
2. Cache the SpeechSynthesizer, and recreate it from the same config when the token expires.
3. Modify the text_to_speech and ssml_to_speech methods on your SpeechServiceProvider to use the cached SpeechSynthesizer, and remove the config parameters from the methods.
4. Call SpeechServiceProvider.text_to_speech(text) or SpeechServiceProvider.ssml_to_speech(ssml) to do speech synthesis. (You can create SSML using the create_ssml function in aspeak.ssml.)
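
The steps above could be sketched roughly as follows. This is only an illustration under assumptions, not aspeak's actual code: the class name CachedSynthesizerProvider and the make_synthesizer, token_lifetime_s and clock parameters are hypothetical, and the 540-second default is a guess at a safe refresh interval. The synthesizer is injected through a factory so the caching logic can be exercised without the Azure SDK installed.

```python
import time
from typing import Any, Callable


class CachedSynthesizerProvider:
    """Hypothetical provider that reuses one synthesizer across calls."""

    def __init__(
        self,
        make_synthesizer: Callable[[], Any],
        token_lifetime_s: float = 540.0,  # assumed refresh interval
        clock: Callable[[], float] = time.monotonic,
    ):
        # The factory captures the stored speech and audio configs.
        self._make_synthesizer = make_synthesizer
        self._token_lifetime_s = token_lifetime_s
        self._clock = clock
        self._synthesizer = None
        self._created_at = 0.0

    def _get_synthesizer(self):
        """Return the cached synthesizer, rebuilding it once it is too old."""
        now = self._clock()
        expired = now - self._created_at >= self._token_lifetime_s
        if self._synthesizer is None or expired:
            self._synthesizer = self._make_synthesizer()
            self._created_at = now
        return self._synthesizer

    def text_to_speech(self, text: str):
        return self._get_synthesizer().speak_text(text)

    def ssml_to_speech(self, ssml: str):
        # Assumes the synthesizer object exposes speak_ssml.
        return self._get_synthesizer().speak_ssml(ssml)
```

With the real SDK, the factory would presumably wrap the constructor call quoted from provider.py, for example `lambda: speechsdk.SpeechSynthesizer(speech_config=cfg, audio_config=output)`, with cfg and output created once up front. Whether this saves a noticeable amount of time is untested.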

However, frankly speaking, I don't know how much the performance will improve.

kxxt commented on May 27, 2024

Actually I don't think the 200ms delay is realistic.

I opened https://eastus.tts.speech.microsoft.com in a browser and got a 268ms delay.

Funktionar commented on May 27, 2024

Thanks
