Faster audio output/processing (aspeak, CLOSED)

kxxt commented on May 27, 2024
Faster audio output/processing

from aspeak.

Comments (9)

kxxt commented on May 27, 2024

Outputting to the default speaker should be as fast as the demo on the trial page. Outputting to an audio file is slow.

Funktionar commented on May 27, 2024

I'm outputting to speakers and it's slower.

Funktionar commented on May 27, 2024

Should I switch to stream mode?

kxxt commented on May 27, 2024

How slow is it? I haven't noticed significantly longer delays than with the demo.

Funktionar commented on May 27, 2024

third as slow

kxxt commented on May 27, 2024

I just did a profile:

python -m cProfile -m aspeak -t
         2860752 function calls (2854737 primitive calls) in 34.520 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   34.021   34.021 __main__.py:1(<module>)
        2    0.000    0.000    0.399    0.199 auth.py:1(<module>)
        1    0.000    0.000    0.879    0.879 auth.py:10(_get_auth_token)
        1    0.000    0.000   32.147   32.147 functional.py:11(pure_text_to_speech)
        1    0.000    0.000   33.983   33.983 main.py:122(main)
        1    0.000    0.000    0.948    0.948 main.py:18(read_file)
        1    0.000    0.000    0.000    0.000 main.py:25(preprocess_text)
        1    0.000    0.000   32.147   32.147 main.py:46(speech_function_selector)
        1    0.000    0.000   33.095   33.095 main.py:69(main_text)
        1    0.290    0.290   32.146   32.146 provider.py:36(text_to_speech)
      2/1    0.000    0.000    0.498    0.498 runpy.py:103(_get_module_details)
        1    0.000    0.000   34.520   34.520 runpy.py:199(run_module)
        1    0.000    0.000   34.022   34.022 runpy.py:63(_run_code)
        1    0.000    0.000   31.820   31.820 speech.py:1565(speak_text)
        1    0.000    0.000   29.846   29.846 speech_py_impl.py:6148(speak_text)
  • 34.520s is the total time.
  • 34.021s is the time spent in __main__.py.
  • 33.983s is spent in main() in main.py.
  • 32.146s is spent in the text_to_speech function in provider.py.
  • 0.948s is spent reading from stdin.
  • 0.879s is spent getting the auth token.
  • 31.820s is spent by Azure's speech package doing the actual speech synthesis work, which is out of my control.

So the room for optimization is 33.983 - 31.820 - 0.948 - 0.879 = 0.336s.
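
The profile above is ordered by standard name, which makes the hot spots harder to pick out. As a rough sketch, the same kind of report can be sorted by cumulative time with the standard library's cProfile and pstats modules (on the command line, cProfile also accepts `-s cumulative`). The functions `slow_call` and `main` below are hypothetical stand-ins, not aspeak code:

```python
import cProfile
import io
import pstats
import time


def slow_call():
    # Hypothetical stand-in for the SDK call that dominates the runtime.
    time.sleep(0.05)


def main():
    # Hypothetical stand-in for the program's entry point.
    slow_call()


def profile_by_cumulative(func, limit=10):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


report = profile_by_cumulative(main)
print(report)
```

With this ordering, the calls that account for most of the wall-clock time appear at the top of the report.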
Actually there is almost nothing to optimize, except:

return speechsdk.SpeechSynthesizer(speech_config=cfg, audio_config=output).speak_text(text)

We could cache the synthesizer here if you are always using the same parameters for text_to_speech.
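
A minimal sketch of that caching idea, assuming the parameters that define a synthesizer are hashable. `FakeSynthesizer`, `get_synthesizer`, and the `audio_format` parameter are hypothetical names used for illustration so the example runs without the Azure SDK; in real code the factory body would build the `speechsdk.SpeechSynthesizer` shown above:

```python
from functools import lru_cache


class FakeSynthesizer:
    """Hypothetical stand-in for speechsdk.SpeechSynthesizer."""

    instances_created = 0

    def __init__(self, voice, audio_format):
        FakeSynthesizer.instances_created += 1
        self.voice = voice
        self.audio_format = audio_format

    def speak_text(self, text):
        return f"[{self.voice}/{self.audio_format}] {text}"


@lru_cache(maxsize=8)
def get_synthesizer(voice, audio_format):
    # Construction happens once per distinct (voice, audio_format) pair;
    # later calls with the same arguments reuse the cached object.
    return FakeSynthesizer(voice, audio_format)


def text_to_speech(text, voice="en-US-JennyNeural", audio_format="wav"):
    return get_synthesizer(voice, audio_format).speak_text(text)


first = text_to_speech("Hello")
second = text_to_speech("World")
print(FakeSynthesizer.instances_created)
```

This only removes the per-call construction cost. It does not address token expiration, which a long-lived cache would also have to handle.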

kxxt commented on May 27, 2024

I can provide an API with a cached SpeechSynthesizer in the next version, but I've been very busy recently, so don't expect it to arrive very soon.

You could do it yourself by building your own version of SpeechServiceProvider, if you are always calling text_to_speech/pure_text_to_speech with the same set of parameters:

  • Create and store your SpeechConfig and AudioOutputConfig in it.
  • Just cache the SpeechSynthesizer and recreate it using the same config in case of token expiration.
  • Modify the text_to_speech and ssml_to_speech methods on your SpeechServiceProvider to use the cached SpeechSynthesizer, and remove the config parameters from the methods.
  • Call SpeechServiceProvider.text_to_speech(text) or SpeechServiceProvider.ssml_to_speech(ssml) to do speech synthesis. (You can create SSML using the create_ssml function in aspeak.ssml.)
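
The steps above could be sketched roughly as follows. This is not aspeak's actual SpeechServiceProvider: `CachedSpeechProvider`, `make_synthesizer`, `token_lifetime_s`, and the 540-second default are assumptions for illustration, and `FakeSynthesizer` stands in for the SDK object so the example runs on its own. In real code, `make_synthesizer` would build a `speechsdk.SpeechSynthesizer` from the stored configs with a fresh token.

```python
import time


class CachedSpeechProvider:
    """Hypothetical provider that reuses one synthesizer until its token is assumed stale."""

    def __init__(self, make_synthesizer, token_lifetime_s=540.0, clock=time.monotonic):
        self._make = make_synthesizer      # factory that applies the stored configs
        self._lifetime = token_lifetime_s  # assumed token validity window
        self._clock = clock
        self._synth = None
        self._created_at = 0.0

    def _synthesizer(self):
        now = self._clock()
        # Recreate on first use or once the assumed token lifetime has passed.
        if self._synth is None or now - self._created_at >= self._lifetime:
            self._synth = self._make()
            self._created_at = now
        return self._synth

    def text_to_speech(self, text):
        return self._synthesizer().speak_text(text)

    def ssml_to_speech(self, ssml):
        return self._synthesizer().speak_ssml(ssml)


class FakeSynthesizer:
    """Stand-in exposing the two methods used above."""

    def speak_text(self, text):
        return ("text", text)

    def speak_ssml(self, ssml):
        return ("ssml", ssml)


created = []


def make_fake():
    synth = FakeSynthesizer()
    created.append(synth)
    return synth


fake_now = [0.0]  # controllable clock for the demonstration
provider = CachedSpeechProvider(make_fake, clock=lambda: fake_now[0])
provider.text_to_speech("one")
provider.text_to_speech("two")                     # reuses the cached synthesizer
fake_now[0] = 600.0                                # pretend the token has expired
result = provider.ssml_to_speech("<speak>three</speak>")  # triggers a rebuild
print(len(created))
```

Injecting the factory and the clock keeps the expiry logic testable without network access or real credentials.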

However, frankly speaking, I don't know how much the performance will improve.

kxxt commented on May 27, 2024

Actually I don't think the 200ms delay is realistic.

I opened https://eastus.tts.speech.microsoft.com in a browser and got a 268ms delay.

Funktionar commented on May 27, 2024

Thanks
