
larynx's Issues

Sound was lost in French word rez-de-chaussée

Trying to get audio for the French word rez-de-chaussée.
Here's the command line:
cat << EOF |
fr|rez-de-chaussée.
EOF
/usr/local/bin/larynx \
    --debug \
    --csv \
    --glow-tts /path/fr-fr/siwis-glow_tts \
    --hifi-gan /path/hifi_gan/universal_large \
    --output-dir /mnt/d/99/voices/ \
    --language fr-fr \
    --denoiser-strength 0.001

Debug data:
DEBUG:larynx:Words for 'rez-de-chaussée': ['rez-de-chaussée']
DEBUG:larynx:Phonemes for 'rez-de-chaussée': ['#', 'ʁ', 'e', 'd', 'ʃ', 'o', 's', 'e', '#', '‖', '‖']
The phonemes are correct for this word, but there is no 'd' sound in the output audio.

Liaison in French

In French, two words sometimes sound like one (liaison):
DEBUG:larynx:Words for 'oui, c'est un': ['oui', ',', "c'est", 'un']
DEBUG:larynx:Phonemes for 'c'est un': ['#', 's', 'e', 't', '#', 'œ̃', '#', '‖', '‖']
The 't' was lost in the output wav; the phonemes should be something like this:
DEBUG:larynx:Phonemes for 'c'est un': ['#', 's', 'e', 't', 'œ̃', '#', '‖', '‖']

DEBUG:larynx:Words for 'ce n'est pas un': ['ce', "n'est", 'pas', 'un']
DEBUG:larynx:Phonemes for 'ce n'est pas un': ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#', 'p', 'a', '#', 'œ̃', '#', '‖', '‖']
The output wav was OK (a 'z' was added), but I think the phonemes should be something like this:
DEBUG:larynx:Phonemes for 'ce n'est pas un': ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#', 'p', 'a', 'z', 'œ̃', '#', '‖', '‖']

Problems pronouncing times and dates

It looks like the English and German voices fail to pronounce dates and times like these (the only ones I've tested):

English: 4/23/2021, 5:02:54 PM
German: 23.4.2021, 17:02:51

I know this is a widely discussed problem in the TTS field and not so easy to solve, but maybe there is some smart Python library that does the work ;-). A small script using regular expressions could be a start, but to make this work for every language there has to be some ML-based procedure, I guess.

Maybe you are already working on something? ^^
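The regex idea above can at least cover one language at a time. A hypothetical pre-processing sketch (not part of larynx; the function name and scope are mine, English M/D/YYYY dates only, times left untouched):

```python
import re
from datetime import datetime

def expand_us_date(text: str) -> str:
    """Rewrite M/D/YYYY dates as 'Month D YYYY' so the synthesizer only
    sees a month name and plain numbers. Hypothetical helper, English-only."""
    def spell_out(match: re.Match) -> str:
        dt = datetime.strptime(match.group(0), "%m/%d/%Y")
        return f"{dt.strftime('%B')} {dt.day} {dt.year}"
    return re.sub(r"\b\d{1,2}/\d{1,2}/\d{4}\b", spell_out, text)
```

Something like this could run on the text before it is piped into larynx; a proper solution would still need per-language rules or an ML model, as noted above.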

'denoiser_strength' referenced before assignment [waveglow]

cat << EOF |
leçon|leçon
garçon|garçon
EOF
/usr/local/bin/larynx --csv \
    --glow-tts /mnt/d/99/voices/fr-fr/siwis-glow_tts \
    --waveglow /mnt/d/99/voices/waveglow/wn_256 \
    --output-dir /mnt/d/99/fr_sw/ \
    --language fr-fr \
    --denoiser-strength 0.001
Traceback (most recent call last):
  File "/usr/local/bin/larynx", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/larynx/__main__.py", line 185, in main
    for text_idx, (text, audio) in enumerate(text_and_audios):
  File "/usr/local/lib/python3.7/dist-packages/larynx/__init__.py", line 146, in text_to_speech
    audio = future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/dist-packages/larynx/__init__.py", line 185, in _sentence_task
    audio = vocoder_model.mels_to_audio(mels, settings=vocoder_settings)
  File "/usr/local/lib/python3.7/dist-packages/larynx/waveglow.py", line 59, in mels_to_audio
    if denoiser_strength > 0:
UnboundLocalError: local variable 'denoiser_strength' referenced before assignment
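The error suggests `denoiser_strength` is only assigned on some code path in `mels_to_audio` before the `if denoiser_strength > 0:` check. A minimal sketch of the usual fix pattern (only the names come from the traceback; the surrounding waveglow.py code is assumed):

```python
def resolve_denoiser_strength(settings) -> float:
    """Bind denoiser_strength on every path so the later comparison
    never sees an unbound local. Sketch of a fix pattern, not the
    actual larynx code."""
    denoiser_strength = 0.0  # safe default, always bound
    if settings is not None:
        denoiser_strength = settings.get("denoiser_strength", 0.0)
    return denoiser_strength
```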

Dot (.) stops synthesis

I am new to Larynx, so maybe my question can be answered easily and quickly, but I couldn't find anything to fix it.

Whenever a dot character is encountered, synthesis ends. I don't even need multiple sentences: if it encounters something like X (feat. Y), it just says X feat. I am using Larynx over opentts in Home Assistant, but this can easily be replicated in the GUI as well. So how exactly can I fix this? And, maybe for later, how exactly can I synthesize multiple sentences? Thank you very much in advance, the voices are superb!

Available benchmarks?

Not an issue

I am looking at using Larynx in my rhasspy implementation and was wondering about benchmarks before I go ahead and run some tests myself. I am interested in using one or two select voices at medium quality, and I wanted to pick the one with the quickest synthesis. Just by randomly testing a couple of voices, I see noticeable speed differences between voices for the same options and piece of text. But has anyone put together some benchmarks to compare the voices?

Also, on a related note, are there any benchmarks of installation methods? My current method is the Docker container: I make a GET request, convert the binary response to a .wav file, and play it (all in Python 3 on a Raspberry Pi 4, 64-bit). Has anyone noticed speed differences between the Docker vs. Debian vs. Python 3 installations?
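Absent published numbers, a rough per-voice benchmark is easy to script against the HTTP server. A sketch (the `/api/tts` GET endpoint and its `voice`/`text` parameters are assumptions based on the web interface; adjust for your install):

```python
import io
import time
import urllib.parse
import urllib.request
import wave

def tts_url(text: str, voice: str, host: str = "http://localhost:5002") -> str:
    # Endpoint path and parameter names assumed; check your server's API page.
    return f"{host}/api/tts?" + urllib.parse.urlencode({"voice": voice, "text": text})

def wav_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes)) as w:
        return w.getnframes() / w.getframerate()

def benchmark(text: str, voice: str) -> float:
    """Return synthesis_time / audio_time for one request; lower is faster."""
    start = time.perf_counter()
    audio = urllib.request.urlopen(tts_url(text, voice)).read()
    elapsed = time.perf_counter() - start
    return elapsed / wav_seconds(audio)
```

Running the same paragraph through each candidate voice a few times and averaging would give a comparable number per installation method as well.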

Any other language besides ljspeech en doesn't work

Hi everyone, awesome job with your TTS module!
I have a few problems getting it to work with foreign languages. I tried siwis and the 'it' voice: no error comes out, it just doesn't play anything, while ljspeech works correctly.

Tried with the latest larynx version on Ubuntu 18.04 (amd64), I'm always using it on CLI.

"Python installation" method fails on musl-based Linux

The "Python installation" method installs the current and all old versions of larynx (seven of them, 1.0.3 down to 0.3.0) and gruut in this step:

pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -f 'https://download.pytorch.org/whl/cpu/torch_stable.html' larynx

Same for the simpler command:

pip3 install larynx

Of course, this ends in massive version conflicts. The problem first occurred after version 1.0.0.
My python3 is version 3.9.7.

Ideas for lipsync and visemes?

First, love the project !

I have a robotic and virtual agent project that I'm trying to get as close to real-time response as possible.
I use the following to generate speech:
python3 fastVoice.py | larynx -v ek --interactive --ssml --raw-stream --cuda --half --max-thread-workers 8 --stdin-format lines --process-on-blank-line | aplay -r 22050 -c 1 -f S16_LE
Where fastVoice.py just dumps the SSML from a socket onto stdin (remember to flush properly ...)
fastVoice.txt

All works very well; audio generally starts <1 s after receiving the message. The question is how to get a phoneme-viseme sequence synced with the audio output. I can manage level-0-ish lipsync by looking at the amplitude of the audio output, but that gives enough info for just the jaw, not the visemes of the lips.

Do you have any ideas/pointers on how to maintain the responsiveness of "--raw-stream" while getting real-time matching info to generate the matching visemes?
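One low-tech direction, since larynx already logs gruut's phonemes: map each phoneme to a coarse viseme and schedule them against an average phoneme duration. A sketch with a toy mapping and entirely made-up timings (larynx does not expose per-phoneme durations on --raw-stream, so real sync would need the model's duration outputs):

```python
# Toy IPA-to-viseme table; a real one covers the full phoneme inventory.
PHONEME_TO_VISEME = {
    "p": "MBP", "b": "MBP", "m": "MBP",
    "f": "FV", "v": "FV",
    "i": "EE", "u": "OO", "o": "OO", "a": "AH",
}

def viseme_track(phonemes, avg_dur=0.08):
    """Return (viseme, start_time_seconds) pairs, assuming a fixed
    average phoneme duration (a placeholder, not model timing)."""
    track, t = [], 0.0
    for ph in phonemes:
        track.append((PHONEME_TO_VISEME.get(ph, "REST"), t))
        t += avg_dur
    return track
```

With per-phoneme durations from the model, the fixed `avg_dur` would be replaced by the real timing; the fixed-duration version is only a rough improvement over amplitude-based jaw movement.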

OpenAPI page broken

I am getting HTTP 500 returned when I go to http://localhost:5002/openapi/ - The browser page says "Fetch error undefined /openapi/swagger.json"

On the command line, I tried find /usr/local/python3/ -name '*swagger*' and only got results for the swagger_ui package in site-packages.

Cannot redirect audio output to file with --raw-stream

When I try to redirect larynx output to a .wav file from the shell, the file produced is corrupted; when I instead pipe the output of the same command to aplay, it plays flawlessly.
larynx -v cmu_jmk -q high --raw-stream < /mnt/hgfs/HostSharedFolder/text/text.txt > test.wav
Am I missing something?
Following the information given in the wiki, the command larynx -v cmu_jmk -q high "Test text." > test.wav works as expected, so it seems there's an issue with the --raw-stream specifier and output redirection. Could you please help?
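This behaviour is consistent with --raw-stream emitting headerless PCM rather than a WAV container: aplay is told the format on the command line, while a redirected file named .wav still has no header. If that is the case, the redirected data can be wrapped after the fact; a sketch, with format parameters assumed from the aplay flags that work elsewhere in this thread (-r 22050 -c 1 -f S16_LE):

```python
import wave

def raw_to_wav(raw_path, wav_path, rate=22050, channels=1, sampwidth=2):
    """Wrap headerless 16-bit PCM in a WAV container. The defaults are
    assumptions; adjust if your voice uses a different format."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(pcm)
```

Alternatively, dropping --raw-stream (as in the working command above) should make larynx write a proper WAV itself.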

Longer pause

I'd like to be able to manually add a 2-second pause between different paragraphs of text. Is there a way to do this?
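If the input is run with the --ssml flag (supported per other issues in this thread), the standard SSML break element is the natural candidate; whether larynx's SSML subset honours the time attribute is an assumption worth testing:

```xml
<speak>
  First paragraph of text.
  <break time="2s"/>
  Second paragraph of text.
</speak>
```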

Package `larynx-tts_0.5.0_amd64.deb` installs but fails to run on older systems

Problem

The package larynx-tts_0.5.0_amd64.deb installs on Elementary OS 5.1 (which is based on Ubuntu 18.04 LTS, which is based on Debian buster/sid), but the supplied python3 binary/larynx script fails to run due to an issue with libc versioning.

$ larynx --help
python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by python3)

Workaround

I'd recently encountered this with another project, so I was able to work around it in the interim by extracting a package with a later version of libc and helping things find what they were looking for. *waves hands here*

Cause

Anyway, as far as I'm aware, this issue occurs because the Larynx package is built on a machine with a more recent libc version than the one installed locally.

Which I think is confirmed by this line in the docker config:

FROM debian:buster-slim as python37

Options for resolving issue

In terms of "resolving" the issue:

  • Ideally, the package could be built on an older base-system Docker image so that older machines could still run it successfully. (As I understand it, the only libc version changes are related to some optimisations, but I don't know whether they impact Larynx's performance.)
  • Alternatively the package could be configured with version information that would prevent installation on older, incompatible systems, unless manually overridden.

I'll admit I didn't really expect the Larynx package to ship its own Python binary instead of depending on system packages but I assume that's to ensure compatibility with compiled extensions?

Appreciation

Despite this issue I was able to get up and running with Larynx after applying the workaround and overall am very happy with the initial resulting output.

Thanks for all the work you've put into the project. I'm really excited about the potential that high-quality, free and open-source offline text-to-speech technology brings with it!

Thanks!

Siwis: 'avec' + 'sa' wrong phonemes

Example:
DEBUG:gruut.phonemize:Loading lexicon from /usr/lib/larynx-tts/gruut/fr-fr/lexicon.db
DEBUG:larynx:Words for 'avec sa mauvaise vue': ['avec', 'sa', 'mauvaise', 'vue']
DEBUG:larynx:Phonemes for 'avec sa mauvaise vue': ['#', 'a', 'v', 'ɛ', 'k', '#', 'ɛ', 's', 'a', '#', 'm', 'ɔ', 'v', 'ɛ', 'z', '#', 'v', 'y', '#', '‖', '‖']

There should not be an 'ɛ' before 's', 'a'.

New languages need a link

Thanks @synesthesiam for this excellent tool.

I followed the 'Python installation' method (on Ubuntu 20.10) and added the language de-de via python3 -m gruut de-de download. Before I could use the new language, I had to add a link in ~/.local/lib/python3.8/site-packages/gruut/data/ to ~/.config/gruut/de-de; otherwise the new language was not found.

Release v0.4.0 contains only a single German larynx-tts-voice (Thorsten)

All others seem to be missing the onnx model:

larynx-tts-voice-de-de-eva-k-glow-tts_0.4.0_all.deb (383 KB)
larynx-tts-voice-de-de-karlsson-glow-tts_0.4.0_all.deb (387 KB)
larynx-tts-voice-de-de-pavoque-glow-tts_0.4.0_all.deb (393 KB)
larynx-tts-voice-de-de-rebecca-braunert-plunkett-glow-tts_0.4.0_all.deb (367 KB)
larynx-tts-voice-de-de-thorsten-glow-tts_0.4.0_all.deb (102 MB)

Real-time factor: calculation

I know the real-time factor (RTF) metric from STT (or ASR) systems. An RTF of 0.5 would mean that 1 s of audio is recognized in 0.5 s.

I would expect similar logic for TTS systems, but the numbers reported as Real-time factor in larynx's debug output seem to be 1/RTF. This is confusing, isn't it?
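For reference, the STT convention described above can be written down directly; under this reading, the value larynx prints would be 1 / real_time_factor(...):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """STT/ASR convention: processing time divided by audio time.
    Values below 1 mean faster than real time."""
    return synthesis_seconds / audio_seconds
```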

Integration for accessibility on Linux

On Linux, blind and visually impaired people use their computers via the screen reader Orca, which reads the contents of the screen out loud. It works with speech-dispatcher, which has generic module files where we can add an integration for larynx relatively easily. To use these natural-sounding voices with Orca, we need to write such a module file. We also need to achieve a very small delay between sending the text to the engine and playing the wav; otherwise the system will not feel fluent. If we achieve this, we will bring accessibility on Linux to the next level.

Dutch extra "t" sounds

ik ga naar huis is pronounced as: ik ga naar huist
ik ga naar de bakker is pronounced as: ik ga naar de bakkert
jij moet opstaan is pronounced as: jij moet opstaant

SSML file not processing under --ssml flag

Testing both larynx and larynx.server, installed via pip3 in a venv. All dependencies are satisfied. Fedora 34, all up to date.

Using the example SSML in a file TTS-SSML_test.txt:
larynx.server: paste the contents of the file into the input box and run. With the SSML checkbox either unchecked or checked, the voice recognizes the SSML commands and does not read them aloud.

Using larynx from cmd line:
$ python3 -m larynx -v southern_english_female-glow_tts < TTS-SSML_test.txt
reads whole file including all the SSML statements

$ python3 -m larynx --ssml -v southern_english_female-glow_tts < TTS-SSML_test.txt
errors:
Traceback (most recent call last):
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/text_processor.py", line 479, in process
    root_element = etree.fromstring(text)
  File "/usr/lib64/python3.9/xml/etree/ElementTree.py", line 1348, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 7

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/larynx/__main__.py", line 720, in <module>
    main()
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/larynx/__main__.py", line 294, in main
    for result_idx, result in enumerate(tts_results):
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/larynx/__init__.py", line 71, in text_to_speech
    for sentence in gruut.sentences(
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/__init__.py", line 79, in sentences
    graph, root = text_processor(text, lang=lang, ssml=ssml, **process_args)
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/text_processor.py", line 432, in __call__
    return self.process(*args, **kwargs)
  File "/TextToSpeech/venv/lib64/python3.9/site-packages/gruut/text_processor.py", line 483, in process
    root_element = etree.fromstring(f"{text}")
  File "/usr/lib64/python3.9/xml/etree/ElementTree.py", line 1348, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 22

Also tried piping the file in via cat:
cat TTS-SSML_test.txt | python3 -m larynx --ssml -v southern_english_female-glow_tts
Same error
It produces an audio file without the --ssml flag but, as above, includes all the SSML statements.

I've been through the documentation page and tried the examples to narrow this down. There is nothing specific about using an SSML file to produce the audio. The non-SSML examples all work on my workstation.

I'd like to get this working for a small project that produces training audio files of Shorin-Ryu Karate yakusokus for my black belt test practice.

Thanks,

Letter t and p

In the Russian words "Установите" and "зарядку", the letters "т" and "р" are not pronounced well.

Missing liaison in phonemes, but wrong one heard

DEBUG:larynx:Words for 'avec ton amour': ['avec', 'ton', 'amour']
DEBUG:larynx:Phonemes for 'avec ton amour': ['#', 'a', 'v', 'ɛ', 'k', '#', 't', 'ɔ̃', '#', 'a', 'm', 'u', 'ʁ', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)

I can hear "ton zamour" :)

Version/tag mismatch when downloading voices for 1.0.0 release

Looks like the GitHub version tag is 1.0 but the code is looking for 1.0.0. The assets exist on GitHub with 1.0 in the path, but I'm getting this error when trying to download voices from the web interface:

larynx.utils.VoiceDownloadError: Failed to download voice en-us_kathleen-glow_tts from http://github.com/rhasspy/larynx/releases/download/v1.0.0/en-us_kathleen-glow_tts.tar.gz: HTTP Error 404: Not Found

Adding support for a Windows SAPI5 implementation

Hey there, developers! I found this repo by exploring, and I'd like to make a request.
The request: release a Windows SAPI5 version of the TTS engine, compatible with all the available voices, with the necessary encoders integrated to ensure fast and responsive synthesis. Details below.
I am a blind person who uses a screen reader to operate the computer. Blind people like me require a responsive speech synthesizer so we can receive the requested information without unnecessary delays, and quite a large portion of us require very fast speech output without the weird voice artifacts that natural-sounding TTS voices tend to produce. If I didn't appreciate the hard work involved, I would ask you to make an NVDA add-on containing the synthesizer along with a way to download the voices, but a more mainstream, Windows-integrated option like SAPI5 would perhaps be a little easier?
Anyway, I know this project is aimed at Raspberry Pi/command-line usage, but the currently available voices attracted someone like me who would use a more beneficial option for, say, daily usage. I look forward to your response. This is just a request; if it can't be done, it can't be done. Thanks, and have a good time!

Exclude tests in setuptools.find_packages

I'm currently packaging larynx + deps for Arch Linux and I encountered an issue with larynx: setuptools includes the tests in the package, which does not play well with Arch Linux packaging - see rhasspy/phonemes2ids#1 for details.

So I propose to also exclude the tests for larynx.
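For reference, a sketch of the relevant setup.py fragment using the usual setuptools exclusion idiom (the surrounding setup() call and exact package layout are assumed):

```python
from setuptools import find_packages

# Exclude test modules from the installed distribution so downstream
# packagers (e.g. Arch Linux) don't end up shipping a top-level
# `tests` package alongside `larynx`.
packages = find_packages(exclude=["tests", "tests.*"])
```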

Siwis: sound not so accurate

DEBUG:larynx:Words for 'ce fait est avéré': ['ce', 'fait', 'est', 'avéré']
DEBUG:larynx:Phonemes for 'ce fait est avéré': ['#', 's', 'e', 'ə', '#', 'f', 'ɛ', '#', 'ɛ', '#', 'a', 'v', 'e', 'ʁ', 'e', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)

I can hear a 'd' sound after 'f', 'ɛ'.
I suppose that's because, in such a context, there may or may not be a liaison "t".

Server with HTTPS?

Hi,

This is a very useful project! It is very easy to install with the provided .deb file.

Could you please explain in the README how to run the server with HTTPS on Debian? For example, with a certificate generated with Let's Encrypt.
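The larynx server itself speaks plain HTTP, so one common approach (not larynx-specific) is to terminate TLS in a reverse proxy in front of it. A sketch of an nginx server block; the hostname and certificate paths are placeholders, and larynx is assumed to be on its default port 5002:

```nginx
server {
    listen 443 ssl;
    server_name tts.example.org;

    # Paths as produced by certbot/Let's Encrypt for this hostname
    ssl_certificate     /etc/letsencrypt/live/tts.example.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/tts.example.org/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:5002;
    }
}
```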

Thanks

Mac onnxruntime cannot import name 'get_all_providers'

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/onnxruntime/capi/_pybind_state.py:14: UserWarning: Cannot load onnxruntime.capi. Error: 'dlopen(/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so, 2): Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
Reason: image not found'.
warnings.warn("Cannot load onnxruntime.capi. Error: '{0}'.".format(str(e)))
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/larynx-0.3.0-py3.7.egg/larynx/__init__.py", line 9, in <module>
    import onnxruntime
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/onnxruntime/__init__.py", line 13, in <module>
    from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed,
ImportError: cannot import name 'get_all_providers' from 'onnxruntime.capi._pybind_state' (/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/onnxruntime/capi/_pybind_state.py)

With onnx == 1.7.0, it's okay.

About the use of this software

@synesthesiam

Hi, this is my first time encountering this software. I have cloned it locally; how do I use it?
I didn't fully understand the README: it describes how to use it, but not in much detail how to deploy it.
I want to use it from Python. How do I deploy it, and what are the specific steps?

CUDA does not appear to be working in the Docker container

Running the latest Docker container with the NVIDIA container runtime, nvidia-smi returns and shows the graphics card as available and ready.


You can run larynx from the command line inside of the container without error.

But as soon as you pass the --cuda flag:

^C(.venv) root@larynx-dd4858485-t9dj2:/home/larynx/app/larynx# python -m larynx --cuda
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/larynx/app/larynx/__main__.py", line 750, in <module>
    main()
  File "/home/larynx/app/larynx/__main__.py", line 66, in main
    import torch
ModuleNotFoundError: No module named 'torch'

Similar errors occur if you attempt to start the container with the cuda flag as an additional argument.

By executing into the container and using the venv that exists I was able to install torch and then run the command.


I believe the build container has an issue here: https://github.com/rhasspy/larynx/blob/master/Dockerfile#L42. My knowledge of Python is limited, but it appears the intent is to use a precompiled version of torch that you provide, and it does not appear to actually make it into the container.

Using larynx as a module in python code

I find this project cool and useful, but I have a question: is it possible to use it like pyttsx3, as a TTS engine in code?
If yes, how?

MaryTTS API interface is not 100% compatible

Hi Michael,

Congratulations on your Larynx v1.0 release 🥳. Great work, as usual 🙂.

I've been trying to use Larynx with the new SEPIA v0.24.0 client, since it now has an option to use MaryTTS-compatible TTS systems directly, but I encountered some issues:

  • The /voices endpoint is not delivering information in the same format. The MaryTTS API response is [voice] [language] [gender] [tech=hmm], but Larynx is giving [language]/[voice]. Since I'm parsing the string automatically, it currently fails to get the right language.
  • The /voices endpoint will show all voices including the ones that haven't been downloaded yet.
  • The Larynx quality parameter is not accessible.

The last point is not really a MaryTTS compatibility issue, but it would be great to get each voice as 'low', 'medium', and 'high' variations from the /voices endpoint, so the user could actually choose them from the list.

I believe the Larynx MaryTTS endpoints are mostly for Home-Assistant support and I'm not sure how HA is parsing the voices list (maybe it doesn't parse it at all or just uses the whole string), but it would be great to get the original format from the /voices endpoint. Would you be willing to make these changes? 😇 😁
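For clients like the one described, each MaryTTS /voices line is whitespace-separated in the shape quoted above. A minimal parser for that expected format (a sketch written against the MaryTTS shape, not Larynx's current output):

```python
def parse_marytts_voice(line: str) -> dict:
    """Split one MaryTTS '/voices' line of the form
    '[voice] [language] [gender] [tech]'."""
    voice, language, gender, tech = line.split()[:4]
    return {"voice": voice, "language": language, "gender": gender, "tech": tech}
```

Larynx's current `[language]/[voice]` string fails this split, which is the incompatibility being reported.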

MaryTTS emulation and Home Assistant

I'm having trouble setting up the MaryTTS component in Home Assistant to work with Larynx. In particular, there are several parameters that can be defined in yaml. The docs give this example:

tts:
  - platform: marytts
    host: "localhost"
    port: 59125
    codec: "WAVE_FILE"
    voice: "cmu-slt-hsmm"
    language: "en_US"
    effect:
      Volume: "amount:2.0;"

Larynx is up and running and I can generate speech via localhost:59125. I'd like to use a specific voice and quality setting with Home Assistant's TTS. I tried setting the following:

...
    voice: "harvard-glow_tts"
    language: "en_us"
...

But Home Assistant's log shows an error saying that "en_us" is not a valid language ("en_US" is, though).

What are the correct parameters necessary to use a specific voice? And would it be possible to use an effect key to set the voice quality (high, medium, low)?

Required versions for python and pip

I have a working setup on a recent Linux box (with Python 3.8), but now I have to use an older computer (Python 3.5, pip 8.1.1) and I run into trouble:

 Using cached https://files.pythonhosted.org/packages/f8/4d/a2.../larynx-0.3.1.tar.gz
 Complete output from command python setup.py egg_info:
 Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-sbulg1mn/larynx/setup.py", line 13
    long_description: str = ""
                    ^
 SyntaxError: invalid syntax

What are the minimum versions required by larynx at the moment?
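The traceback itself narrows this down: the failing line uses a PEP 526 variable annotation, a syntax feature added in Python 3.6, so larynx's setup.py needs at least Python 3.6 regardless of the pip version:

```python
# Valid on Python >= 3.6; a SyntaxError on Python 3.5,
# exactly as shown in the pip output above.
long_description: str = ""
```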

Soften the stop of voice at start of break?

How would I "soften" the end of the sentence at a break? There is a jarring, abrupt stop to the voice at the start of each break.

I was thinking of moving to a Japanese voice for the Japanese words, then back to English for the movement directions. Abrupt change ups could make it painful to listen to.

Maybe when you add more of the SSML set, there will be enhanced control to tackle this.

length-scale is working incorrectly

The --length-scale parameter works the opposite of its description: the speaker speaks slower when the parameter is > 1 and faster when it is < 1, but according to the description it should be:
--length-scale - makes the voice speaker slower (< 1) or faster (> 1)

About improving the README

@synesthesiam

First of all, thank you for the detailed introduction in the README. Is there an online version of this software? A link to the online version at the beginning of the README would improve the user experience.

Siwis good training on bad prompts

In Siwis, the talent rarely respects the pronunciation of verbs in the conditional mood; for example, she says "il tirait" instead of "il tirerait". So, despite the correct phonemes:

DEBUG:larynx:Words for 'il tirerait le premier.': ['il', 'tirerait', 'le', 'premier', '.']
DEBUG:larynx:Phonemes for 'il tirerait le premier.': ['#', 'i', 'l', '#', 't', 'i', 'ʁ', 'ə', 'ʁ', 'ɛ', '#', 'l', 'ə', '#', 'p', 'ʁ', 'ə', 'm', 'j

I can hear "il tirait le premier".

Siwis: wrong phonemes for "de"

DEBUG:larynx:Words for 'de fait': ['de', 'fait']
DEBUG:larynx:Phonemes for 'de fait': ['#', 'd', 'a', 'm', '#', 'f', 'ɛ', '#', '‖', '‖']

SSL error when downloading new tts

Steps to reproduce:

  1. Run larynx-server on NixOS with Docker
  2. Attempt to download a tts

Full error output:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.7/site-packages/quart/app.py", line 1827, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/app/.venv/lib/python3.7/site-packages/quart/app.py", line 1875, in dispatch_request
    return await handler(**request_.view_args)
  File "/app/larynx/server.py", line 667, in api_download
    tts_model_dir = download_voice(voice_name, voices_dirs[0], url)
  File "/app/larynx/utils.py", line 78, in download_voice
    response = urllib.request.urlopen(link)
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 1367, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.7/urllib/request.py", line 1326, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
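CERTIFICATE_VERIFY_FAILED typically means Python cannot locate a CA bundle inside the container, which is common on NixOS-built images; making sure ca-certificates are present in the image is the usual fix. As a code-level workaround sketch (the bundle path below is a Debian-style assumption; NixOS keeps its bundle elsewhere, so pass the correct path for your image):

```python
import ssl
import urllib.request

def open_with_cafile(url: str, cafile: str = "/etc/ssl/certs/ca-certificates.crt"):
    """urlopen with an explicitly located CA bundle. The default path is an
    assumption; point it at wherever your image stores its certificates."""
    context = ssl.create_default_context(cafile=cafile)
    return urllib.request.urlopen(url, context=context)
```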

Keyboard Shortcut

Hey! Just wondering whether it would be possible to implement keyboard-shortcut functionality?
