Dot (.) stops synthesis about larynx HOT 7 CLOSED

rhasspy commented on July 17, 2024

Dot (.) stops synthesis

from larynx.

Comments (7)

chainria commented on July 17, 2024

Hi!

I tried several voices in the GUI, they all did it. Didn't try all of them, but I can certainly give it a shot. I am using harvard right now.
Edit: ALL of the voices I could try in the interface exhibit this problem. "Hello. This is two sentences." simply yields "Hello."

This is Home Assistant OS running on a Raspberry PI 4B with 4GB of RAM. I don't know how to start the script on the CLI since it is using Docker containers. If there is a way, I can gladly try.
Edit: Tried using rhasspy. This does work like a treat. So it almost looks like an issue with OpenTTS?

I encountered it with song titles as well as a simple "Testing. Testing. Testing." and it stops at the first sentence. I also tried pasting multiple sentences from anywhere and it stopped at the first dot.

follower commented on July 17, 2024

Hi, to make this issue easier to debug it might be helpful to supply some additional information:

Does this happen with all voices/vocoders? If not, which ones, specifically?
Does it happen when using the larynx script from the command line? e.g. larynx -v en "Hello. This is two sentences."

Certainly in my experience Larynx has synthesized multiple sentences without special handling, so there might be something about the setup that's not working properly.

What operating system/version is this occurring on?

(Also, are these song titles? Have you tested with typing sample sentences directly in case there's an issue with possible hidden/special characters in the title?)

follower commented on July 17, 2024

Thanks for trying those other approaches & reporting back.

Based on your descriptions it does seem likely to be an issue around the OpenTTS integration.

I don't have any experience with that aspect of this project so can't give you any specific help for that, sorry.

In terms of debugging approach I'd look at how the text string gets passed through the different parts of the system to see if part of it is getting dropped along the way--maybe see if Home Assistant/OpenTTS logs the input/output text data during processing to see where/if it changes?
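
The stage-by-stage check described above can be sketched roughly as follows. This is only an illustration: `find_lossy_stage` and the two stage functions are hypothetical stand-ins, not actual Home Assistant, OpenTTS, or Larynx code. The idea is to pass the text through each step in order, print what goes in and what comes out, and stop at the first step that returns less text than it received.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, function) pair; each function takes text and returns text.
Stage = Tuple[str, Callable[[str], str]]


def find_lossy_stage(text: str, stages: List[Stage]) -> Optional[str]:
    """Return the name of the first stage whose output is shorter than its input."""
    current = text
    for name, func in stages:
        result = func(current)
        # Log input and output so the point of change is visible.
        print(f"{name}: {current!r} -> {result!r}")
        if len(result) < len(current):
            return name
        current = result
    return None


# Hypothetical stand-ins for the real components (assumptions for illustration).
stages: List[Stage] = [
    ("frontend", lambda t: t.strip()),
    # Simulates a step that keeps only the text before the first ". ".
    ("integration", lambda t: t.split(". ")[0]),
]

culprit = find_lossy_stage("Hello. This is two sentences.", stages)
print("Text was shortened in:", culprit)
```

In a real setup the stage functions would be replaced by whatever logging or hooks each component offers.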

chainria commented on July 17, 2024

Thanks! I already assumed that I'll need to report this in OpenTTS itself, just thought I had to start somewhere. And since it doesn't seem to be larynx itself, I'll try that. Also I found how to enable debug and it seems it synthesizes the text in three completely different runs.

--debug --larynx-quality high --larynx-noise-scale 0.333 --larynx-length-scale 1.0
DEBUG:opentts:Namespace(cache=None, debug=True, flite_voices_dir=None, host='0.0.0.0', larynx_denoiser_strength=0.001, larynx_length_scale=1.0, larynx_noise_scale=0.333, larynx_quality='high', marytts_like=None, marytts_url=None, mozillatts_url=None, no_espeak=False, no_festival=False, no_flite=False, no_larynx=False, no_nanotts=False, port=5500)
DEBUG:opentts:Loaded TTS systems: espeak, flite, festival, nanotts, marytts, larynx
Running on 0.0.0.0:5500 over http (CTRL + C to quit)
DEBUG:opentts:['espeak-ng', '--voices']
DEBUG:opentts:Festival voices: {'kal_diphone'}
DEBUG:opentts:Loading voices from voices/marytts
DEBUG:opentts:Voice(id='bits1-hsmm', name='bits1-hsmm', gender='female', language='de', locale='de', tag=None)
DEBUG:opentts:Voice(id='dfki-pavoque-neutral-hsmm', name='dfki-pavoque-neutral-hsmm', gender='male', language='de', locale='de', tag=None)
DEBUG:opentts:Voice(id='bits3-hsmm', name='bits3-hsmm', gender='male', language='de', locale='de', tag=None)
DEBUG:opentts:['espeak-ng', '--voices']
DEBUG:opentts:Festival voices: {'kal_diphone'}
INFO:opentts:Synthesizing with larynx:eva_k-glow_tts (23 char(s))...
DEBUG:opentts:Synthesizing line 1 (23 char(s))
DEBUG:gruut.toksen:Number converter regex: ^-?\d+([,.]\d+)*\w+$
DEBUG:gruut.phonemize:Loading lexicon from voices/larynx/gruut/de-de/lexicon.db
DEBUG:glow_tts:Loading model from voices/larynx/de-de/eva_k-glow_tts/generator.onnx
DEBUG:hifi_gan:Loading HiFi-GAN model from voices/larynx/hifi_gan/vctk_small/generator.onnx
DEBUG:opentts:TTS settings: {'noise_scale': 0.333, 'length_scale': 1.0}
DEBUG:opentts:Vocoder settings: {'denoiser_strength': 0.001}
DEBUG:larynx:{'_': 0, '|': 1, '‖': 2, '#': 3, 'a': 4, 'aɪ̯': 5, 'aʊ̯': 6, 'aː': 7, 'b': 8, 'd': 9, 'd͡ʒ': 10, 'eː': 11, 'f': 12, 'g': 13, 'h': 14, 'iː': 15, 'j': 16, 'k': 17, 'l': 18, 'm': 19, 'n': 20, 'oː': 21, 'p': 22, 'p͡f': 23, 's': 24, 't': 25, 't͡s': 26, 't͡ʃ': 27, 'uː': 28, 'v': 29, 'x': 30, 'yː': 31, 'z': 32, 'ãː': 33, 'ç': 34, 'õː': 35, 'øː': 36, 'ŋ': 37, 'œ': 38, 'ɐ': 39, 'ɔ': 40, 'ɔʏ̯': 41, 'ə': 42, 'ɛ': 43, 'ɛː': 44, 'ɛ̃ː': 45, 'ɪ': 46, 'ʁ': 47, 'ʃ': 48, 'ʊ': 49, 'ʏ': 50, 'ʒ': 51, 'ʔ': 52, 'χ': 53}
DEBUG:larynx:Words for 'Test.': ['test', '.']
DEBUG:larynx:Phonemes for 'Test.': ['#', 't', 'ɛ', 's', 't', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Eins.': ['eins', '.']
DEBUG:larynx:Phonemes for 'Eins.': ['#', 'a', 'eː', 'n', 's', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Zwei.': ['zwei', '.']
DEBUG:larynx:Got mels in 0.19924291200004518 second(s) (shape=(1, 80, 48))
DEBUG:larynx:Phonemes for 'Zwei.': ['#', 't͡s', 'v', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Words for 'Drei.': ['drei', '.']
DEBUG:larynx:Phonemes for 'Drei.': ['#', 'd', 'ʁ', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Got mels in 0.29696504096500576 second(s) (shape=(1, 80, 62))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Initializing denoiser
DEBUG:hifi_gan:Initializing denoiser
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 1.1020990899996832 second(s) (shape=(12288,))
DEBUG:larynx:Real-time factor: 0.42 (audio=0.56 sec, infer=1.31 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:opentts:Got 24620 WAV byte(s) for line 1
DEBUG:opentts:Synthesized 24620 byte(s) in 9.16156530380249 second(s)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 1.1691214450402185 second(s) (shape=(15872,))
DEBUG:larynx:Real-time factor: 0.49 (audio=0.72 sec, infer=1.47 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Got mels in 0.27170646691229194 second(s) (shape=(1, 80, 46))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Got mels in 0.27694296499248594 second(s) (shape=(1, 80, 48))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.377510052989237 second(s) (shape=(11776,))
DEBUG:larynx:Real-time factor: 0.82 (audio=0.53 sec, infer=0.65 sec)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.28667956008575857 second(s) (shape=(12288,))
DEBUG:larynx:Real-time factor: 0.99 (audio=0.56 sec, infer=0.57 sec)
INFO:opentts:Synthesizing with larynx:rebecca_braunert_plunkett-glow_tts (23 char(s))...
DEBUG:opentts:Synthesizing line 1 (23 char(s))
DEBUG:glow_tts:Loading model from voices/larynx/de-de/rebecca_braunert_plunkett-glow_tts/generator.onnx
DEBUG:opentts:TTS settings: {'noise_scale': 0.333, 'length_scale': 1.0}
DEBUG:opentts:Vocoder settings: {'denoiser_strength': 0.001}
DEBUG:larynx:Words for 'Test.': ['test', '.']
DEBUG:larynx:Phonemes for 'Test.': ['#', 't', 'ɛ', 's', 't', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Eins.': ['eins', '.']
DEBUG:larynx:Phonemes for 'Eins.': ['#', 'a', 'eː', 'n', 's', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Zwei.': ['zwei', '.']
DEBUG:larynx:Phonemes for 'Zwei.': ['#', 't͡s', 'v', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Words for 'Drei.': ['drei', '.']
DEBUG:larynx:Phonemes for 'Drei.': ['#', 'd', 'ʁ', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Got mels in 0.1456335949478671 second(s) (shape=(1, 80, 28))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Got mels in 0.17054839001502842 second(s) (shape=(1, 80, 30))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.20584573596715927 second(s) (shape=(7168,))
DEBUG:larynx:Real-time factor: 0.92 (audio=0.33 sec, infer=0.35 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:opentts:Got 14380 WAV byte(s) for line 1
DEBUG:opentts:Synthesized 14380 byte(s) in 5.937345743179321 second(s)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.2447036859812215 second(s) (shape=(7680,))
DEBUG:larynx:Real-time factor: 0.83 (audio=0.35 sec, infer=0.42 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Got mels in 0.15959876799024642 second(s) (shape=(1, 80, 28))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Got mels in 0.16664397495333105 second(s) (shape=(1, 80, 26))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.22502166801132262 second(s) (shape=(7168,))
DEBUG:larynx:Real-time factor: 0.84 (audio=0.33 sec, infer=0.39 sec)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.1921218209899962 second(s) (shape=(6656,))
DEBUG:larynx:Real-time factor: 0.84 (audio=0.30 sec, infer=0.36 sec)
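
A quick size check on the log above suggests that each returned WAV holds the audio of just one sentence. The sketch below assumes 16-bit mono PCM written with a standard 44-byte WAV header; the log does not state the sample format, so that is an assumption, and `wav_byte_count` is a helper made up for this check. Under that assumption, 12288 samples come to 24620 bytes and 7168 samples come to 14380 bytes, which are the two sizes OpenTTS reports for "line 1", even though four per-sentence audio arrays were produced for each voice.

```python
import io
import wave


def wav_byte_count(num_samples: int, sample_width: int = 2,
                   channels: int = 1, rate: int = 22050) -> int:
    """Size in bytes of a PCM WAV file holding num_samples frames of silence.

    Assumptions: 16-bit mono PCM. The rate of 22050 Hz is inferred from the
    log (12288 samples reported as 0.56 sec) and does not affect the size.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00" * (num_samples * sample_width * channels))
    return len(buf.getvalue())


# Audio shapes logged for the first voice (eva_k-glow_tts).
first_voice_shapes = [12288, 15872, 11776, 12288]

print(wav_byte_count(12288))                    # one sentence, first voice
print(wav_byte_count(7168))                     # one sentence, second voice
print(wav_byte_count(sum(first_voice_shapes)))  # all four sentences together
```

If all four sentences had been returned for the first voice, the file would be roughly four times larger than the 24620 bytes logged.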

synesthesiam commented on July 17, 2024

Yep, this appears to be a bug in the OpenTTS integration. I messed up and assumed that sentences were split in a different place. I'll get this cleaned up and release a new version.
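
To illustrate the kind of mismatch described here, the following is a minimal sketch, not the actual OpenTTS or Larynx code. All names (`split_sentences`, `synthesize_first_only`, `synthesize_all`, and the `synthesize_sentence` callback) are hypothetical, and the regex split is a naive stand-in for real sentence tokenization. It contrasts a caller that keeps only the first per-sentence result with one that joins the results for every sentence.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def synthesize_first_only(text: str,
                          synthesize_sentence: Callable[[str], bytes]) -> bytes:
    """Symptom from this issue: only the first sentence's audio is returned."""
    sentences = split_sentences(text)
    return synthesize_sentence(sentences[0]) if sentences else b""


def synthesize_all(text: str,
                   synthesize_sentence: Callable[[str], bytes]) -> bytes:
    """Intended behaviour: audio for every sentence, joined in order.

    Assumes the callback returns headerless PCM, so plain concatenation
    is valid; WAV files with headers could not be joined this way.
    """
    return b"".join(synthesize_sentence(s) for s in split_sentences(text))


def fake_synth(sentence: str) -> bytes:
    # Stand-in for a real TTS call: the "audio" is just the encoded text.
    return sentence.encode("utf-8")


text = "Test. Eins. Zwei. Drei."  # 23 characters, like the logged request
print(split_sentences(text))
print(synthesize_first_only(text, fake_synth))
print(synthesize_all(text, fake_synth))
```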

chainria commented on July 17, 2024

Thank you very much! I am looking forward to it :)

synesthesiam commented on July 17, 2024

Should be fixed now in OpenTTS 2.1 👍
