Comments (7)
Hi!
I tried several voices in the GUI, and they all did it. I didn't try all of them, but I can certainly give it a shot. I am using harvard right now.
Edit: ALL of the voices I could try in the interface exhibit this problem. "Hello. This is two sentences." simply yields "Hello."
This is Home Assistant OS running on a Raspberry Pi 4B with 4GB of RAM. I don't know how to start the script on the CLI since it is using Docker containers. If there is a way, I can gladly try.
Edit: Tried using rhasspy. This does work like a treat. So it almost looks like an issue with OpenTTS?
I encountered it with song titles as well as a simple "Testing. Testing. Testing." and it stops at the first sentence. I also tried pasting multiple sentences from anywhere and it stopped at the first dot.
from larynx.
Hi, to make this issue easier to debug it might be helpful to supply some additional information:
- Does this happen with all voices/vocoders? If not, which ones, specifically?
- Does it happen when using the larynx script from the command line? e.g. larynx -v en "Hello. This is two sentences."
Certainly in my experience Larynx has synthesized multiple sentences without special handling, so there might be something about the setup that's not working properly.
What operating system/version is this occurring on?
(Also, are these song titles? Have you tested with typing sample sentences directly in case there's an issue with possible hidden/special characters in the title?)
from larynx.
Thanks for trying those other approaches & reporting back.
Based on your descriptions it does seem likely to be an issue around the OpenTTS integration.
I don't have any experience with that aspect of this project so can't give you any specific help for that, sorry.
In terms of debugging approach I'd look at how the text string gets passed through the different parts of the system to see if part of it is getting dropped along the way--maybe see if Home Assistant/OpenTTS logs the input/output text data during processing to see where/if it changes?
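One rough way to check whether text is being dropped is to request the same voice with a one-sentence and a multi-sentence input and compare the length of the returned audio. The sketch below assumes the OpenTTS server is reachable at the given base URL and that it serves WAV audio from an /api/tts endpoint taking voice and text query parameters; the helper names are made up for illustration, so adjust them to your setup.

```python
import io
import wave
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(base_url: str, voice: str, text: str) -> str:
    """Build a request URL (assumed endpoint: /api/tts?voice=...&text=...)."""
    query = urlencode({"voice": voice, "text": text})
    return f"{base_url.rstrip('/')}/api/tts?{query}"


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playing time of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def compare_durations(base_url: str, voice: str, short_text: str, long_text: str):
    """Fetch audio for both texts and return (short_seconds, long_seconds)."""
    durations = []
    for text in (short_text, long_text):
        with urlopen(build_tts_url(base_url, voice, text)) as response:
            durations.append(wav_duration_seconds(response.read()))
    return tuple(durations)


if __name__ == "__main__":
    # Hypothetical host; the voice id follows the "larynx:<voice>" form seen in the logs.
    short, long_ = compare_durations(
        "http://localhost:5500",
        "larynx:eva_k-glow_tts",
        "Test.",
        "Test. Eins. Zwei. Drei.",
    )
    print(f"one sentence: {short:.2f}s, four sentences: {long_:.2f}s")
```

If both durations come out about the same, the extra sentences are probably being lost somewhere between the request and the returned audio.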
from larynx.
Thanks! I already assumed that I'll need to report this in OpenTTS itself, just thought I had to start somewhere. And since it doesn't seem to be larynx itself, I'll try that. Also I found how to enable debug and it seems it synthesizes the text in three completely different runs.
--debug --larynx-quality high --larynx-noise-scale 0.333 --larynx-length-scale 1.0
DEBUG:opentts:Namespace(cache=None, debug=True, flite_voices_dir=None, host='0.0.0.0', larynx_denoiser_strength=0.001, larynx_length_scale=1.0, larynx_noise_scale=0.333, larynx_quality='high', marytts_like=None, marytts_url=None, mozillatts_url=None, no_espeak=False, no_festival=False, no_flite=False, no_larynx=False, no_nanotts=False, port=5500)
DEBUG:opentts:Loaded TTS systems: espeak, flite, festival, nanotts, marytts, larynx
Running on 0.0.0.0:5500 over http (CTRL + C to quit)
DEBUG:opentts:['espeak-ng', '--voices']
DEBUG:opentts:Festival voices: {'kal_diphone'}
DEBUG:opentts:Loading voices from voices/marytts
DEBUG:opentts:Voice(id='bits1-hsmm', name='bits1-hsmm', gender='female', language='de', locale='de', tag=None)
DEBUG:opentts:Voice(id='dfki-pavoque-neutral-hsmm', name='dfki-pavoque-neutral-hsmm', gender='male', language='de', locale='de', tag=None)
DEBUG:opentts:Voice(id='bits3-hsmm', name='bits3-hsmm', gender='male', language='de', locale='de', tag=None)
DEBUG:opentts:['espeak-ng', '--voices']
DEBUG:opentts:Festival voices: {'kal_diphone'}
INFO:opentts:Synthesizing with larynx:eva_k-glow_tts (23 char(s))...
DEBUG:opentts:Synthesizing line 1 (23 char(s))
DEBUG:gruut.toksen:Number converter regex: ^-?\d+([,.]\d+)*\w+$
DEBUG:gruut.phonemize:Loading lexicon from voices/larynx/gruut/de-de/lexicon.db
DEBUG:glow_tts:Loading model from voices/larynx/de-de/eva_k-glow_tts/generator.onnx
DEBUG:hifi_gan:Loading HiFi-GAN model from voices/larynx/hifi_gan/vctk_small/generator.onnx
DEBUG:opentts:TTS settings: {'noise_scale': 0.333, 'length_scale': 1.0}
DEBUG:opentts:Vocoder settings: {'denoiser_strength': 0.001}
DEBUG:larynx:{'_': 0, '|': 1, '‖': 2, '#': 3, 'a': 4, 'aɪ̯': 5, 'aʊ̯': 6, 'aː': 7, 'b': 8, 'd': 9, 'd͡ʒ': 10, 'eː': 11, 'f': 12, 'g': 13, 'h': 14, 'iː': 15, 'j': 16, 'k': 17, 'l': 18, 'm': 19, 'n': 20, 'oː': 21, 'p': 22, 'p͡f': 23, 's': 24, 't': 25, 't͡s': 26, 't͡ʃ': 27, 'uː': 28, 'v': 29, 'x': 30, 'yː': 31, 'z': 32, 'ãː': 33, 'ç': 34, 'õː': 35, 'øː': 36, 'ŋ': 37, 'œ': 38, 'ɐ': 39, 'ɔ': 40, 'ɔʏ̯': 41, 'ə': 42, 'ɛ': 43, 'ɛː': 44, 'ɛ̃ː': 45, 'ɪ': 46, 'ʁ': 47, 'ʃ': 48, 'ʊ': 49, 'ʏ': 50, 'ʒ': 51, 'ʔ': 52, 'χ': 53}
DEBUG:larynx:Words for 'Test.': ['test', '.']
DEBUG:larynx:Phonemes for 'Test.': ['#', 't', 'ɛ', 's', 't', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Eins.': ['eins', '.']
DEBUG:larynx:Phonemes for 'Eins.': ['#', 'a', 'eː', 'n', 's', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Zwei.': ['zwei', '.']
DEBUG:larynx:Got mels in 0.19924291200004518 second(s) (shape=(1, 80, 48))
DEBUG:larynx:Phonemes for 'Zwei.': ['#', 't͡s', 'v', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Words for 'Drei.': ['drei', '.']
DEBUG:larynx:Phonemes for 'Drei.': ['#', 'd', 'ʁ', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Got mels in 0.29696504096500576 second(s) (shape=(1, 80, 62))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Initializing denoiser
DEBUG:hifi_gan:Initializing denoiser
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 1.1020990899996832 second(s) (shape=(12288,))
DEBUG:larynx:Real-time factor: 0.42 (audio=0.56 sec, infer=1.31 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:opentts:Got 24620 WAV byte(s) for line 1
DEBUG:opentts:Synthesized 24620 byte(s) in 9.16156530380249 second(s)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 1.1691214450402185 second(s) (shape=(15872,))
DEBUG:larynx:Real-time factor: 0.49 (audio=0.72 sec, infer=1.47 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Got mels in 0.27170646691229194 second(s) (shape=(1, 80, 46))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Got mels in 0.27694296499248594 second(s) (shape=(1, 80, 48))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.377510052989237 second(s) (shape=(11776,))
DEBUG:larynx:Real-time factor: 0.82 (audio=0.53 sec, infer=0.65 sec)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.28667956008575857 second(s) (shape=(12288,))
DEBUG:larynx:Real-time factor: 0.99 (audio=0.56 sec, infer=0.57 sec)
INFO:opentts:Synthesizing with larynx:rebecca_braunert_plunkett-glow_tts (23 char(s))...
DEBUG:opentts:Synthesizing line 1 (23 char(s))
DEBUG:glow_tts:Loading model from voices/larynx/de-de/rebecca_braunert_plunkett-glow_tts/generator.onnx
DEBUG:opentts:TTS settings: {'noise_scale': 0.333, 'length_scale': 1.0}
DEBUG:opentts:Vocoder settings: {'denoiser_strength': 0.001}
DEBUG:larynx:Words for 'Test.': ['test', '.']
DEBUG:larynx:Phonemes for 'Test.': ['#', 't', 'ɛ', 's', 't', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Eins.': ['eins', '.']
DEBUG:larynx:Phonemes for 'Eins.': ['#', 'a', 'eː', 'n', 's', '#', '‖', '‖']
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Words for 'Zwei.': ['zwei', '.']
DEBUG:larynx:Phonemes for 'Zwei.': ['#', 't͡s', 'v', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Words for 'Drei.': ['drei', '.']
DEBUG:larynx:Phonemes for 'Drei.': ['#', 'd', 'ʁ', 'aɪ̯', '#', '‖', '‖']
DEBUG:larynx:Got mels in 0.1456335949478671 second(s) (shape=(1, 80, 28))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Got mels in 0.17054839001502842 second(s) (shape=(1, 80, 30))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.20584573596715927 second(s) (shape=(7168,))
DEBUG:larynx:Real-time factor: 0.92 (audio=0.33 sec, infer=0.35 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:opentts:Got 14380 WAV byte(s) for line 1
DEBUG:opentts:Synthesized 14380 byte(s) in 5.937345743179321 second(s)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.2447036859812215 second(s) (shape=(7680,))
DEBUG:larynx:Real-time factor: 0.83 (audio=0.35 sec, infer=0.42 sec)
DEBUG:larynx:Running text to speech model (GlowTextToSpeech)
DEBUG:larynx:Got mels in 0.15959876799024642 second(s) (shape=(1, 80, 28))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:larynx:Got mels in 0.16664397495333105 second(s) (shape=(1, 80, 26))
DEBUG:larynx:Running vocoder model (HiFiGanVocoder)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.22502166801132262 second(s) (shape=(7168,))
DEBUG:larynx:Real-time factor: 0.84 (audio=0.33 sec, infer=0.39 sec)
DEBUG:hifi_gan:Running denoiser (strength=0.001)
DEBUG:larynx:Got audio in 0.1921218209899962 second(s) (shape=(6656,))
DEBUG:larynx:Real-time factor: 0.84 (audio=0.30 sec, infer=0.36 sec)
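The numbers in the log can be cross-checked with a bit of arithmetic. The sketch below pulls the sample counts and the reported WAV size for the first request out of a few of the lines above. It assumes 16-bit mono PCM with a 44-byte WAV header, which is a guess that happens to match the figures rather than something the log states.

```python
import re

# Lines copied from the first request in the log above.
LOG_EXCERPT = """\
DEBUG:larynx:Got audio in 1.1020990899996832 second(s) (shape=(12288,))
DEBUG:opentts:Got 24620 WAV byte(s) for line 1
DEBUG:larynx:Got audio in 1.1691214450402185 second(s) (shape=(15872,))
DEBUG:larynx:Got audio in 0.377510052989237 second(s) (shape=(11776,))
DEBUG:larynx:Got audio in 0.28667956008575857 second(s) (shape=(12288,))
"""

WAV_HEADER_BYTES = 44  # assumption: canonical PCM WAV header
BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono audio


def audio_sample_counts(log_text: str) -> list:
    """Sample counts from the 'Got audio ... (shape=(N,))' lines."""
    pattern = r"Got audio in \S+ second\(s\) \(shape=\((\d+),\)\)"
    return [int(n) for n in re.findall(pattern, log_text)]


def wav_byte_counts(log_text: str) -> list:
    """Byte counts from the 'Got N WAV byte(s)' lines."""
    return [int(n) for n in re.findall(r"Got (\d+) WAV byte\(s\)", log_text)]


def expected_wav_bytes(sample_count: int) -> int:
    """Size of a WAV file holding that many samples, under the assumptions above."""
    return WAV_HEADER_BYTES + BYTES_PER_SAMPLE * sample_count


if __name__ == "__main__":
    samples = audio_sample_counts(LOG_EXCERPT)
    print("returned bytes:      ", wav_byte_counts(LOG_EXCERPT)[0])
    print("first sentence only: ", expected_wav_bytes(samples[0]))
    print("all four sentences:  ", expected_wav_bytes(sum(samples)))
```

Under those assumptions the returned size equals the first sentence alone, so all four sentences appear to be synthesized but only the first one makes it into the response. The second request shows the same pattern (7168 samples, 14380 bytes).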
from larynx.
Yep, this appears to be a bug in the OpenTTS integration. I messed up and assumed that sentences were split in a different place. I'll get this cleaned up and release a new version.
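For anyone following along, here is a minimal sketch of the general idea of joining per-sentence audio into a single WAV. It is not the actual OpenTTS code or fix: synthesize_sentence is a hypothetical callback returning raw 16-bit mono PCM, the regex splitter is a naive stand-in for real sentence tokenization, and the 22050 Hz sample rate is inferred from the sample counts and durations in the log.

```python
import io
import re
import wave
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive split after ., ! or ? followed by whitespace (illustration only)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def synthesize_text(
    text: str,
    synthesize_sentence: Callable[[str], bytes],  # hypothetical: returns raw PCM
    sample_rate: int = 22050,  # assumption inferred from the log
) -> bytes:
    """Synthesize every sentence and return one WAV containing all of them."""
    pcm = b"".join(synthesize_sentence(s) for s in split_sentences(text))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in synthesizer: one silent sample per character.
    def fake_synth(sentence: str) -> bytes:
        return b"\x00\x00" * len(sentence)

    wav_bytes = synthesize_text("Test. Eins. Zwei. Drei.", fake_synth)
    print(len(split_sentences("Test. Eins. Zwei. Drei.")), "sentences,", len(wav_bytes), "bytes")
```

The point is only that the response should carry the audio for every sentence, not just the first result.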
from larynx.
Thank you very much! I am looking forward to it :)
from larynx.
Should be fixed now in OpenTTS 2.1
from larynx.