vosk-tts's Introduction

Vosk TTS

Simple TTS based on VITS with some old ideas

Usage

Command line

pip3 install vosk-tts

vosk-tts -n vosk-model-tts-ru-0.6-multi -s 2 --input "Привет мир!" --output out.wav

API

from vosk_tts import Model, Synth
model = Model(model_name="vosk-model-tts-ru-0.6-multi")
synth = Synth(model)

synth.synth("Привет мир!", "out.wav", speaker_id=2)

Voices

For now we support five Russian voices: 3 female and 2 male. Get the model here:

vosk-model-tts-ru-0.6-multi

You can use speaker IDs from 0 to 4 inclusive.

We plan to add more voices and languages in the future.
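
To compare the five voices, one option is to render the same sentence once per speaker ID. The sketch below relies only on the `synth.synth(text, path, speaker_id=...)` call shown in the API section; the helper name `synth_all_speakers` and the output file naming are hypothetical.

```python
from pathlib import Path

# Speaker IDs 0..4, as stated in the README for vosk-model-tts-ru-0.6-multi.
SPEAKER_IDS = range(5)


def synth_all_speakers(synth, text, out_dir="."):
    """Render `text` once per speaker ID and return the output paths.

    `synth` is any object exposing synth(text, path, speaker_id=...),
    such as the Synth instance from the API example.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for speaker_id in SPEAKER_IDS:
        # Hypothetical naming scheme: one WAV file per speaker.
        path = out_dir / f"out_speaker{speaker_id}.wav"
        synth.synth(text, str(path), speaker_id=speaker_id)
        paths.append(path)
    return paths


# Usage, assuming vosk-tts is installed and the model can be loaded:
#   from vosk_tts import Model, Synth
#   synth = Synth(Model(model_name="vosk-model-tts-ru-0.6-multi"))
#   synth_all_speakers(synth, "Привет мир!", "voices")
```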

vosk-tts's People

Contributors

nshmyrev

vosk-tts's Issues

Previous generation is not reset

From time to time, with a frequency and pattern I cannot figure out, the synthesizer hangs.
Sometimes a reboot helps, sometimes it does not.

Natasha is invoked like this:

[hi]
exten = s,1,Answer()
same = n,Wait(1)
same = n,Set(RHV_FILE=${CALLERID(num)}_${EPOCH})
same = n,Set(RHV_TEXT="Компания привествует Вас, ${CALLERID(Name)}")
same = n,System(/var/lib/asterisk/agi-bin/Robovoice.py ${RHV_FILE} ${RHV_TEXT})
same = n,Playback(/home/asterisk/sounds/${RHV_FILE}&silence/1)
same = n,System(rm -f /home/asterisk/sounds/${RHV_FILE}.wav)
same = n,System(rm -f /home/asterisk/sounds/${RHV_FILE}_t.wav)

I have just tried running it from the command line by simply copying the example from here.
This is the output I got:

[root@freepbx asterisk]# ^C
[root@freepbx asterisk]# vosk-tts -n vosk-model-tts-ru-0.1-natasha --input "Привет мир!" --output ~/out.wav

vosk-tts -n vosk-model-tts-ru-0.1-natasha --input "Привет мир"/home/asterisk/speech/Robovoice.py 9999_1691413691 'Компания приветствует Вас, Иван Иванов'" --output ~/out.wav
> ^C

At this point it hangs, and I exit with Ctrl+C.

If I understand correctly, the previous generation from the dialplan has somehow hung and is not going away.
How can I forcibly clear the previous request before generating?
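
The echoed command above, in which `!"` appears to have been replaced by an earlier command line, looks like interactive bash history expansion. The resulting unbalanced quote would leave the shell waiting at the `>` prompt rather than running the synthesizer. That is a reading of the pasted output, not a confirmed diagnosis. One way to sidestep shell quoting entirely is to pass the arguments as a list from Python, as sketched below. The flags are those from the README's command-line example; the helper names and the 60-second timeout are hypothetical.

```python
import subprocess


def build_vosk_tts_argv(text, output,
                        model_name="vosk-model-tts-ru-0.6-multi",
                        speaker_id=None):
    """Build the vosk-tts argument list without going through a shell."""
    argv = ["vosk-tts", "-n", model_name]
    if speaker_id is not None:
        argv += ["-s", str(speaker_id)]
    # The text is a single list element, so "!" and quotes need no escaping.
    argv += ["--input", text, "--output", output]
    return argv


def run_vosk_tts(text, output, timeout=60, **kwargs):
    """Run the CLI; raise if it fails or exceeds `timeout` seconds."""
    return subprocess.run(build_vosk_tts_argv(text, output, **kwargs),
                          check=True, timeout=timeout)
```

With a timeout in place, a stuck synthesis raises `subprocess.TimeoutExpired` instead of blocking the caller indefinitely.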

ERROR

File "/.../training/data_utils.py", line 220, in _filter
for audiopath, sid, text, cleaned in self.audiopaths_sid_text:
ValueError: not enough values to unpack (expected 4, got 3)
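
The traceback says that a row in the training filelist yielded three fields where the loader unpacks four (`audiopath, sid, text, cleaned`). A quick check like the one below can locate such rows. It assumes the filelist is a text file with `|`-separated fields, as in VITS-style filelists; that separator and the function name are assumptions, not something the traceback confirms.

```python
def find_bad_filelist_lines(lines, expected_fields=4, sep="|"):
    """Return (line_number, field_count, line) for rows with the wrong field count.

    Blank lines are skipped. `sep` and `expected_fields` are assumptions
    about the filelist format and may need adjusting.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        count = len(line.split(sep))
        if count != expected_fields:
            bad.append((lineno, count, line))
    return bad


# Usage with a hypothetical filelist path:
#   with open("filelist.txt", encoding="utf-8") as f:
#       for lineno, count, line in find_bad_filelist_lines(f):
#           print(f"line {lineno}: {count} fields: {line}")
```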

TypeError: generator_loss() missing 1 required positional argument: 'disc_generated_outputs'

Hi there. Thank you very much for this TTS.
This is about training on my own data.

First, I noticed a mistake in step 0:

Clone this repository and build monotonic align

git clone https://github.com/alphacep/vosk-tts
cd vosk-tts/training
cd monotonic_align
python setup.py build_ext --inplace
cd ..
A step is missing here:
mkdir monotonic_align
After that you can run:
python setup.py build_ext --inplace

Second, when I try to run python3 train_finetune.py,
I get the following errors:

File "train_finetune.py", line 497, in <module>
main()
File "train_finetune.py", line 58, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))

torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')

torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):

torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
vosk-tts/training/train_finetune.py", line 241, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc],
vosk-tts/training/train_finetune.py", line 358, in train_and_evaluate
loss_gen, losses_gen = generator_loss(y_d_hat_g)
TypeError: generator_loss() missing 1 required positional argument: 'disc_generated_outputs'

What did I do wrong?

My System:
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Env: Conda

Python: 3.8.18

PyTorch: 1.13.1 (+cu117)

I took the requirements.txt from here: https://github.com/FENRlR/MB-iSTFT-VITS2/blob/main/requirements.txt

The voice data was recorded using the same text as in the db-finetune folder. The WAV files are also 22050 Hz mono.

I would be glad to hear your comments.

English support?

Do we have English support? Something like:

model = Model(model_name="vosk-model-tts-en-0.6-multi")

TIA for any help!

Stress marks

A wonderful model, everything works great.
How can I explicitly specify the stress in certain words? I tried capitalizing the letter and putting a plus sign before the vowel, but neither had any effect.
Thank you

Got an error during the opening of model directory

Hello!

I got this error

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 517: character maps to <undefined>

while trying to create a Model object. I was able to fix it by adding an explicit encoding on line 46 of model.py:

for line in open(model_path / "dictionary", encoding='utf8'):
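
The `'charmap'` wording typically appears when Python falls back to a single-byte platform code page, as can happen on Windows, while the dictionary file evidently contains UTF-8 text. A minimal sketch of reading the file with an explicit encoding follows. The function name `read_dictionary_lines` is hypothetical, and this is not the library's actual loader.

```python
from pathlib import Path


def read_dictionary_lines(model_path):
    """Read <model_path>/dictionary as UTF-8, independent of the OS locale."""
    # Without encoding=..., open() uses the platform default encoding,
    # which may not be UTF-8 and can fail on Cyrillic text.
    with open(Path(model_path) / "dictionary", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```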

Option to remove the internal prints

Hello, Nikolay.
The code contains print statements that can clutter the logs of the service hosting your synthesis model. Could they be removed or made optional?
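
If the library does not offer a switch for this, one workaround on the caller's side is to redirect standard output around the noisy call. The sketch below uses only the standard library. `call_quietly` is a hypothetical helper, and it only silences Python-level writes to `sys.stdout`, not output written by native code or to stderr.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs) while discarding anything it prints to stdout."""
    with contextlib.redirect_stdout(io.StringIO()):
        return func(*args, **kwargs)


# Usage with the Synth object from the API example:
#   call_quietly(synth.synth, "Привет мир!", "out.wav", speaker_id=2)
```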

How to add new words to dictionary?

Hello, dear Nikolay.
I have a new dataset with new text (sentences) and WAV audio. As things stand, if words from the new content are not in the dictionary, an error occurs.
Is there a script to convert new words to the format of your dictionary, or a script that can add new words with transcriptions to the existing dictionary?
Thank you.

P.S. I hope to try your gpt-sovits soon.

How to load pretrained model?

Hi! Thanks for the great models.
I'd like to try fine-tuning with an additional voice, but I can't load the pretrained model:
torch.load raises a pickle error.

Which version of the torch package was used for saving? (There is no requirements file in the training directory.)

Turkish Language Support

Thank you for creating the Vosk TTS models. Vosk has multilingual support for STT inference. I tried the Turkish STT model, and it works very well.
I hope you will add a Turkish TTS model.
