The gpt_sovits_inference from x-d-lab

gpt_sovits_inference's Introduction

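👏 Project Description

The original GPT_SoVITS relies heavily on its Gradio-based webui, both for trying out the model and for serving inference. To make GPT_SoVITS easier to try, this project extracts the inference code and exposes it directly, supporting one-step inference deployment.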

🔥 Model List

Model name	Download	Voice character	Language
TTS-GPT_SoVITS-sunshine_girl	🤗 / 🤖	Sunshine girl	zh
TTS-GPT_SoVITS-heartful_sister	🤗 / 🤖	Intellectual older sister	zh

Pre-trained Model

Model name	Download
GPT-SoVITS	🤗 / 🤖

⚒️ Install Dependencies

😇 How to Use

See example.py for details.

import os
import sys

project_root = os.path.abspath('.')
sys.path.append(project_root)


from get_tts_wav import GPT_SoVITS_TTS_inference

text = """我是MindChat漫谈心理大模型"""

inference = GPT_SoVITS_TTS_inference(prompt_language='zh', base_model_id='X-D-Lab/TTS-GPT_SoVITS-pretrained_models', audio_model_id='X-D-Lab/TTS-GPT_SoVITS-sunshine_girl')

inference.get_tts_wav(text=text, wav_save_path="./temp/output1.wav")
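The example writes its result to `./temp/output1.wav`. Whether `get_tts_wav` creates the `temp` directory itself is not stated here, so the following is a minimal sketch, assuming it might not, that creates the parent directory before the call. The helper name `ensure_parent_dir` is hypothetical and not part of this project.

```python
import os


def ensure_parent_dir(wav_save_path: str) -> str:
    """Create the parent directory of wav_save_path if missing; return the absolute path."""
    abs_path = os.path.abspath(wav_save_path)
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)
    return abs_path


# Hypothetical usage with the inference object from the example above:
# inference.get_tts_wav(text=text, wav_save_path=ensure_parent_dir("./temp/output1.wav"))
```

Returning the absolute path keeps the output location independent of later changes to the working directory.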

👏 Contributors

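This project is still at a very early stage, and developers are welcome to join!

🙇‍ Acknowledgements

This project is built on GPT-SoVITS. Thanks to its authors for their open-source contribution.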

gpt_sovits_inference's Issues

Speech synthesis test fails on Windows

Python version: Python 3.10.14
pip list

Package                Version
---------------------- -----------
addict                 2.4.0
aiohttp                3.9.5
aiosignal              1.3.1
aliyun-python-sdk-core 2.15.1
aliyun-python-sdk-kms  2.16.3
async-timeout          4.0.3
attrs                  23.2.0
audioread              3.0.1
Babel                  2.15.0
Brotli                 1.0.9
certifi                2024.2.2
cffi                   1.16.0
charset-normalizer     2.0.4
click                  8.1.7
cn2an                  0.5.22
colorama               0.4.6
contourpy              1.2.1
crcmod                 1.7
cryptography           42.0.7
cycler                 0.12.1
datasets               2.18.0
dateparser             1.1.8
decorator              5.1.1
dill                   0.3.8
Distance               0.1.3
docopt                 0.6.2
einops                 0.8.0
ffmpeg-python          0.2.0
filelock               3.13.1
fonttools              4.51.0
frozenlist             1.4.1
fsspec                 2024.2.0
future                 1.0.0
g2p-en                 2.1.0
gast                   0.5.4
gmpy2                  2.1.2
gruut                  2.3.4
gruut-ipa              0.13.0
gruut_lang_en          2.0.0
huggingface-hub        0.23.0
idna                   3.7
importlib_metadata     7.1.0
inflect                7.2.1
jieba_fast             0.53
Jinja2                 3.1.3
jmespath               0.10.0
joblib                 1.4.2
jsonlines              1.2.0
kiwisolver             1.4.5
LangSegment            0.3.3
librosa                0.9.2
lightning-utilities    0.11.2
llvmlite               0.42.0
lxml                   5.2.2
MarkupSafe             2.1.3
matplotlib             3.9.0
mkl-fft                1.3.8
mkl-random             1.2.4
mkl-service            2.4.0
modelscope             1.14.0
more-itertools         10.2.0
mpmath                 1.3.0
multidict              6.0.5
multiprocess           0.70.16
networkx               2.8.8
nltk                   3.8.1
num2words              0.5.13
numba                  0.59.1
numpy                  1.23.5
oss2                   2.18.5
packaging              24.0
pandas                 2.2.2
pillow                 10.3.0
pip                    24.0
platformdirs           4.2.2
pooch                  1.8.1
proces                 0.1.7
py3langid              0.2.2
pyarrow                16.1.0
pyarrow-hotfix         0.6
pycparser              2.22
pycryptodome           3.20.0
pyopenjtalk            0.3.3
pyparsing              3.1.2
pypinyin               0.51.0
PySocks                1.7.1
python-crfsuite        0.9.10
python-dateutil        2.9.0.post0
pytils                 0.4.1
pytorch-lightning      2.1.4
pytz                   2024.1
PyYAML                 6.0.1
regex                  2024.5.15
requests               2.31.0
resampy                0.4.3
safetensors            0.4.3
scikit-learn           1.4.2
scipy                  1.13.0
setuptools             69.5.1
simplejson             3.19.2
six                    1.16.0
sortedcontainers       2.4.0
soundfile              0.12.1
sympy                  1.12
threadpoolctl          3.5.0
tokenizers             0.19.1
tomli                  2.0.1
tools                  0.1.9
torch                  2.1.1+cu118
torchaudio             2.1.1
torchmetrics           1.3.0.post0
torchvision            0.16.1
tqdm                   4.66.4
transformers           4.41.0
typeguard              4.2.1
typing_extensions      4.11.0
tzdata                 2024.1
tzlocal                5.2
urllib3                2.2.1
wheel                  0.43.0
win-inet-pton          1.1.0
xxhash                 3.4.1
yapf                   0.40.2
yarl                   1.9.4
zipp                   3.18.2

Test code

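# get_tts_wav() can be called from anywhere, across directories
"""
# === Reference: language of the inference text ===
# Values for the prompt_language and text_language parameters, in config and when calling get_tts_wav:
    "all_zh"    # treat all text as Chinese
    "en"        # treat all text as English (unchanged)
    "all_ja"    # treat all text as Japanese
    "zh"        # mixed Chinese and English (unchanged)
    "ja"        # mixed Japanese and English (unchanged)
    "auto"      # multilingual; split the text and detect the language of each segment
"""
"""
def get_tts_wav(
    text: str,      # Text to convert to speech. get_tts_wav() splits it on punctuation internally.
    text_language: str = "zh", # Language of the synthesized speech
    wav_save_path: str = "temp/output.wav" # Path and file name for the result; one complete wav is written
    == Other, secondary parameters ==
    how_to_cut: str = "凑四句一切", # How the inference text is split; there are 5 methods.
            # Recommended: "凑四句一切" (every four sentences) and "按标点符号切" (at every punctuation mark). "按标点符号切" gives the slowest speech rate and the most accurate inference
            # "凑四句一切" (every four sentences), "凑50字一切" (every 50 characters), "按中文句号。切" (at Chinese full stops), "按英文句号.切" (at English periods), "按标点符号切" (at every punctuation mark)
    top_k: int = 20,
    top_p: float = 0.6,
    temperature: float = 0.6,
            # On the three parameters above: https://github.com/RVC-Boss/GPT-SoVITS/pull/457
    ref_free: bool = False  # Run inference without the text that matches the reference audio. Off by default
) -> None
"""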
import os
import sys

project_root = os.path.abspath('.')
sys.path.append(project_root)

# from gpt_sovits_tts.get_tts_wav import GPT_SoVITS_TTS_inference
from get_tts_wav import GPT_SoVITS_TTS_inference

text = "How many roads must a man walk down before we call him a man"

"""
# Voice models (audio_model_id) available on modelscope as of [20240227]
X-D-Lab/TTS-GPT_SoVITS-sunshine_girl
X-D-Lab/TTS-GPT_SoVITS-heartful_sister
"""

inference = GPT_SoVITS_TTS_inference(prompt_language='zh', base_model_id='X-D-Lab/TTS-GPT_SoVITS-pretrained_models',
                                     audio_model_id='X-D-Lab/TTS-GPT_SoVITS-openai_alloy')

inference.get_tts_wav(text=text, wav_save_path="./temp/output1.wav")
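The reference docstring in the test code lists six language codes for `prompt_language` and `text_language`. Below is a minimal sketch of checking a code before it is passed on, assuming those six are the complete set. `SUPPORTED_LANGUAGES` and `check_language` are hypothetical names, not part of the project.

```python
# Language codes copied from the reference docstring in the test code.
SUPPORTED_LANGUAGES = {
    "all_zh": "treat all text as Chinese",
    "en": "treat all text as English",
    "all_ja": "treat all text as Japanese",
    "zh": "mixed Chinese and English",
    "ja": "mixed Japanese and English",
    "auto": "multilingual, split and detect per segment",
}


def check_language(code: str) -> str:
    """Return code unchanged if it is a listed language code, else raise ValueError."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language code {code!r}; "
            f"expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return code


# Hypothetical usage: the test sentence is English, so "en" could be passed explicitly.
# inference.get_tts_wav(text=text, text_language=check_language("en"),
#                       wav_save_path="./temp/output1.wav")
```

Failing early on an unknown code may be easier to diagnose than an error raised deep inside the model pipeline.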

Run log

D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\python.exe E:\code\python\project-litongjava\GPT_SoVITS_Inference\example.py 
2024-05-20 16:40:31,677 - modelscope - INFO - PyTorch version 2.1.1+cu118 Found.
2024-05-20 16:40:31,679 - modelscope - INFO - Loading ast index from C:\Users\Administrator\.cache\modelscope\ast_indexer
2024-05-20 16:40:31,916 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 18ce4fb4fa78515a5dce58072649d436 and a total number of 976 components indexed
D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Some weights of the model checkpoint at X-D-Lab/TTS-GPT_SoVITS-pretrained_models\chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at X-D-Lab/TTS-GPT_SoVITS-pretrained_models\chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\lib\site-packages\torch\functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ..\aten\src\ATen\native\SpectralOps.cpp:868.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
['Faild to synthesis please try again or contact to administrator。 ']
['en']
['How many roads must a man walk down before we call him a man。 ']
['en']
  6%|▋         | 96/1500 [00:02<00:32, 43.86it/s]
T2S Decoding EOS [115 -> 212]

Process finished with exit code 0
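The log ends with exit code 0 and no traceback, so one way to narrow down the failure is to inspect the output file directly. The sketch below uses Python's standard `wave` module and assumes the output is an uncompressed PCM WAV, since `wave` cannot read other encodings. `describe_wav` is a hypothetical helper, not part of the project.

```python
import os
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file, raising if it is missing or empty."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no output file at {path}")
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        if frames == 0:
            raise ValueError(f"{path} contains no audio frames")
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Hypothetical usage after running the test code:
# print(describe_wav("./temp/output1.wav"))
```

A missing file, an empty file, and a file with an unexpected duration each point to a different stage of the pipeline.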
