gpt_sovits_inference's Introduction

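👏 Project Description

The original GPT_SoVITS relies heavily on its Gradio-based webui for trying out results and for serving inference. To make GPT_SoVITS inference easier to try, this project extracts the inference code and exposes it directly, supporting one-step inference deployment.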

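🔥 Model List

Model name | Download | Voice character | Language
TTS-GPT_SoVITS-sunshine_girl | 🤗 / 🤖 | Sunny young girl | zh
TTS-GPT_SoVITS-heartful_sister | 🤗 / 🤖 | Thoughtful older sister | zh

  • Pretrained models

Model name | Download
GPT-SoVITS | 🤗 / 🤖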

⚒️ Installing Dependencies

Recommended: Python >=3.9, <=3.10
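
As a quick pre-install check, the recommended range can be tested from Python itself. This is a minimal sketch; `python_version_supported` is a hypothetical helper and not part of this project.

```python
import sys

# Range taken from the recommendation above: >=3.9, <=3.10.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 10)


def python_version_supported(version_info=None):
    """Return True if major.minor lies inside the recommended range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    if not python_version_supported():
        current = f"{sys.version_info[0]}.{sys.version_info[1]}"
        print(f"Warning: Python {current} is outside the recommended range 3.9-3.10")
```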

conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia

git clone https://github.com/X-D-Lab/GPT_SoVITS_Inference.git
cd GPT_SoVITS_Inference
pip install -r requirements.txt

If you are on Windows, download ffmpeg.exe and ffprobe.exe and place them in the root directory of this project.
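
The following sketch checks that both tools can be found before running inference. It assumes a copy on PATH is also acceptable, which this README does not state; `find_missing_tools` is a hypothetical helper, not part of this project.

```python
import os
import shutil

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def find_missing_tools(project_root="."):
    """Return the tools found neither in project_root nor on PATH."""
    missing = []
    for tool in REQUIRED_TOOLS:
        # On Windows the files are expected as <tool>.exe in the project root.
        candidates = [
            os.path.join(project_root, tool + ".exe"),
            os.path.join(project_root, tool),
        ]
        in_root = any(os.path.isfile(path) for path in candidates)
        on_path = shutil.which(tool) is not None  # assumption: PATH is also fine
        if not (in_root or on_path):
            missing.append(tool)
    return missing
```

Calling `find_missing_tools(".")` from the project root returns an empty list when both tools are available.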

😇 Usage

See example.py for details.

import os
import sys

project_root = os.path.abspath('.')
sys.path.append(project_root)


from get_tts_wav import GPT_SoVITS_TTS_inference

text = """我是MindChat漫谈心理大模型"""

inference = GPT_SoVITS_TTS_inference(prompt_language='zh', base_model_id='X-D-Lab/TTS-GPT_SoVITS-pretrained_models', audio_model_id='X-D-Lab/TTS-GPT_SoVITS-sunshine_girl')

inference.get_tts_wav(text=text, wav_save_path="./temp/output1.wav")
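
To synthesize several texts in one run, the call above can be wrapped in a small loop. This is a sketch only: `synthesize_all` is a hypothetical helper, and it creates the output directory first on the assumption that `get_tts_wav` may not do so itself.

```python
import os


def synthesize_all(inference, texts, out_dir="./temp"):
    """Write one WAV file per text and return the paths used.

    `inference` is any object exposing
    get_tts_wav(text=..., wav_save_path=...), as in the example above.
    """
    # Assumption: the output directory may not exist yet, so create it.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        wav_path = os.path.join(out_dir, f"output{index}.wav")
        inference.get_tts_wav(text=text, wav_save_path=wav_path)
        paths.append(wav_path)
    return paths
```

With the `inference` object from the example, `synthesize_all(inference, [text])` writes ./temp/output1.wav.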

👏 Contributors

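This project is still at a very early stage; developers are welcome to join!

🙇‍ Acknowledgements

This project is built on GPT-SoVITS. Thanks to its authors for their open-source contribution.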

gpt_sovits_inference's People

Contributors

karry12138, thomas-yanxin


gpt_sovits_inference's Issues

Speech synthesis test fails on Windows

Python version: 3.10.14
pip list

Package                Version
---------------------- -----------
addict                 2.4.0
aiohttp                3.9.5
aiosignal              1.3.1
aliyun-python-sdk-core 2.15.1
aliyun-python-sdk-kms  2.16.3
async-timeout          4.0.3
attrs                  23.2.0
audioread              3.0.1
Babel                  2.15.0
Brotli                 1.0.9
certifi                2024.2.2
cffi                   1.16.0
charset-normalizer     2.0.4
click                  8.1.7
cn2an                  0.5.22
colorama               0.4.6
contourpy              1.2.1
crcmod                 1.7
cryptography           42.0.7
cycler                 0.12.1
datasets               2.18.0
dateparser             1.1.8
decorator              5.1.1
dill                   0.3.8
Distance               0.1.3
docopt                 0.6.2
einops                 0.8.0
ffmpeg-python          0.2.0
filelock               3.13.1
fonttools              4.51.0
frozenlist             1.4.1
fsspec                 2024.2.0
future                 1.0.0
g2p-en                 2.1.0
gast                   0.5.4
gmpy2                  2.1.2
gruut                  2.3.4
gruut-ipa              0.13.0
gruut_lang_en          2.0.0
huggingface-hub        0.23.0
idna                   3.7
importlib_metadata     7.1.0
inflect                7.2.1
jieba_fast             0.53
Jinja2                 3.1.3
jmespath               0.10.0
joblib                 1.4.2
jsonlines              1.2.0
kiwisolver             1.4.5
LangSegment            0.3.3
librosa                0.9.2
lightning-utilities    0.11.2
llvmlite               0.42.0
lxml                   5.2.2
MarkupSafe             2.1.3
matplotlib             3.9.0
mkl-fft                1.3.8
mkl-random             1.2.4
mkl-service            2.4.0
modelscope             1.14.0
more-itertools         10.2.0
mpmath                 1.3.0
multidict              6.0.5
multiprocess           0.70.16
networkx               2.8.8
nltk                   3.8.1
num2words              0.5.13
numba                  0.59.1
numpy                  1.23.5
oss2                   2.18.5
packaging              24.0
pandas                 2.2.2
pillow                 10.3.0
pip                    24.0
platformdirs           4.2.2
pooch                  1.8.1
proces                 0.1.7
py3langid              0.2.2
pyarrow                16.1.0
pyarrow-hotfix         0.6
pycparser              2.22
pycryptodome           3.20.0
pyopenjtalk            0.3.3
pyparsing              3.1.2
pypinyin               0.51.0
PySocks                1.7.1
python-crfsuite        0.9.10
python-dateutil        2.9.0.post0
pytils                 0.4.1
pytorch-lightning      2.1.4
pytz                   2024.1
PyYAML                 6.0.1
regex                  2024.5.15
requests               2.31.0
resampy                0.4.3
safetensors            0.4.3
scikit-learn           1.4.2
scipy                  1.13.0
setuptools             69.5.1
simplejson             3.19.2
six                    1.16.0
sortedcontainers       2.4.0
soundfile              0.12.1
sympy                  1.12
threadpoolctl          3.5.0
tokenizers             0.19.1
tomli                  2.0.1
tools                  0.1.9
torch                  2.1.1+cu118
torchaudio             2.1.1
torchmetrics           1.3.0.post0
torchvision            0.16.1
tqdm                   4.66.4
transformers           4.41.0
typeguard              4.2.1
typing_extensions      4.11.0
tzdata                 2024.1
tzlocal                5.2
urllib3                2.2.1
wheel                  0.43.0
win-inet-pton          1.1.0
xxhash                 3.4.1
yapf                   0.40.2
yarl                   1.9.4
zipp                   3.18.2

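Test code

# get_tts_wav() can be called from anywhere, across directories
"""
# === Reference: language codes for the text to synthesize ===
# For the prompt_language and text_language parameters, in config and when calling get_tts_wav:
    "all_zh"    # treat all text as Chinese
    "en"        # treat all text as English (unchanged)
    "all_ja"    # treat all text as Japanese
    "zh"        # treat text as mixed Chinese and English (unchanged)
    "ja"        # treat text as mixed Japanese and English (unchanged)
    "auto"      # multilingual text; split it and detect the language of each segment
"""
"""
def get_tts_wav(
    text: str,      # Text to convert to speech. get_tts_wav() splits it automatically at punctuation.
    text_language: str = "zh", # Language of the synthesized speech
    wav_savepath: str = "temp/output.wav" # Path and file name for the result; a single complete wav is written
    == Other, secondary parameters ==
    how_to_cut: str = "凑四句一切", # How to split the input text; five methods are available.
            # Recommended: "凑四句一切" (every four sentences) and "按标点符号切" (at punctuation marks). "按标点符号切" gives the slowest speech but the most accurate inference
            # "凑四句一切" (every 4 sentences), "凑50字一切" (every 50 characters), "按中文句号。切" (at the Chinese full stop 。), "按英文句号.切" (at the English period .), "按标点符号切" (at punctuation marks)
    top_k: int = 20,
    top_p: float = 0.6,
    temperature: float = 0.6,
            # For the three parameters above, see https://github.com/RVC-Boss/GPT-SoVITS/pull/457
    ref_free: bool = False  # Synthesize without supplying the transcript of the reference audio. Off by default
) -> None
"""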
import os
import sys

project_root = os.path.abspath('.')
sys.path.append(project_root)

# from gpt_sovits_tts.get_tts_wav import GPT_SoVITS_TTS_inference
from get_tts_wav import GPT_SoVITS_TTS_inference

text = "How many roads must a man walk down before we call him a man"

"""
# Voice models (audio_model_id) available on ModelScope as of 2024-02-27
X-D-Lab/TTS-GPT_SoVITS-sunshine_girl
X-D-Lab/TTS-GPT_SoVITS-heartful_sister
"""

inference = GPT_SoVITS_TTS_inference(prompt_language='zh', base_model_id='X-D-Lab/TTS-GPT_SoVITS-pretrained_models',
                                     audio_model_id='X-D-Lab/TTS-GPT_SoVITS-openai_alloy')

inference.get_tts_wav(text=text, wav_save_path="./temp/output1.wav")
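
The option values listed in the reference notes above can be checked before a call is made, which catches typos early. This is a sketch under the assumption that those notes are complete; `check_tts_options` is a hypothetical helper, not part of this project.

```python
# Values copied from the reference notes in the test code above.
VALID_LANGUAGES = {"all_zh", "en", "all_ja", "zh", "ja", "auto"}
VALID_CUT_METHODS = {
    "凑四句一切",
    "凑50字一切",
    "按中文句号。切",
    "按英文句号.切",
    "按标点符号切",
}


def check_tts_options(text_language="zh", how_to_cut="凑四句一切"):
    """Raise ValueError if an option is not one of the documented values."""
    if text_language not in VALID_LANGUAGES:
        raise ValueError(f"unsupported text_language: {text_language!r}")
    if how_to_cut not in VALID_CUT_METHODS:
        raise ValueError(f"unsupported how_to_cut: {how_to_cut!r}")
```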

Run log

D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\python.exe E:\code\python\project-litongjava\GPT_SoVITS_Inference\example.py 
2024-05-20 16:40:31,677 - modelscope - INFO - PyTorch version 2.1.1+cu118 Found.
2024-05-20 16:40:31,679 - modelscope - INFO - Loading ast index from C:\Users\Administrator\.cache\modelscope\ast_indexer
2024-05-20 16:40:31,916 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 18ce4fb4fa78515a5dce58072649d436 and a total number of 976 components indexed
D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Some weights of the model checkpoint at X-D-Lab/TTS-GPT_SoVITS-pretrained_models\chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at X-D-Lab/TTS-GPT_SoVITS-pretrained_models\chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
D:\ProgramData\Anaconda3\envs\xd_lab_GPT_SoVITS_Inference\lib\site-packages\torch\functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ..\aten\src\ATen\native\SpectralOps.cpp:868.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
['Faild to synthesis please try again or contact to administrator。 ']
['en']
['How many roads must a man walk down before we call him a man。 ']
['en']
  6%|▋         | 96/1500 [00:02<00:32, 43.86it/s]
T2S Decoding EOS [115 -> 212]

Process finished with exit code 0
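
The log ends with exit code 0, so on its own it does not show whether audio was produced. One way to check is to inspect the output file. The sketch below assumes the result was written to the `wav_save_path` used in the test code and is uncompressed PCM, which the standard-library `wave` module can read; `describe_wav` is a hypothetical helper, not part of this project.

```python
import os
import wave


def describe_wav(path):
    """Return (exists, duration_seconds); duration is None if unreadable."""
    if not os.path.isfile(path):
        return False, None
    try:
        with wave.open(path, "rb") as wav_file:
            frames = wav_file.getnframes()
            rate = wav_file.getframerate()
    except (wave.Error, EOFError):
        # The stdlib reader handles only uncompressed PCM; empty files and
        # other encodings end up here.
        return True, None
    if rate == 0:
        return True, None
    return True, frames / rate
```

For example, `describe_wav("./temp/output1.wav")` reports whether the file exists and, if readable, how many seconds of audio it holds.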
