
qsingle / so-vits-svc-5.0


This project forked from playvoice/whisper-vits-svc


Core Engine of Singing Voice Conversion & Singing Voice Clone

Home Page: https://huggingface.co/spaces/maxmax20160403/sovits5.0

License: MIT License

Python 100.00%

so-vits-svc-5.0's Introduction

Singing Voice Conversion


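An SVC library that converts singing voices directly, with no need to remove the accompaniment first (light accompaniment only).

Use Excel for basic manual SVC tuning.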


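This project is still being updated, and the code has a known quality defect: frequencies above 13 kHz come out blurry. Training with this code is not recommended yet; the released test model is for testing only.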

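Feature                From        Function
whisper                OpenAI      Strong noise robustness
bigvgan                NVIDIA      Anti-aliasing and snake activation
natural speech         Microsoft   Fewer pronunciation errors
neural source-filter   NII         Fixes the cut-off audio problem
speaker encoder        Google      Timbre encoding and clustering
GRL for speaker        Skoltech    Prevents the encoder from leaking timbre
one shot vits          Samsung     One-shot (single-utterance) VITS voice cloning
band extension         Adobe       Upsampling from 16 kHz to 48 kHz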

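Model Overview

A singing voice timbre conversion model. A SoftVC content encoder extracts speech features from the source audio; these features, together with F0, are fed into VITS in place of the original text input to achieve singing voice conversion. The vocoder is also replaced with NSF HiFiGAN to fix the cut-off audio problem.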

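According to incomplete statistics, training with multiple speakers seems to make timbre leakage worse. Training a model with more than 10 speakers is not recommended; to get closer to the target timbre, the current advice is to train a single-speaker model where possible.
To address the high inference VRAM usage of the sovits3.0 48 kHz model, you can switch to the 32khz branch and train a 32 kHz model.
A significant known issue: 3.0 inference uses a very large amount of VRAM. With 6 GB of VRAM, only about 30 s of audio can be inferred at a time.
The cut-off audio problem has been fixed, and audio quality has improved considerably.
Version 2.0 has been moved to the sovits_2.0 branch.
Version 3.0 uses the FreeVC code structure and is not compatible with older versions.
Compared with DiffSVC: when the training data is of very high quality, DiffSVC performs better; with lower-quality datasets, this repository may perform better. Inference with this repository is also much faster than with DiffSVC.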

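Dataset Preparation

Simply place the dataset in the dataset_raw directory using the file structure below. Note that the resampling commands in the preprocessing section read from ./data_raw, so make sure the directory name matches the -w argument you pass.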

dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
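
Before preprocessing, it can help to confirm that the directory really follows this layout. The following is a minimal sketch and not part of this repository: `count_wavs_per_speaker` is a hypothetical helper that treats each subdirectory as one speaker and counts its .wav files.

```python
from pathlib import Path


def count_wavs_per_speaker(root):
    """Return {speaker_dir_name: number_of_wav_files} for a dataset root.

    Hypothetical helper: every direct subdirectory of `root` is assumed
    to be one speaker, as in the tree shown above.
    """
    root = Path(root)
    counts = {}
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        wavs = [
            f for f in speaker_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ".wav"
        ]
        counts[speaker_dir.name] = len(wavs)
    return counts


if __name__ == "__main__":
    import sys

    # Default directory name taken from the text above; pass another path to override.
    dataset_root = Path(sys.argv[1] if len(sys.argv) > 1 else "dataset_raw")
    if dataset_root.is_dir():
        for speaker, n in count_wavs_per_speaker(dataset_root).items():
            print(f"{speaker}: {n} wav file(s)")
    else:
        print(f"{dataset_root} is not a directory")
```

A speaker that reports 0 files usually means the audio sits one level too deep or uses a different extension.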

Install Dependencies

Data Preprocessing

  • 1, Set the working directory. This is required; later steps will fail without it.

    export PYTHONPATH=$PWD

  • 2, Resample

    Cut the audio into segments shorter than 30 seconds, as whisper requires.

    Generate 16000 Hz audio, stored in ./data_svc/waves-16k

    python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-16k -s 16000

    Generate 48000 Hz audio, stored in ./data_svc/waves-48k

    python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-48k -s 48000

    Optional: upsample from 16000 Hz to 48000 Hz (batch processing is still to be completed)

    python bandex/inference.py -w svc_out.wav

  • 3, Extract pitch from the 16 kHz audio

    python prepare/preprocess_f0.py -w data_svc/waves-16k/ -p data_svc/pitch

  • 4, Extract content encodings from the 16 kHz audio

    python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper

  • 5, Extract timbre (speaker) encodings from the 16 kHz audio

    python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker

  • 6, Extract linear spectrograms from the 48 kHz audio

    python prepare/preprocess_spec.py -w data_svc/waves-48k/ -s data_svc/specs

  • 7, Generate the training index from the 48 kHz audio

    python prepare/preprocess_train.py

  • 8, Debug the training files

    python prepare/preprocess_zzz.py
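
Step 2 asks for clips shorter than 30 seconds. As a rough sketch of how a long recording could be divided, the hypothetical helper below (not part of this repository) computes sample ranges that are each strictly under the limit. It cuts at evenly spaced points and ignores silence, so in practice you would likely want to move the cut points to quiet passages.

```python
def split_boundaries(n_samples, sample_rate, max_seconds=30.0):
    """Return (start, end) sample ranges, each strictly shorter than max_seconds.

    Hypothetical helper. The clip is divided into the smallest number of
    near-equal parts that fit under the limit, which avoids leaving a
    very short final segment.
    """
    if n_samples <= 0:
        return []
    # Largest allowed segment length in samples (strictly below the limit).
    max_len = int(max_seconds * sample_rate) - 1
    if max_len < 1:
        raise ValueError("max_seconds is too small for this sample rate")

    n_chunks = -(-n_samples // max_len)        # ceiling division
    base, extra = divmod(n_samples, n_chunks)  # first `extra` chunks get one more sample

    bounds = []
    start = 0
    for i in range(n_chunks):
        length = base + (1 if i < extra else 0)
        bounds.append((start, start + length))
        start += length
    return bounds


# A 70-second clip at 16 kHz is divided into three parts of about 23.3 s each.
print(split_boundaries(16000 * 70, 16000))
# [(0, 373334), (373334, 746667), (746667, 1120000)]
```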

Training

  • 1, Set the working directory. This is required; later steps will fail without it.

    export PYTHONPATH=$PWD

  • 2, Start training (stage one)

    python svc_trainer.py -c configs/base.yaml -n sovits5.0

  • 3, Resume training

    python svc_trainer.py -c configs/base.yaml -n sovits5.0 -p chkpt/sovits5.0/***.pth

  • 4, View the logs

    tensorboard --logdir logs/

  • 5, Start training (stage two)

    To be completed. Planned stage-two content: noise added to the PPG, GRL for timbre removal, and the natural speech inference loss.
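
The resume command in step 3 needs the path of an existing checkpoint. As a small convenience sketch (not part of this repository), the hypothetical helpers below pick the most recently modified `*.pth` file in the run directory and append it with `-p`, falling back to a fresh start when none exists. The `chkpt/<name>` location and the `.pth` extension are taken from the commands above; the "newest file wins" rule is an assumption.

```python
from pathlib import Path


def latest_checkpoint(chkpt_dir, pattern="*.pth"):
    """Return the most recently modified checkpoint in chkpt_dir, or None."""
    chkpt_dir = Path(chkpt_dir)
    if not chkpt_dir.is_dir():
        return None
    files = [p for p in chkpt_dir.glob(pattern) if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


def build_train_command(config="configs/base.yaml", name="sovits5.0",
                        chkpt_root="chkpt"):
    """Build the svc_trainer.py argument list, resuming if a checkpoint exists."""
    cmd = ["python", "svc_trainer.py", "-c", config, "-n", name]
    ckpt = latest_checkpoint(Path(chkpt_root) / name)
    if ckpt is not None:
        cmd += ["-p", str(ckpt)]  # resume from the newest checkpoint
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(build_train_command()))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once PYTHONPATH is set as in step 1.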


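Inference

You can download the sovits5.0_48k_debug.pth model from the release page to test inference.

The model includes 56 speakers, stored in the configs/singers directory, which can be used to test timbre leakage.

Samples of 4 highly recognizable speakers are in the configs/singers_sample directory.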

  • 1, Set the working directory. This is required; later steps will fail without it.

    export PYTHONPATH=$PWD

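  • 2, Export the inference model: text encoder, Flow network, and Decoder network. The discriminator and posterior encoder are used only during training.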

    python svc_export.py --config configs/base.yaml --checkpoint_path chkpt/sovits5.0/***.pt

  • 3, Extract content encodings with whisper. This is kept as a separate step, rather than one-click inference, to reduce VRAM usage.

    python whisper/inference.py -w test.wav -p test.ppg.npy

    This generates test.ppg.npy. If no ppg file is specified in the inference step, the program generates it automatically.

  • 4, Extract the F0 parameters as CSV text. Open the CSV file in Excel and manually correct wrong F0 values, using Audition as a reference.

    python pitch/inference.py -w test.wav -p test.csv


  • 5, Specify the parameters and run inference

    python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./configs/singers/singer0001.npy --wave test.wav --ppg test.ppg.npy --pit test.csv

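    When --ppg is specified, the audio content encoding is not re-extracted each time the same audio is inferred; if it is not specified, the encoding is extracted automatically.

    When --pit is specified, the manually tuned F0 parameters are loaded; if it is not specified, F0 is extracted automatically.

    The output file is svc_out.wav in the current directory.

    args       name
    --config   configuration file
    --model    model file
    --spk      timbre (speaker) file
    --wave     audio file
    --ppg      audio content encoding
    --pit      pitch content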
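
To compare several speakers on the same input (for example when checking timbre leakage), the same precomputed --ppg and --pit files can be reused for every run. The sketch below is a hypothetical wrapper and not part of this repository; it only assembles argument lists from the flags documented above and leaves out --ppg and --pit when they are not given, in which case the script extracts them itself.

```python
def build_inference_command(model, spk, wave, config="configs/base.yaml",
                            ppg=None, pit=None):
    """Build the svc_inference.py argument list from the documented flags."""
    cmd = [
        "python", "svc_inference.py",
        "--config", config,
        "--model", model,
        "--spk", spk,
        "--wave", wave,
    ]
    if ppg is not None:
        cmd += ["--ppg", ppg]  # reuse a precomputed content encoding
    if pit is not None:
        cmd += ["--pit", pit]  # load manually corrected F0 values
    return cmd


def commands_for_speakers(model, speakers, wave, ppg=None, pit=None):
    """One command per speaker file, all sharing the same input, ppg and pit."""
    return [
        build_inference_command(model, spk, wave, ppg=ppg, pit=pit)
        for spk in speakers
    ]


if __name__ == "__main__":
    cmd = build_inference_command(
        "sovits5.0.pth",
        "./configs/singers/singer0001.npy",
        "test.wav",
        ppg="test.ppg.npy",
        pit="test.csv",
    )
    print(" ".join(cmd))
```

Because every run writes svc_out.wav in the current directory, rename or move the output after each command when looping over speakers; otherwise the next run overwrites it.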

Datasets

Name URL
KiSing http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/
PopCS https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md
opencpop https://wenet.org.cn/opencpop/download/
Multi-Singer https://github.com/Multi-Singer/Multi-Singer.github.io
M4Singer https://github.com/M4Singer/M4Singer/blob/master/apply_form.md
CSD https://zenodo.org/record/4785016#.YxqrTbaOMU4
KSS https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset
JVS MuSic https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_music
PJS https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus
JSUT Song https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song
MUSDB18 https://sigsep.github.io/datasets/musdb.html#musdb18-compressed-stems
DSD100 https://sigsep.github.io/datasets/dsd100.html
Aishell-3 http://www.aishelltech.com/aishell_3
VCTK https://datashare.ed.ac.uk/handle/10283/2651

Code Sources and References

https://github.com/facebookresearch/speech-resynthesis paper

https://github.com/jaywalnut310/vits paper

https://github.com/openai/whisper/ paper

https://github.com/NVIDIA/BigVGAN paper

https://github.com/mindslab-ai/univnet paper

https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf

SNAC : Speaker-normalized Affine Coupling Layer in Flow-based Architecture for Zero-Shot Multi-Speaker Text-to-Speech

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

AdaSpeech: Adaptive Text to Speech for Custom Voice

Contributors

maxmax2016, innnky, archivoice, stardust-minus
