fishaudio / fish-diffusion

An easy-to-understand TTS / SVS / SVC framework

Home Page: https://diff.fish.audio

License: MIT License

Topics: diffusion, pytorch, soundgenerator

fish-diffusion's Introduction


Fish Diffusion


An easy-to-understand TTS / SVS / SVC training framework.

Check our Wiki to get started!

Chinese documentation (ไธญๆ–‡ๆ–‡ๆกฃ)

Terms of Use for Fish Diffusion

  1. Obtaining Authorization and Intellectual Property Infringement: The user is solely accountable for acquiring the necessary authorization for any datasets utilized in their training process and assumes full responsibility for any infringement issues arising from the utilization of the input source. Fish Diffusion and its developers disclaim all responsibility for any complications that may emerge due to the utilization of unauthorized datasets.

  2. Proper Attribution: Any derivative works based on Fish Diffusion must explicitly acknowledge the project and its license. In the event of distributing Fish Diffusion's code or disseminating results generated by this project, the user is obliged to cite the original author and source code (Fish Diffusion).

  3. Audiovisual Content and AI-generated Disclosure: All derivative works created using Fish Diffusion, including audio or video materials, must explicitly acknowledge the utilization of the Fish Diffusion project and declare that the content is AI-generated. If incorporating videos or audio published by third parties, the original links must be furnished.

  4. Agreement to Terms: By persisting in the use of Fish Diffusion, the user unequivocally consents to the terms and conditions delineated in this document. Neither Fish Diffusion nor its developers shall be held liable for any subsequent difficulties that may transpire.

Summary

This project uses diffusion models to solve various voice generation tasks. Compared with the original diff-svc repository, it has the following advantages:

  • Multi-speaker support
  • A simpler, easier-to-understand code structure, with all modules decoupled
  • Support for the 44.1 kHz DiffSinger community vocoder
  • Multi-machine, multi-device training and half-precision training, which speed up training and reduce memory usage

Preparing the environment

The following commands need to be executed in a conda environment with Python 3.10.

# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install "pytorch>=2.0.0" "torchvision>=0.15.0" "torchaudio>=2.0.0" pytorch-cuda=11.8 -c pytorch -c nvidia

# Install PDM dependency management tool, skip if installed
# Reference: https://pdm.fming.dev/latest/
curl -sSL https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py | python3 -

# Install the project dependencies
pdm sync
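
Before moving on, a quick sanity check (a minimal sketch, not part of the project's tooling) confirms that PyTorch was installed correctly and can see your GPU:

# check_env.py -- hypothetical helper, not part of the repository
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))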

Vocoder preparation

Fish Diffusion requires the FishAudio NSF-HiFiGAN vocoder to generate audio.

Automatic download

python tools/download_nsf_hifigan.py

If you are using the script to download the model, you can use the --agree-license parameter to agree to the CC BY-NC-SA 4.0 license.

python tools/download_nsf_hifigan.py --agree-license

Manual download

Download and unzip nsf_hifigan-stable-v1.zip from the Fish Diffusion releases.
Copy the nsf_hifigan folder to the checkpoints directory (create it if it does not exist).

If you want to download ContentVec manually, you can download it from here and put it in the checkpoints directory.

Dataset preparation

You only need to put the dataset into the dataset directory with the following file structure:

dataset
โ”œโ”€โ”€โ”€train
โ”‚   โ”œโ”€โ”€โ”€xxx1-xxx1.wav
โ”‚   โ”œโ”€โ”€โ”€...
โ”‚   โ”œโ”€โ”€โ”€Lxx-0xx8.wav
โ”‚   โ””โ”€โ”€โ”€speaker0 (subdirectories are also supported)
โ”‚       โ””โ”€โ”€โ”€xxx1-xxx1.wav
โ””โ”€โ”€โ”€valid
    โ”œโ”€โ”€โ”€xx2-0xxx2.wav
    โ”œโ”€โ”€โ”€...
    โ””โ”€โ”€โ”€xxx7-xxx007.wav
# Extract all data features, such as pitch, text features, mel features, etc.
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean
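
The bundled vocoder works at 44.1 kHz (see the Summary above), so it can help to normalize your audio before extracting features. A hedged sketch using torchaudio; treating 44.1 kHz mono as the target is an assumption based on the vocoder, not an official preprocessing step:

# resample_dataset.py -- hypothetical helper; assumes 44.1 kHz mono is desired
from pathlib import Path

import torchaudio

TARGET_SR = 44100

for wav_path in Path("dataset").rglob("*.wav"):
    waveform, sr = torchaudio.load(str(wav_path))
    if waveform.shape[0] > 1:  # downmix multi-channel audio to mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != TARGET_SR:  # resample to the vocoder's sampling rate
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    torchaudio.save(str(wav_path), waveform, TARGET_SR)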

Baseline training

The project is under active development, please backup your config file
The project is under active development, please backup your config file
The project is under active development, please backup your config file

# Single machine single card / multi-card training
python tools/diffusion/train.py --config configs/svc_hubert_soft.py
# Multi-node training
python tools/diffusion/train.py --config configs/svc_content_vec_multi_node.py
# Environment variables need to be defined on each node; please see https://pytorch-lightning.readthedocs.io/en/1.6.5/clouds/cluster.html for more information.

# Resume training
python tools/diffusion/train.py --config configs/svc_hubert_soft.py --resume [checkpoint file]

# Fine-tune the pre-trained model
# Note: You should adjust the learning rate scheduler in the config file to warmup_cosine_finetune
python tools/diffusion/train.py --config configs/svc_cn_hubert_soft_finetune.py --pretrained [checkpoint file]
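
The configs are plain Python files with mmengine-style inheritance, so the scheduler switch mentioned above is done by overriding keys in your own config. A hedged sketch of what such an override might look like; the key and option names here are assumptions, so check the files under configs/ for the real schema:

# my_finetune_config.py -- hypothetical override (key names are assumptions)
_base_ = [
    "./svc_cn_hubert_soft_finetune.py",
]

# The note above says to use the warmup_cosine_finetune scheduler when
# fine-tuning; whether the key is named "scheduler" is an assumption.
scheduler = dict(
    type="warmup_cosine_finetune",
)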

Inference

# Inference using shell, you can use --help to view more parameters
python tools/diffusion/inference.py --config [config] \
    --checkpoint [checkpoint file] \
    --input [input audio] \
    --output [output audio]


# Gradio Web Inference, other parameters will be used as gradio default parameters
python tools/diffusion/inference.py --config [config] \
    --checkpoint [checkpoint file] \
    --gradio
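
For batch conversion, the documented CLI can be driven from a short loop; a minimal sketch (the inputs/outputs directory layout and the config/checkpoint paths are example values, only the flags shown above come from the docs):

# batch_inference.py -- sketch that loops the documented CLI over a folder
import subprocess
from pathlib import Path

CONFIG = "configs/svc_hubert_soft.py"   # example value
CHECKPOINT = "checkpoints/model.ckpt"   # example value

out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

for wav in sorted(Path("inputs").glob("*.wav")):
    subprocess.run(
        [
            "python", "tools/diffusion/inference.py",
            "--config", CONFIG,
            "--checkpoint", CHECKPOINT,
            "--input", str(wav),
            "--output", str(out_dir / wav.name),
        ],
        check=True,  # stop on the first failed conversion
    )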

Convert a DiffSVC model to Fish Diffusion

python tools/diffusion/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
    --input-path [DiffSVC ckpt] \
    --output-path [Fish Diffusion ckpt]

Contributing

If you have any questions, please submit an issue or pull request.
You should run pdm run lint before submitting a pull request.

Real-time documentation can be generated by:

pdm run docs

Credits

Thanks to all contributors for their efforts.

fish-diffusion's People

Contributors

abersheeran, bfloat16, cnchtu, geraint-dou, huanlinoto, hufy-dev, innnky, kangarroar, kickbub, leng-yue, longredzhong, lordelf, mlo7ghinsan, pre-commit-ci[bot], ricecakey06, scf4, stardust-minus, yurzi, zzc0208


fish-diffusion's Issues

training error in custom datasets

hi

I followed README.md and prepared custom datasets. I don't see in the README where the dataset folder should be placed, so I assume it goes in the project root, right?

Then I prepared WAV files for 3 speakers in the train/valid folders and modified the config file svc_hubert_soft_multi_speakers.py:

from fish_diffusion.datasets.naive import NaiveSVCDataset

_base_ = [
    "./svc_hubert_soft.py",
]

dataset = dict(
    train=dict(
        _delete_=True,  # Delete the default train dataset
        type="NaiveSVCDataset",
        datasets=[
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker0",
                speaker_id=0,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker1",
                speaker_id=1,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker2",
                speaker_id=2,
            ),
        ],
        # Are there any other ways to do this?
        collate_fn=NaiveSVCDataset.collate_fn,
    ),
    valid=dict(
        type="NaiveSVCDataset",
        datasets=[
            dict(
                type="NaiveSVCDataset",
                path="dataset/valid/speaker0",
                speaker_id=0,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/valid/speaker1",
                speaker_id=1,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/valid/speaker2",
                speaker_id=2,
            ),
        ],
    ),
)

model = dict(
    speaker_encoder=dict(
        input_size=3,  # 3 speakers
    ),
)

Then I executed the command:

python tools/diffusion/train.py --config configs/svc_hubert_soft_multi_speakers.py

but I got this error:

wandb: Currently logged in as: donkeyddddd. Use wandb login --relogin to force relogin

wandb: WARNING Path logs\wandb\ wasn't writable, using system temp directory.
wandb: wandb version 0.15.0 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.13.11
wandb: Run data is saved locally in C:\Users\DONKEY~1\AppData\Local\Temp\wandb\run-20230423_220917-i34fcjr1
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run curious-cherry-8
wandb: View project at https://wandb.ai/donkeyddddd/DiffSVC
wandb: View run at https://wandb.ai/donkeyddddd/DiffSVC/runs/i34fcjr1
Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "C:\Users\donkeyddddd\anaconda3\envs\fish_diffusion\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
TypeError: NaiveDataset.__init__() got an unexpected keyword argument 'datasets'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\python_projects\git_projects\fish-diffusion\tools\diffusion\train.py", line 97, in <module>
    train_loader, valid_loader = build_loader_from_config(cfg, trainer.num_devices)
  File "D:\python_projects\git_projects\fish-diffusion\fish_diffusion\datasets\utils.py", line 13, in build_loader_from_config
    train_dataset = DATASETS.build(cfg.dataset.train)
  File "C:\Users\donkeyddddd\anaconda3\envs\fish_diffusion\lib\site-packages\mmengine\registry\registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "C:\Users\donkeyddddd\anaconda3\envs\fish_diffusion\lib\site-packages\mmengine\registry\build_functions.py", line 135, in build_from_cfg
    raise type(e)(
TypeError: class `NaiveSVCDataset` in fish_diffusion/datasets/naive.py: NaiveDataset.__init__() got an unexpected keyword argument 'datasets'
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
wandb: View run curious-cherry-8 at: https://wandb.ai/donkeyddddd/DiffSVC/runs/i34fcjr1
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: C:\Users\DONKEY~1\AppData\Local\Temp\wandb\run-20230423_220917-i34fcjr1\logs

Can you give me some suggestions and ways to solve this problem?
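
A later traceback in this list shows that nested datasets lists are built by ConcatDataset (fish_diffusion/datasets/concat.py), so the outer wrapper likely needs type="ConcatDataset" rather than "NaiveSVCDataset". A hedged sketch of the train section only; the registry name is inferred from that traceback, not verified:

dataset = dict(
    train=dict(
        _delete_=True,
        type="ConcatDataset",  # assumption: the wrapper that accepts `datasets`
        datasets=[
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker0",
                speaker_id=0,
            ),
            # ...remaining speakers as above
        ],
        collate_fn=NaiveSVCDataset.collate_fn,
    ),
)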

ๆ•ฐๆฎ้›†็ป“ๆž„

ๅฆ‚ๆ–‡ๆกฃๆ‰€่ฟฐ๏ผŒๅ•่ฏด่ฏไบบๆ•ฐๆฎ้›†ๆ–‡ไปถ็ป“ๆž„ๅฆ‚ไธ‹๏ผš

dataset
โ”œโ”€โ”€โ”€train
โ”‚   โ”œโ”€โ”€โ”€xxx1-xxx1.wav
โ”‚   โ”œโ”€โ”€โ”€...
โ”‚   โ””โ”€โ”€โ”€Lxx-0xx8.wav
โ””โ”€โ”€โ”€valid
    โ”œโ”€โ”€โ”€xx2-0xxx2.wav
    โ”œโ”€โ”€โ”€...
    โ””โ”€โ”€โ”€xxx7-xxx007.wav

trainๅ’Œvalidๆ–‡ไปถๅคนไธญ้Ÿณ้ข‘ๆ˜ฏๅฆ็›ธๅ…ณ๏ผŸ

ๆ–‡ๆกฃไธญๆๅˆฐ๏ผŒAll the wav files need to be inside the train folder, not in subfolder or otherwise it will fail when preprocessing, unless you are doing a multi-speaker model.๏ผŒๆฒกๆœ‰ๅ…ณไบŽvalidๆ–‡ไปถๅคน็š„ไฟกๆฏใ€‚

Base model training questions

If I want to train a base model myself:

1. Should the dataset ideally have wide pitch coverage, multiple languages, rich tone and intonation, and both male and female voices?

2. Are there any requirements for the training hyperparameters (e.g. batch size, learning rate)?
Should training mix several open-source datasets into the single-speaker pipeline, or separate them by speaker and use the multi-speaker pipeline?

3. Should base-model quality be judged by training time, steps, epochs, or the loss value?

Mainly I want to know the common approach to training base models for projects like diff-svc and so-vits.

how to run docker version

docker pull lengyue233/fish-diffusion pulls and installs fine.

docker run lengyue233/fish-diffusion doesn't do anything. Any further instructions are appreciated.

fish_diffusion/datasets/naive.py: No files found in dataset/train/lengyue

I'm running the Notebook on Colab.
If I set:

pretrained = True
pretrained_profile = 'hifisinger-v2.1.0'
arch = 'hifisinger'

Everything works fine.
However, if I set:

pretrained = True
pretrained_profile = 'diffusion-v2.0.0'
arch = 'diffusion'

Then if I run the training cell, I get the error below.
For some reason, it's looking for dataset/train/lengyue, and I'm not sure what for or why.
My dataset doesn't have any file that has the word "lengyue" in its name.

#raise Exception("Hold...")
Traceback (most recent call last):
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/content/fish-diffusion/fish_diffusion/datasets/naive.py", line 24, in __init__
    assert len(self.paths) > 0, f"No files found in {path}, please check your path."
AssertionError: No files found in dataset/train/lengyue, please check your path.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/content/fish-diffusion/fish_diffusion/datasets/concat.py", line 18, in __init__
    super().__init__([DATASETS.build(dataset) for dataset in datasets])
  File "/content/fish-diffusion/fish_diffusion/datasets/concat.py", line 18, in <listcomp>
    super().__init__([DATASETS.build(dataset) for dataset in datasets])
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
AssertionError: class `NaiveSVCDataset` in fish_diffusion/datasets/naive.py: No files found in dataset/train/lengyue, please check your path.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/fish-diffusion/tools/diffusion/train.py", line 97, in <module>
    train_loader, valid_loader = build_loader_from_config(cfg, trainer.num_devices)
  File "/content/fish-diffusion/fish_diffusion/datasets/utils.py", line 13, in build_loader_from_config
    train_dataset = DATASETS.build(cfg.dataset.train)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
AssertionError: class `ConcatDataset` in fish_diffusion/datasets/concat.py: class `NaiveSVCDataset` in 
fish_diffusion/datasets/naive.py: No files found in dataset/train/lengyue, please check your path.

็‰ˆๆœฌไพ่ต–ๅพˆๅคšไฝŽไบŽPython3.10๏ผŒไธบๅ•ฅไฝœ่€…่ฆๆฑ‚็‰ˆๆœฌๆ˜ฏPython3.10๏ผŸ

ERROR: Ignored the following versions that require a different python version: 0.36.0 Requires-Python >=3.6,<3.10; 0.37.0 Requires-Python >=3.7,<3.10; 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 0.53.0 Requires-Python >=3.6,<3.10; 0.53.0rc1.post1 Requires-Python >=3.6,<3.10; 0.53.0rc2 Requires-Python >=3.6,<3.10; 0.53.0rc3 Requires-Python >=3.6,<3.10; 0.53.1 Requires-Python >=3.6,<3.10; 0.54.0 Requires-Python >=3.7,<3.10; 0.54.0rc2 Requires-Python >=3.7,<3.10; 0.54.0rc3 Requires-Python >=3.7,<3.10; 0.54.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement praat-parselmouth==0.5.0 (from versions: 0.1.0, 0.1.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2.post2, 0.3.3, 0.4.0, 0.4.1, 0.4.2, 0.4.3)
ERROR: No matching distribution found for praat-parselmouth==0.5.0

The above is the output of pip install -r requirements.txt. I tried v1.12, v2.0.0, and the latest main; all behave the same.

Problems during Colab training

/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1609: PossibleUserWarning: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 0: 0% 0/6 [00:00<?, ?it/s] /content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/functional.py:641: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
Epoch 18: 0% 0/6 [00:00<?, ?it/s, v_num=0, train_loss_disc_step=3.180, train_loss_gen_step=105.0, train_loss_disc_epoch=3.640, train_loss_gen_epoch=107.0]/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:48: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")

How can this be resolved?

Poetry install error

Traceback (most recent call last):
  File "/usr/local/envs/fish/bin/poetry", line 7, in <module>
    from poetry.console import main
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/__init__.py", line 1, in <module>
    from .application import Application
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/application.py", line 7, in <module>
    from .commands.about import AboutCommand
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/commands/__init__.py", line 2, in <module>
    from .add import AddCommand
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/commands/add.py", line 8, in <module>
    from .init import InitCommand
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/commands/init.py", line 16, in <module>
    from poetry.core.pyproject import PyProjectException
ImportError: cannot import name 'PyProjectException' from 'poetry.core.pyproject' (/usr/local/envs/fish/lib/python3.10/site-packages/poetry/core/pyproject/__init__.py)
ERROR conda.cli.main_run:execute(47): `conda run poetry install` failed. (See above for error)

Fish Diffusion's support for SVS

May I ask: projects of this kind, including diff-svc, mainly do song timbre conversion, while diff-singer takes lyrics + music as input and generates songs. Does your project not include that part?

English Guide

Hi, this looks great!

Can you please add a comprehensive tutorial in English? An unlisted yt video tutorial would also be great!

Colab?

Does anyone have a colab notebook for this?

Changing precision=16 to 16-mixed in configs/_base_/trainers/base.py?

When I run training, it warns about precision=16 and recommends changing it to 16-mixed.
Changing to 16-mixed also got rid of other warnings:

  1. The warning that some hyperparameters can't be pickled when saving a checkpoint
  2. The warning that the checkpoint was saved before the epoch ended when resuming

Is there a negative side effect of changing precision=16 to 16-mixed in configs/_base_/trainers/base.py?
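
For reference, recent PyTorch Lightning versions take the mixed-precision setting as a string on the Trainer; a minimal standalone sketch (plain Lightning usage, not this project's config schema):

# precision_demo.py -- plain PyTorch Lightning usage (sketch)
import pytorch_lightning as pl

# In Lightning >= 2.0, "16-mixed" is the spelling for fp16 automatic mixed
# precision; the bare integer form precision=16 is what triggers the warning.
trainer = pl.Trainer(precision="16-mixed", max_steps=1000)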

Train whisper on multilingual dataset

What will happen? We will use the MFA model to generate an aligned dataset, which will then be used to train Whisper.

Chinese Dataset

  • Opencpop
  • M4Singer
  • OpenSinger
  • AIShell

English Dataset

  • LibriSpeech (train-clean-100h)
  • LJSpeech

Japanese Dataset

Dictionaries

  • US: arpabet
  • JP: UNCLEAR (need to retrain the MFA model?)
  • CN: OpenCpop Strict (OpenVPI)

ไฟๅญ˜ๆจกๅž‹ๆŠฅ้”™

ๅœจๆ›ดๆ–ฐๅˆฐcommit 3f9bb3eๅŽไฟๅญ˜ๆฃ€ๆŸฅ็‚นๆŠฅ้”™

ไฝฟ็”จ็š„configๆ˜ฏsvc_content_vec_finetune.py๏ผŒๅˆ ้™คๆ•ดไธช้กน็›ฎ้‡ๆ–ฐๅฎ‰่ฃ…ไพ็„ถๆŠฅ้”™
Screenshot (667)

Having issue with basic install

After going through the installation instructions (git pull and pdm sync), any command I try to run gives the following error:

from fish_diffusion.archs.diffsinger.diffsinger import DiffSingerLightning
ModuleNotFoundError: No module named 'fish_diffusion'

Not sure how to get past this.

ImportError: DLL load failed while importing _imaging: The specified module could not be found.

Steps to reproduce:

  1. Install the dependencies following the process on GitHub.
  2. Without passing any args, run inference.py directly with the Python from the dependency environment.

Error message:

Traceback (most recent call last):
  File "d:\AI_drawer\sovits48k\fish-diffusion\inference.py", line 17, in <module>
    from train import FishDiffusion
  File "d:\AI_drawer\sovits48k\fish-diffusion\train.py", line 3, in <module>
    import matplotlib.pyplot as plt
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\matplotlib\__init__.py", line 113, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\matplotlib\rcsetup.py", line 27, in <module>
    from matplotlib.colors import Colormap, is_color_like
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\matplotlib\colors.py", line 51, in <module>
    from PIL import Image
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\PIL\Image.py", line 100, in <module>
    from . import _imaging as core
ImportError: DLL load failed while importing _imaging: The specified module could not be found.

What did I expect to happen?

Since I didn't pass any args, it should have reported missing args, or a file-not-found error for the input audio, model file, and so on.

Resolved by reinstalling Pillow.

Confusion about different model archs

If I trained a model using svc_content_vec.py, can a model trained with the SVC method be used for SVS?

If not, what's the difference between diffsinger and diffsinger-svc?

How to resume training?

I set the --resume-id and it basically started over. Is there something else I need to do? Any help would be appreciated! :)

How do I apply the pretrained model content-vec-pretrained-v1.ckpt in version 2.0?

In the 1.12 Colab,
python tools/diffusion/train.py --tensorboard --config configs/svc_cn_hubert_soft_finetune.py --pretrained {directory of the saved base model}
could use the checkpoint /content/fish-diffusion/checkpoints/content-vec-pretrained-v1.ckpt.

But in 2.0,
python tools/diffusion/train.py --config configs/svc_content_vec.py --pretrained {directory of the saved base model}
doesn't seem to work with /content/fish-diffusion/checkpoints/content-vec-pretrained-v1.ckpt.
Testing with --resume xxx.ckpt doesn't seem to work either.
How can I solve this? Thanks!

One Shot Conversion Possible?

Is there a way to use fish-diffusion to train a model for one-shot any-to-any SVC?
You give one source sample and one target sample, and the model extracts content (pitch, contour, syllables) from the source and uses the voice tone from the target sample to generate a new sample.
Basically, the source singing voice needs to be converted as if it were sung by the target singer while keeping the content unchanged.
It's like the Singing Voice Conversion Challenge 2023, but any-to-any one-shot.
http://www.vc-challenge.org/
If not, anyone knows such model?

Tensorboard's Audio panel: all gt samples are hoarse voices

Configuration file used: svc_content_vec_finetune.py

Pretrained model used: content-vec-pretrained-v1.ckpt

Vocoder: downloaded using download_nsf_hifigan.py

Command: python tools\diffusion\train.py --config configs\svc_content_vec_finetune.py --pretrained checkpoints\content-vec-pretrained-v1.ckpt --tensorboard

Tensor size doesn't match, svc_hubert_soft_diff_svc.py

I'm crashing on svc_hubert_soft_diff_svc.py with the following traceback; isn't that something similar to #86?

| Name | Type | Params

0 | model | DiffSinger | 32.0 M
1 | vocoder | NsfHifiGAN | 14.2 M

32.0 M Trainable params
14.2 M Non-trainable params
46.2 M Total params
184.927 Total estimated model params size (MB)
Sanity Checking DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\User\Documents\Testing\fishdiffusion\tools\diffusion\train.py", line 98, in
trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 935, in _run
results = self._run_stage()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 976, in _run_stage
self._run_sanity_check()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1005, in _run_sanity_check
val_loop.run()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 378, in validation_step
return self.model.validation_step(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 276, in validation_step
return self._step(batch, batch_idx, mode="valid")
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 191, in _step
output = self.model(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 134, in forward
features = self.forward_features(
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 96, in forward_features
features += self.pitch_encoder(pitches)
RuntimeError: The size of tensor a (4) must match the size of tensor b (608) at non-singleton dimension 1

Checkpoint is not saved while training the model in fish-diffusion-2.1.0

/root/anaconda3/envs/fish/lib/python3.10/site-packages/lightning_fabric/plugins/io/torch_io.py:61: UserWarning: Warning, hyper_parameters dropped from checkpoint. An attribute is not picklable: Can't pickle local object 'EvaluationLoop.advance..batch_to_device'
rank_zero_warn(f"Warning, {key} dropped from checkpoint. An attribute is not picklable: {err}")

HiFiSVC training problems

As the title says.
https://wandb.ai/stardust-minus/HiFiSVC
After starting five training runs, none of them converged; along the way I tried lowering the lr, reducing the number of speakers, and so on.
I'm now trying to modify the network parameters. How should I modify the model config to increase the parameter count?

Some questions about augmentation & bug feedback

1. Of the current pitch augmentations, which one works best, and can you give recommended parameters?
2. When I tried inference with an old config today, it blew up (screenshot attached); is there a way to fix this?
The config file is at https://paste.ubuntu.com/p/KWMsjKcGHp/
3. When preprocessing with multiple workers, if the Whisper model is not on the machine, every worker downloads it once, which is a bit wasteful.
Looking forward to your reply.

Error in Colab fine-tuning training

The pretrained mode ran fine, but the next step, fine-tuning training, errors out:

IsADirectoryError: [Errno 21] Is a directory: '/content/fish-diffusion/checkpoints'

How should I solve this? Thanks!

Inference issue in colab version

The Colab version runs fine through training, but when trying to run inference, the following error pops up:

Running on local URL: http://127.0.0.1:7860/
Running on public URL: https://0fd4daade2bee2fa34.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/content/fish-diffusion/tools/hifisinger/inference.py", line 224, in <module>
    model = HiFiSingerSVCInference(config, args.checkpoint)
  File "/content/fish-diffusion/tools/hifisinger/inference.py", line 25, in __init__
    super().__init__(config, checkpoint, model_cls=model_cls)
  File "/content/fish-diffusion/tools/diffusion/inference.py", line 74, in __init__
    self.model = load_checkpoint(
  File "/content/fish-diffusion/fish_diffusion/utils/inference.py", line 19, in load_checkpoint
    state_dict = torch.load(checkpoint, map_location="cpu")
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Traceback (most recent call last):
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/gradio/routes.py", line 413, in run_predict
    with utils.MatplotlibBackendMananger():
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/gradio/utils.py", line 788, in __exit__
    matplotlib.use(self._original_backend)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/matplotlib/__init__.py", line 1233, in use
    plt.switch_backend(name)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/matplotlib/pyplot.py", line 271, in switch_backend
    backend_mod = importlib.import_module(
  File "/content/env/envs/fish_diffusion/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'ipykernel'

Dealing with Scratchy Output and Octave Shift?

I have two problems that I often encounter, and I wonder if anyone has suggestions.

  1. Output audio with scratchy parts
  2. Some pitches get shifted either one octave lower or higher than the source

I'm attaching a sample output only highlighting the problems.
Thanks!
Scratchy.zip

Big error saving checkpoint

138.519   Total estimated model params size (MB)
Epoch 105:  25%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Š                                   | 5/20 [00:00<00:02,  6.81it/s, loss=0.0487, v_num=owzc]C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\plugins\io\torch_io.py:61: UserWarning: Warning, `hyper_parameters` dropped from checkpoint. An attribute is not picklable: Can't pickle local object 'EvaluationLoop.advance.<locals>.batch_to_device'
  rank_zero_warn(f"Warning, `{key}` dropped from checkpoint. An attribute is not picklable: {err}")
Traceback (most recent call last):
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\plugins\io\torch_io.py", line 54, in save_checkpoint
    _atomic_save(checkpoint, path)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\utilities\cloud_io.py", line 67, in _atomic_save
    torch.save(checkpoint, bytesbuffer)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 653, in _save
    pickler.dump(obj)
AttributeError: Can't pickle local object 'EvaluationLoop.advance.<locals>.batch_to_device'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 668, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\micro\Downloads\fish-diffusion\tools\diffusion\train.py", line 98, in <module>
    trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1112, in _run
    results = self._run_stage()
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1191, in _run_stage
    self._run_train()
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 229, in advance
    self.trainer._call_callback_hooks("on_train_batch_end", batch_end_outputs, batch, batch_idx)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1394, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 296, in on_train_batch_end
    self._save_topk_checkpoint(trainer, monitor_candidates)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 363, in _save_topk_checkpoint
    self._save_none_monitor_checkpoint(trainer, monitor_candidates)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 669, in _save_none_monitor_checkpoint
    self._save_checkpoint(trainer, filepath)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 366, in _save_checkpoint
    trainer.save_checkpoint(filepath, self.save_weights_only)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1939, in save_checkpoint
    self._checkpoint_connector.save_checkpoint(filepath, weights_only=weights_only, storage_options=storage_options)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py", line 511, in save_checkpoint
    self.trainer.strategy.save_checkpoint(_checkpoint, filepath, storage_options=storage_options)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 466, in save_checkpoint
    self.checkpoint_io.save_checkpoint(checkpoint, filepath, storage_options=storage_options)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\plugins\io\torch_io.py", line 62, in save_checkpoint
    _atomic_save(checkpoint, path)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\utilities\cloud_io.py", line 67, in _atomic_save
    torch.save(checkpoint, bytesbuffer)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 440, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 305, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:337] . unexpected pos 645876672 vs 645876560

Help with v2.1 training

(screenshot attached)
While v2.0 works just fine, when I switch to v2.1 I encounter the error message shown above.
The same error occurs when training from scratch instead of fine-tuning.
Any idea how to fix this?
Thanks!

Tensor NotImplementedError

Getting this error once I try to start training after a basic install and preparing the data (~400 short wav files).
I'm using a python venv environment instead of conda but installed everything with poetry.

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

| Name | Type | Params

0 | model | DiffSinger | 55.1 M
1 | vocoder | NsfHifiGAN | 14.2 M

55.1 M Trainable params
14.2 M Non-trainable params
69.3 M Total params
277.038 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:430: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\User\Documents\Testing\fishdiffusion\tools\diffusion\train.py", line 98, in
trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 935, in _run
results = self._run_stage()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 976, in _run_stage
self._run_sanity_check()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1005, in _run_sanity_check
val_loop.run()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 378, in validation_step
return self.model.validation_step(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 276, in validation_step
return self._step(batch, batch_idx, mode="valid")
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 215, in _step
image_mels, wav_reconstruction, wav_prediction = viz_synth_sample(
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\utils\viz.py", line 54, in viz_synth_sample
wav_reconstruction = vocoder.spec2wav(mel_target, pitch)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\modules\vocoders\nsf_hifigan\nsf_hifigan.py", line 81, in spec2wav
y = self.model(c, f0).view(-1)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\modules\vocoders\nsf_hifigan\models.py", line 408, in forward
f0 = F.interpolate(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\functional.py", line 3982, in interpolate
raise NotImplementedError(
NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 2D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got linear)
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.

pitches_editor.py output and inference.py tensor size discrepancy

It doesn't always do this but I used pitches_editor.py to extract pitches. I initially thought it was an error with the online GUI but I loaded the .npy from the "pitches_editor" folder to --pitches_path when inferencing and got the same error.

Traceback (most recent call last):
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\tools\diffusion\inference.py", line 438, in <module>
    model.inference(
  File "C:\Users\Kickbub\.conda\envs\Fish\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\tools\diffusion\inference.py", line 256, in inference
    wav = self(
  File "C:\Users\Kickbub\.conda\envs\Fish\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Kickbub\.conda\envs\Fish\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\tools\diffusion\inference.py", line 99, in forward
    features = self.model.model.forward_features(
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 92, in forward_features
    features += self.pitch_encoder(pitches)
RuntimeError: The size of tensor a (1579) must match the size of tensor b (1580) at non-singleton dimension 1

The weird thing is, when I don't use --pitches_path, it works. So it is an issue with pitches_editor.py.

Maybe something to do with line 70?

Update Requirements.txt

Am I missing a step or something? I keep getting "no module named" errors. I did pip install -r on requirements.txt, but I'm still missing a lot of dependencies.
(screenshot attached)

edit:
I kept manually installing the dependencies until I got to this point, and it won't let me progress (I am using release 2.0):
(screenshot attached)

Specify pytorch version for conda

Default pytorch is now 2.0, so we should specify that we want 1.13.1.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

ๆ•ฐๆฎ้ข„ๅค„็†้—ฎ้ข˜(bug?)

  1%|โ–‹                                                                                                  | 45/7110 [00:37<32:55,  3.58it/s]
2023-04-05 20:21:32.660 | ERROR    | __main__:safe_process:198 - Error processing dataset/train/opencpop/2082003011.wav
2023-04-05 20:21:32.660 | ERROR    | __main__:safe_process:199 - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument self in method wrapper_CUDA_Tensor_searchsorted)
Traceback (most recent call last):

  File "/mnt/fish-diffusion/tools/preprocessing/extract_features.py", line 245, in <module>
    i = safe_process(args, config, audio_path)
        โ”‚            โ”‚     โ”‚       โ”” PosixPath('dataset/train/opencpop/2082003011.wav')
        โ”‚            โ”‚     โ”” Config (path: configs/svc_hifisinger_finetune.py): {'sampling_rate': 44100, 'hidden_size': 256, 'vocoder_config': {'sampling_...
        โ”‚            โ”” Namespace(config='configs/svc_hifisinger_finetune.py', path='dataset/train', clean=True, num_workers=1, no_augmentation=False)
        โ”” <function safe_process at 0x7fa72e5d3640>

> File "/mnt/fish-diffusion/tools/preprocessing/extract_features.py", line 166, in safe_process
    process(config, audio_path)
    โ”‚       โ”‚       โ”” PosixPath('dataset/train/opencpop/2082003011.wav')
    โ”‚       โ”” Config (path: configs/svc_hifisinger_finetune.py): {'sampling_rate': 44100, 'hidden_size': 256, 'vocoder_config': {'sampling_...
    โ”” <function process at 0x7fa72e5d3130>

  File "/mnt/fish-diffusion/tools/preprocessing/extract_features.py", line 148, in process
    pitches = pitch_extractor(audio, sr, pad_to=mel_length)
              โ”‚               โ”‚      โ”‚          โ”” 455
              โ”‚               โ”‚      โ”” 44100
              โ”‚               โ”” tensor([[ 0.0017,  0.0017,  0.0018,  ..., -0.0013, -0.0010, -0.0006]],
              โ”‚                        device='cuda:0')
              โ”” <fish_diffusion.modules.pitch_extractors.parsel_mouth.ParselMouthPitchExtractor object at 0x7fa71f22bbb0>

  File "/mnt/fish-diffusion/fish_diffusion/modules/pitch_extractors/parsel_mouth.py", line 42, in __call__
    return self.post_process(x, sampling_rate, f0, pad_to)
           โ”‚    โ”‚            โ”‚  โ”‚              โ”‚   โ”” 455
           โ”‚    โ”‚            โ”‚  โ”‚              โ”” array([   0.        ,    0.        ,    0.        ,    0.        ,
           โ”‚    โ”‚            โ”‚  โ”‚                          0.        ,    0.        ,  414.08354111,  416.3...
           โ”‚    โ”‚            โ”‚  โ”” 44100
           โ”‚    โ”‚            โ”” tensor([[ 0.0017,  0.0017,  0.0018,  ..., -0.0013, -0.0010, -0.0006]],
           โ”‚    โ”‚                     device='cuda:0')
           โ”‚    โ”” <function BasePitchExtractor.post_process at 0x7fa72e5d2b90>
           โ”” <fish_diffusion.modules.pitch_extractors.parsel_mouth.ParselMouthPitchExtractor object at 0x7fa71f22bbb0>

  File "/mnt/fish-diffusion/fish_diffusion/modules/pitch_extractors/builder.py", line 59, in post_process
    return interpolate(time_frame, time_org, f0, left=f0[0], right=f0[-1])
           โ”‚           โ”‚           โ”‚         โ”‚        โ”‚            โ”” tensor([ 414.0835,  416.3744,  420.1241,  424.6104,  428.1972,  429.9621,
           โ”‚           โ”‚           โ”‚         โ”‚        โ”‚                       429.6776,  427.2525,  420.7615,  410.0335,...
           โ”‚           โ”‚           โ”‚         โ”‚        โ”” tensor([ 414.0835,  416.3744,  420.1241,  424.6104,  428.1972,  429.9621,
           โ”‚           โ”‚           โ”‚         โ”‚                   429.6776,  427.2525,  420.7615,  410.0335,...
           โ”‚           โ”‚           โ”‚         โ”” tensor([ 414.0835,  416.3744,  420.1241,  424.6104,  428.1972,  429.9621,
           โ”‚           โ”‚           โ”‚                    429.6776,  427.2525,  420.7615,  410.0335,...
           โ”‚           โ”‚           โ”” tensor([0.0697, 0.0813, 0.0929, 0.1045, 0.1161, 0.1277, 0.1393, 0.1509, 0.1625,
           โ”‚           โ”‚                     0.1741, 0.1858, 0.1974, 0.2090, 0.220...
           โ”‚           โ”” tensor([0.0000, 0.0116, 0.0232, 0.0348, 0.0464, 0.0580, 0.0697, 0.0813, 0.0929,
           โ”‚                     0.1045, 0.1161, 0.1277, 0.1393, 0.150...
           โ”” <function interpolate at 0x7fa731706710>

  File "/mnt/fish-diffusion/fish_diffusion/utils/tensor.py", line 67, in interpolate
    i = torch.clip(torch.searchsorted(xp, x, right=True), 1, len(xp) - 1)
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚   โ”‚                      โ”” tensor([0.0697, 0.0813, 0.0929, 0.1045, 0.1161, 0.1277, 0.1393, 0.1509, 0.1625,
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚   โ”‚                                0.1741, 0.1858, 0.1974, 0.2090, 0.220...
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚   โ”” tensor([0.0000, 0.0116, 0.0232, 0.0348, 0.0464, 0.0580, 0.0697, 0.0813, 0.0929,
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚             0.1045, 0.1161, 0.1277, 0.1393, 0.150...
        โ”‚     โ”‚    โ”‚     โ”‚            โ”” tensor([0.0697, 0.0813, 0.0929, 0.1045, 0.1161, 0.1277, 0.1393, 0.1509, 0.1625,
        โ”‚     โ”‚    โ”‚     โ”‚                      0.1741, 0.1858, 0.1974, 0.2090, 0.220...
        โ”‚     โ”‚    โ”‚     โ”” <built-in method searchsorted of type object at 0x7fa79fea4880>
        โ”‚     โ”‚    โ”” <module 'torch' from '/opt/miniconda/envs/fish/lib/python3.10/site-packages/torch/__init__.py'>
        โ”‚     โ”” <built-in method clip of type object at 0x7fa79fea4880>
        โ”” <module 'torch' from '/opt/miniconda/envs/fish/lib/python3.10/site-packages/torch/__init__.py'>

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument self in method wrapper_CUDA_Tensor_searchsorted)

This error was thrown on a machine with CUDA 12.1:

Wed Apr  5 20:25:55 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB            Off| 00000000:00:08.0 Off |                    0 |
| N/A   34C    P0               53W / 300W|      0MiB / 32768MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

ไธ็Ÿฅ้“ๆ˜ฏไป€ไนˆ้—ฎ้ข˜ๆ

Multi-node cluster training problem

As the title says.
I'm trying to train on a cluster with 4 nodes, each with 8 GPUs.
The trainer configuration has been modified, training starts normally, and 32 ranks are detected.
(screenshot attached)
However, I observed that although the number of batches per epoch on each node decreased, the step count is not synchronized.
(screenshot attached)
In this case, reaching the planned number of steps requires more epochs. Will this affect convergence?
Should I count in steps or in epochs?

Inference error

Following the "Quick FishSVC Guide", I trained for 10000+ steps; when I tested inference, it errored.

It feels like a size mismatch between the trained model and the inference model...

Error when trying to train with `cn-hubert-soft-600-singers-pretrained-v1.ckpt`

Traceback (most recent call last):
  File "/content/drive/MyDrive/porter-diffusion/train.py", line 217, in <module>
    trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1047, in _run
    self._restore_modules_and_callbacks(ckpt_path)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 991, in _restore_modules_and_callbacks
    self._checkpoint_connector.restore_model()
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 261, in restore_model
    self.trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 363, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"])
  File "/usr/local/envs/fish/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FishDiffusion:
	Missing key(s) in state_dict: "model.speaker_encoder.embedding.weight". 
