fishaudio / fish-diffusion

An easy-to-understand TTS / SVS / SVC framework

Home Page: https://diff.fish.audio

License: MIT License

Topics: diffusion, pytorch, soundgenerator

fish-diffusion's Introduction


Fish Diffusion


An easy-to-understand TTS / SVS / SVC training framework.

Check our Wiki to get started!

Chinese documentation (ไธญๆ–‡ๆ–‡ๆกฃ)

Terms of Use for Fish Diffusion

  1. Obtaining Authorization and Intellectual Property Infringement: The user is solely accountable for acquiring the necessary authorization for any datasets utilized in their training process and assumes full responsibility for any infringement issues arising from the utilization of the input source. Fish Diffusion and its developers disclaim all responsibility for any complications that may emerge due to the utilization of unauthorized datasets.

  2. Proper Attribution: Any derivative works based on Fish Diffusion must explicitly acknowledge the project and its license. In the event of distributing Fish Diffusion's code or disseminating results generated by this project, the user is obliged to cite the original author and source code (Fish Diffusion).

  3. Audiovisual Content and AI-generated Disclosure: All derivative works created using Fish Diffusion, including audio or video materials, must explicitly acknowledge the utilization of the Fish Diffusion project and declare that the content is AI-generated. If incorporating videos or audio published by third parties, the original links must be furnished.

  4. Agreement to Terms: By persisting in the use of Fish Diffusion, the user unequivocally consents to the terms and conditions delineated in this document. Neither Fish Diffusion nor its developers shall be held liable for any subsequent difficulties that may transpire.

Summary

This project uses diffusion models to solve various voice generation tasks. Compared with the original diff-svc repository, it has the following advantages:

  • Multi-speaker support
  • A simpler, easier-to-understand code structure, with all modules decoupled
  • Support for the 44.1 kHz DiffSinger community vocoder
  • Multi-machine, multi-device training and half-precision training, which speed up training and reduce memory usage

Preparing the environment

The following commands need to be executed in a conda environment with Python 3.10.

# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install "pytorch>=2.0.0" "torchvision>=0.15.0" "torchaudio>=2.0.0" pytorch-cuda=11.8 -c pytorch -c nvidia

# Install PDM dependency management tool, skip if installed
# Reference: https://pdm.fming.dev/latest/
curl -sSL https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py | python3 -

# Install the project dependencies
pdm sync
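
Before moving on, a quick sanity check (a minimal sketch, not part of the project's tooling) confirms that PyTorch was installed correctly and can see your GPU:

# check_env.py -- hypothetical helper, not part of the repository
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))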

Vocoder preparation

Fish Diffusion requires the FishAudio NSF-HiFiGAN vocoder to generate audio.

Automatic download

python tools/download_nsf_hifigan.py

If you are using the script to download the model, you can use the --agree-license parameter to agree to the CC BY-NC-SA 4.0 license.

python tools/download_nsf_hifigan.py --agree-license

Manual download

Download and unzip nsf_hifigan-stable-v1.zip from the Fish Diffusion releases.
Copy the nsf_hifigan folder to the checkpoints directory (create it if it does not exist).

If you want to download ContentVec manually, you can download it from here and put it in the checkpoints directory.

Dataset preparation

You only need to put the dataset into the dataset directory with the following file structure:

dataset
โ”œโ”€โ”€โ”€train
โ”‚   โ”œโ”€โ”€โ”€xxx1-xxx1.wav
โ”‚   โ”œโ”€โ”€โ”€...
โ”‚   โ”œโ”€โ”€โ”€Lxx-0xx8.wav
โ”‚   โ””โ”€โ”€โ”€speaker0 (subdirectories are also supported)
โ”‚       โ””โ”€โ”€โ”€xxx1-xxx1.wav
โ””โ”€โ”€โ”€valid
    โ”œโ”€โ”€โ”€xx2-0xxx2.wav
    โ”œโ”€โ”€โ”€...
    โ””โ”€โ”€โ”€xxx7-xxx007.wav
# Extract all data features, such as pitch, text features, mel features, etc.
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean
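
The bundled vocoder works at 44.1 kHz (see the Summary above), so it can help to normalize your audio before extracting features. A hedged sketch using torchaudio; treating 44.1 kHz mono as the target is an assumption based on the vocoder, not an official preprocessing step:

# resample_dataset.py -- hypothetical helper; assumes 44.1 kHz mono is desired
from pathlib import Path

import torchaudio

TARGET_SR = 44100

for wav_path in Path("dataset").rglob("*.wav"):
    waveform, sr = torchaudio.load(str(wav_path))
    if waveform.shape[0] > 1:  # downmix multi-channel audio to mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != TARGET_SR:  # resample to the vocoder's sampling rate
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    torchaudio.save(str(wav_path), waveform, TARGET_SR)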

Baseline training

The project is under active development, please backup your config file
The project is under active development, please backup your config file
The project is under active development, please backup your config file

# Single machine single card / multi-card training
python tools/diffusion/train.py --config configs/svc_hubert_soft.py
# Multi-node training
python tools/diffusion/train.py --config configs/svc_content_vec_multi_node.py
# Environment variables need to be defined on each node; please see https://pytorch-lightning.readthedocs.io/en/1.6.5/clouds/cluster.html for more information.

# Resume training
python tools/diffusion/train.py --config configs/svc_hubert_soft.py --resume [checkpoint file]

# Fine-tune the pre-trained model
# Note: You should adjust the learning rate scheduler in the config file to warmup_cosine_finetune
python tools/diffusion/train.py --config configs/svc_cn_hubert_soft_finetune.py --pretrained [checkpoint file]
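
The configs are plain Python files with mmengine-style inheritance, so the scheduler switch mentioned above is done by overriding keys in your own config. A hedged sketch of what such an override might look like; the key and option names here are assumptions, so check the files under configs/ for the real schema:

# my_finetune_config.py -- hypothetical override (key names are assumptions)
_base_ = [
    "./svc_cn_hubert_soft_finetune.py",
]

# The note above says to use the warmup_cosine_finetune scheduler when
# fine-tuning; whether the key is named "scheduler" is an assumption.
scheduler = dict(
    type="warmup_cosine_finetune",
)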

Inference

# Inference using shell, you can use --help to view more parameters
python tools/diffusion/inference.py --config [config] \
    --checkpoint [checkpoint file] \
    --input [input audio] \
    --output [output audio]


# Gradio Web Inference, other parameters will be used as gradio default parameters
python tools/diffusion/inference.py --config [config] \
    --checkpoint [checkpoint file] \
    --gradio
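
For batch conversion, the documented CLI can be driven from a short loop; a minimal sketch (the inputs/outputs directory layout and the config/checkpoint paths are example values, only the flags shown above come from the docs):

# batch_inference.py -- sketch that loops the documented CLI over a folder
import subprocess
from pathlib import Path

CONFIG = "configs/svc_hubert_soft.py"   # example value
CHECKPOINT = "checkpoints/model.ckpt"   # example value

out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

for wav in sorted(Path("inputs").glob("*.wav")):
    subprocess.run(
        [
            "python", "tools/diffusion/inference.py",
            "--config", CONFIG,
            "--checkpoint", CHECKPOINT,
            "--input", str(wav),
            "--output", str(out_dir / wav.name),
        ],
        check=True,  # stop on the first failed conversion
    )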

Convert a DiffSVC model to Fish Diffusion

python tools/diffusion/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
    --input-path [DiffSVC ckpt] \
    --output-path [Fish Diffusion ckpt]

Contributing

If you have any questions, please submit an issue or pull request.
You should run pdm run lint before submitting a pull request.

Real-time documentation can be generated by:

pdm run docs

Credits

Thanks to all contributors for their efforts.

fish-diffusion's People

Contributors

abersheeran, bfloat16, cnchtu, geraint-dou, huanlinoto, hufy-dev, innnky, kangarroar, kickbub, leng-yue, longredzhong, lordelf, mlo7ghinsan, pre-commit-ci[bot], ricecakey06, scf4, stardust-minus, yurzi, zzc0208


fish-diffusion's Issues

training error in custom datasets

hi

I followed README.md and prepared custom datasets. I don't see in the README where the dataset folder should be placed, so I assume it goes in the project root, right?

Then I prepared WAV files for 3 speakers in the train/valid folders and modified the config file svc_hubert_soft_multi_speakers.py:

from fish_diffusion.datasets.naive import NaiveSVCDataset

_base_ = [
    "./svc_hubert_soft.py",
]

dataset = dict(
    train=dict(
        _delete_=True,  # Delete the default train dataset
        type="NaiveSVCDataset",
        datasets=[
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker0",
                speaker_id=0,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker1",
                speaker_id=1,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker2",
                speaker_id=2,
            ),
        ],
        # Are there any other ways to do this?
        collate_fn=NaiveSVCDataset.collate_fn,
    ),
    valid=dict(
        type="NaiveSVCDataset",
        datasets=[
            dict(
                type="NaiveSVCDataset",
                path="dataset/valid/speaker0",
                speaker_id=0,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/valid/speaker1",
                speaker_id=1,
            ),
            dict(
                type="NaiveSVCDataset",
                path="dataset/valid/speaker2",
                speaker_id=2,
            ),
        ],
    ),
)

model = dict(
    speaker_encoder=dict(
        input_size=3,  # 3 speakers
    ),
)

Then I executed the command:

python tools/diffusion/train.py --config configs/svc_hubert_soft_multi_speakers.py

but I got this error:

wandb: Currently logged in as: donkeyddddd. Use wandb login --relogin to force relogin

wandb: WARNING Path logs\wandb\ wasn't writable, using system temp directory.
wandb: wandb version 0.15.0 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.13.11
wandb: Run data is saved locally in C:\Users\DONKEY~1\AppData\Local\Temp\wandb\run-20230423_220917-i34fcjr1
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run curious-cherry-8
wandb: View project at https://wandb.ai/donkeyddddd/DiffSVC
wandb: View run at https://wandb.ai/donkeyddddd/DiffSVC/runs/i34fcjr1
Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "C:\Users\donkeyddddd\anaconda3\envs\fish_diffusion\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
TypeError: NaiveDataset.__init__() got an unexpected keyword argument 'datasets'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\python_projects\git_projects\fish-diffusion\tools\diffusion\train.py", line 97, in <module>
    train_loader, valid_loader = build_loader_from_config(cfg, trainer.num_devices)
  File "D:\python_projects\git_projects\fish-diffusion\fish_diffusion\datasets\utils.py", line 13, in build_loader_from_config
    train_dataset = DATASETS.build(cfg.dataset.train)
  File "C:\Users\donkeyddddd\anaconda3\envs\fish_diffusion\lib\site-packages\mmengine\registry\registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "C:\Users\donkeyddddd\anaconda3\envs\fish_diffusion\lib\site-packages\mmengine\registry\build_functions.py", line 135, in build_from_cfg
    raise type(e)(
TypeError: class `NaiveSVCDataset` in fish_diffusion/datasets/naive.py: NaiveDataset.__init__() got an unexpected keyword argument 'datasets'
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
wandb: View run curious-cherry-8 at: https://wandb.ai/donkeyddddd/DiffSVC/runs/i34fcjr1
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: C:\Users\DONKEY~1\AppData\Local\Temp\wandb\run-20230423_220917-i34fcjr1\logs

Can you give me some suggestions and ways to solve this problem?
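
A later traceback in this list shows that nested datasets lists are built by ConcatDataset (fish_diffusion/datasets/concat.py), so the outer wrapper likely needs type="ConcatDataset" rather than "NaiveSVCDataset". A hedged sketch of the train section only; the registry name is inferred from that traceback, not verified:

dataset = dict(
    train=dict(
        _delete_=True,
        type="ConcatDataset",  # assumption: the wrapper that accepts `datasets`
        datasets=[
            dict(
                type="NaiveSVCDataset",
                path="dataset/train/speaker0",
                speaker_id=0,
            ),
            # ...remaining speakers as above
        ],
        collate_fn=NaiveSVCDataset.collate_fn,
    ),
)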

ๆ•ฐๆฎ้›†็ป“ๆž„

ๅฆ‚ๆ–‡ๆกฃๆ‰€่ฟฐ๏ผŒๅ•่ฏด่ฏไบบๆ•ฐๆฎ้›†ๆ–‡ไปถ็ป“ๆž„ๅฆ‚ไธ‹๏ผš

dataset
โ”œโ”€โ”€โ”€train
โ”‚   โ”œโ”€โ”€โ”€xxx1-xxx1.wav
โ”‚   โ”œโ”€โ”€โ”€...
โ”‚   โ””โ”€โ”€โ”€Lxx-0xx8.wav
โ””โ”€โ”€โ”€valid
    โ”œโ”€โ”€โ”€xx2-0xxx2.wav
    โ”œโ”€โ”€โ”€...
    โ””โ”€โ”€โ”€xxx7-xxx007.wav

trainๅ’Œvalidๆ–‡ไปถๅคนไธญ้Ÿณ้ข‘ๆ˜ฏๅฆ็›ธๅ…ณ๏ผŸ

ๆ–‡ๆกฃไธญๆๅˆฐ๏ผŒAll the wav files need to be inside the train folder, not in subfolder or otherwise it will fail when preprocessing, unless you are doing a multi-speaker model.๏ผŒๆฒกๆœ‰ๅ…ณไบŽvalidๆ–‡ไปถๅคน็š„ไฟกๆฏใ€‚

Base model training questions

If I want to train a base model myself:

1. Should the dataset ideally have wide pitch coverage, multiple languages, rich tone and intonation, and both male and female voices?

2. Are there any requirements for the training hyperparameters (e.g. batch size, learning rate)?
Should training mix several open-source datasets into the single-speaker pipeline, or separate them by speaker and use the multi-speaker pipeline?

3. Should base-model quality be judged by training time, steps, epochs, or the loss value?

Mainly I want to know the common approach to training base models for projects like diff-svc and so-vits.

how to run docker version

docker pull lengyue233/fish-diffusion pulls and installs fine.

docker run lengyue233/fish-diffusion doesn't do anything. Any further instructions are appreciated.

fish_diffusion/datasets/naive.py: No files found in dataset/train/lengyue

I'm running the Notebook on Colab.
If I set:

pretrained = True
pretrained_profile = 'hifisinger-v2.1.0'
arch = 'hifisinger'

Everything works fine.
However, if I set:

pretrained = True
pretrained_profile = 'diffusion-v2.0.0'
arch = 'diffusion'

Then if I run the training cell, I get the error below.
For some reason, it's looking for dataset/train/lengyue, and I'm not sure what for or why.
My dataset doesn't have any file that has the word "lengyue" in its name.

#raise Exception("Hold...")
Traceback (most recent call last):
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/content/fish-diffusion/fish_diffusion/datasets/naive.py", line 24, in __init__
    assert len(self.paths) > 0, f"No files found in {path}, please check your path."
AssertionError: No files found in dataset/train/lengyue, please check your path.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/content/fish-diffusion/fish_diffusion/datasets/concat.py", line 18, in __init__
    super().__init__([DATASETS.build(dataset) for dataset in datasets])
  File "/content/fish-diffusion/fish_diffusion/datasets/concat.py", line 18, in <listcomp>
    super().__init__([DATASETS.build(dataset) for dataset in datasets])
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
AssertionError: class `NaiveSVCDataset` in fish_diffusion/datasets/naive.py: No files found in dataset/train/lengyue, please check your path.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/fish-diffusion/tools/diffusion/train.py", line 97, in <module>
    train_loader, valid_loader = build_loader_from_config(cfg, trainer.num_devices)
  File "/content/fish-diffusion/fish_diffusion/datasets/utils.py", line 13, in build_loader_from_config
    train_dataset = DATASETS.build(cfg.dataset.train)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/registry.py", line 521, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
AssertionError: class `ConcatDataset` in fish_diffusion/datasets/concat.py: class `NaiveSVCDataset` in 
fish_diffusion/datasets/naive.py: No files found in dataset/train/lengyue, please check your path.

็‰ˆๆœฌไพ่ต–ๅพˆๅคšไฝŽไบŽPython3.10๏ผŒไธบๅ•ฅไฝœ่€…่ฆๆฑ‚็‰ˆๆœฌๆ˜ฏPython3.10๏ผŸ

ERROR: Ignored the following versions that require a different python version: 0.36.0 Requires-Python >=3.6,<3.10; 0.37.0 Requires-Python >=3.7,<3.10; 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 0.53.0 Requires-Python >=3.6,<3.10; 0.53.0rc1.post1 Requires-Python >=3.6,<3.10; 0.53.0rc2 Requires-Python >=3.6,<3.10; 0.53.0rc3 Requires-Python >=3.6,<3.10; 0.53.1 Requires-Python >=3.6,<3.10; 0.54.0 Requires-Python >=3.7,<3.10; 0.54.0rc2 Requires-Python >=3.7,<3.10; 0.54.0rc3 Requires-Python >=3.7,<3.10; 0.54.1 Requires-Python >=3.7,<3.10
ERROR: Could not find a version that satisfies the requirement praat-parselmouth==0.5.0 (from versions: 0.1.0, 0.1.1, 0.2.0, 0.2.1, 0.3.0, 0.3.1, 0.3.2.post2, 0.3.3, 0.4.0, 0.4.1, 0.4.2, 0.4.3)
ERROR: No matching distribution found for praat-parselmouth==0.5.0

The above is the output of pip install -r requirements.txt. I tried v1.12, v2.0.0, and the latest main; all behave the same.

Problems during Colab training

/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1609: PossibleUserWarning: The number of training batches (6) is smaller than the logging interval Trainer(log_every_n_steps=10). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 0: 0% 0/6 [00:00<?, ?it/s] /content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/functional.py:641: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at ../aten/src/ATen/EmptyTensor.cpp:31.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
Epoch 18: 0% 0/6 [00:00<?, ?it/s, v_num=0, train_loss_disc_step=3.180, train_loss_gen_step=105.0, train_loss_disc_epoch=3.640, train_loss_gen_epoch=107.0]/content/env/envs/fish_diffusion/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:48: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")

How can this be resolved?

Poetry install error

Traceback (most recent call last):
  File "/usr/local/envs/fish/bin/poetry", line 7, in <module>
    from poetry.console import main
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/__init__.py", line 1, in <module>
    from .application import Application
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/application.py", line 7, in <module>
    from .commands.about import AboutCommand
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/commands/__init__.py", line 2, in <module>
    from .add import AddCommand
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/commands/add.py", line 8, in <module>
    from .init import InitCommand
  File "/usr/local/envs/fish/lib/python3.10/site-packages/poetry/console/commands/init.py", line 16, in <module>
    from poetry.core.pyproject import PyProjectException
ImportError: cannot import name 'PyProjectException' from 'poetry.core.pyproject' (/usr/local/envs/fish/lib/python3.10/site-packages/poetry/core/pyproject/__init__.py)
ERROR conda.cli.main_run:execute(47): `conda run poetry install` failed. (See above for error)

Fish Diffusion's support for SVS

May I ask: projects of this kind, including diff-svc, mainly do song timbre conversion, while diff-singer takes lyrics + music as input and generates songs. Does your project not include that part?

English Guide

Hi, this looks great!

Can you please add a comprehensive tutorial in English? An unlisted yt video tutorial would also be great!

Colab?

Does anyone have a colab notebook for this?

Changing precision=16 to 16-mixed in configs/_base_/trainers/base.py?

When I run training, it warns about precision=16 and recommends changing it to 16-mixed.
Changing to 16-mixed also got rid of other warnings:

  1. The warning that some hyperparameters can't be pickled when saving a checkpoint
  2. The warning that the checkpoint was saved before the epoch ended when resuming

Is there a negative side effect of changing precision=16 to 16-mixed in configs/_base_/trainers/base.py?
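
For reference, recent PyTorch Lightning versions take the mixed-precision setting as a string on the Trainer; a minimal standalone sketch (plain Lightning usage, not this project's config schema):

# precision_demo.py -- plain PyTorch Lightning usage (sketch)
import pytorch_lightning as pl

# In Lightning >= 2.0, "16-mixed" is the spelling for fp16 automatic mixed
# precision; the bare integer form precision=16 is what triggers the warning.
trainer = pl.Trainer(precision="16-mixed", max_steps=1000)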

Train whisper on multilingual dataset

What will happen? We will use the MFA model to generate an aligned dataset, which will then be used to train Whisper.

Chinese Dataset

  • Opencpop
  • M4Singer
  • OpenSinger
  • AIShell

English Dataset

  • LibriSpeech (train-clean-100h)
  • LJSpeech

Japanese Dataset

Dictionaries

  • US: arpabet
  • JP: UNCLEAR (need to retrain the MFA model?)
  • CN: OpenCpop Strict (OpenVPI)

ไฟๅญ˜ๆจกๅž‹ๆŠฅ้”™

ๅœจๆ›ดๆ–ฐๅˆฐcommit 3f9bb3eๅŽไฟๅญ˜ๆฃ€ๆŸฅ็‚นๆŠฅ้”™

ไฝฟ็”จ็š„configๆ˜ฏsvc_content_vec_finetune.py๏ผŒๅˆ ้™คๆ•ดไธช้กน็›ฎ้‡ๆ–ฐๅฎ‰่ฃ…ไพ็„ถๆŠฅ้”™
Screenshot (667)

Having issue with basic install

After going through the installation instructions (git pull and pdm sync), any command I try to run gives the following error:

from fish_diffusion.archs.diffsinger.diffsinger import DiffSingerLightning
ModuleNotFoundError: No module named 'fish_diffusion'

Not sure how to get past this.

ImportError: DLL load failed while importing _imaging: The specified module could not be found.

Steps to reproduce:

  1. Install the dependencies following the process on GitHub.
  2. Without passing any args, run inference.py directly with the Python from the dependency environment.

Error message:

Traceback (most recent call last):
  File "d:\AI_drawer\sovits48k\fish-diffusion\inference.py", line 17, in <module>
    from train import FishDiffusion
  File "d:\AI_drawer\sovits48k\fish-diffusion\train.py", line 3, in <module>
    import matplotlib.pyplot as plt
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\matplotlib\__init__.py", line 113, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\matplotlib\rcsetup.py", line 27, in <module>
    from matplotlib.colors import Colormap, is_color_like
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\matplotlib\colors.py", line 51, in <module>
    from PIL import Image
  File "D:\AI_drawer\sovits48k\fish-diffusion\env310\lib\site-packages\PIL\Image.py", line 100, in <module>
    from . import _imaging as core
ImportError: DLL load failed while importing _imaging: The specified module could not be found.

What did I expect to happen?

Since I didn't pass any args, it should have reported missing args, or a file-not-found error for the input audio, model file, and so on.

Resolved by reinstalling Pillow.

Confusion about different model archs

If I trained a model using svc_content_vec.py, can a model trained with the SVC method be used for SVS?

If not, what's the difference between diffsinger and diffsinger-svc?

How to resume training?

I set the --resume-id and it basically started over. Is there something else I need to do? Any help would be appreciated! :)

How do I apply the pretrained model content-vec-pretrained-v1.ckpt in version 2.0?

In the 1.12 Colab,
python tools/diffusion/train.py --tensorboard --config configs/svc_cn_hubert_soft_finetune.py --pretrained {directory of the saved base model}
could use the checkpoint /content/fish-diffusion/checkpoints/content-vec-pretrained-v1.ckpt.

But in 2.0,
python tools/diffusion/train.py --config configs/svc_content_vec.py --pretrained {directory of the saved base model}
doesn't seem to work with /content/fish-diffusion/checkpoints/content-vec-pretrained-v1.ckpt.
Testing with --resume xxx.ckpt doesn't seem to work either.
How can I solve this? Thanks!

One Shot Conversion Possible?

Is there a way to use fish-diffusion to train a model for one-shot any-to-any SVC?
You give one source sample and one target sample, and the model extracts content (pitch, contour, syllables) from the source and uses the voice tone from the target sample to generate a new sample.
Basically, the source singing voice needs to be converted as if it were sung by the target singer while keeping the content unchanged.
It's like the Singing Voice Conversion Challenge 2023, but any-to-any one-shot.
http://www.vc-challenge.org/
If not, anyone knows such model?

Tensorboard's Audio panel: all gt samples are hoarse voices

Configuration file used: svc_content_vec_finetune.py

Pretrained model used: content-vec-pretrained-v1.ckpt

Vocoder: downloaded using download_nsf_hifigan.py

Command: python tools\diffusion\train.py --config configs\svc_content_vec_finetune.py --pretrained checkpoints\content-vec-pretrained-v1.ckpt --tensorboard

Tensor size doesn't match, svc_hubert_soft_diff_svc.py

I'm crashing on svc_hubert_soft_diff_svc.py with the following traceback; isn't that something similar to #86?

| Name | Type | Params

0 | model | DiffSinger | 32.0 M
1 | vocoder | NsfHifiGAN | 14.2 M

32.0 M Trainable params
14.2 M Non-trainable params
46.2 M Total params
184.927 Total estimated model params size (MB)
Sanity Checking DataLoader 0: 0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\User\Documents\Testing\fishdiffusion\tools\diffusion\train.py", line 98, in
trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 935, in _run
results = self._run_stage()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 976, in _run_stage
self._run_sanity_check()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1005, in _run_sanity_check
val_loop.run()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 378, in validation_step
return self.model.validation_step(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 276, in validation_step
return self._step(batch, batch_idx, mode="valid")
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 191, in _step
output = self.model(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 134, in forward
features = self.forward_features(
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 96, in forward_features
features += self.pitch_encoder(pitches)
RuntimeError: The size of tensor a (4) must match the size of tensor b (608) at non-singleton dimension 1

Checkpoint is not saved while training the model in fish-diffusion-2.1.0

/root/anaconda3/envs/fish/lib/python3.10/site-packages/lightning_fabric/plugins/io/torch_io.py:61: UserWarning: Warning, hyper_parameters dropped from checkpoint. An attribute is not picklable: Can't pickle local object 'EvaluationLoop.advance..batch_to_device'
rank_zero_warn(f"Warning, {key} dropped from checkpoint. An attribute is not picklable: {err}")

HiFiSVC training problems

As the title says.
https://wandb.ai/stardust-minus/HiFiSVC
After starting five training runs, none of them converged; along the way I tried lowering the lr, reducing the number of speakers, and so on.
I'm now trying to modify the network parameters. How should I modify the model config to increase the parameter count?

Some questions about augmentation & bug feedback

1. Of the current pitch augmentations, which one works best, and can you give recommended parameters?
2. When I tried inference with an old config today, it blew up (screenshot attached); is there a way to fix this?
The config file is at https://paste.ubuntu.com/p/KWMsjKcGHp/
3. When preprocessing with multiple workers, if the Whisper model is not on the machine, every worker downloads it once, which is a bit wasteful.
Looking forward to your reply.

Error in Colab fine-tuning training

The pretrained mode ran fine, but the next step, fine-tuning training, errors out:

IsADirectoryError: [Errno 21] Is a directory: '/content/fish-diffusion/checkpoints'

How should I solve this? Thanks!

Inference issue in colab version

The Colab version runs fine through training, but when trying to run inference, the following error pops up:

Running on local URL: http://127.0.0.1:7860/
Running on public URL: https://0fd4daade2bee2fa34.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/content/fish-diffusion/tools/hifisinger/inference.py", line 224, in <module>
    model = HiFiSingerSVCInference(config, args.checkpoint)
  File "/content/fish-diffusion/tools/hifisinger/inference.py", line 25, in __init__
    super().__init__(config, checkpoint, model_cls=model_cls)
  File "/content/fish-diffusion/tools/diffusion/inference.py", line 74, in __init__
    self.model = load_checkpoint(
  File "/content/fish-diffusion/fish_diffusion/utils/inference.py", line 19, in load_checkpoint
    state_dict = torch.load(checkpoint, map_location="cpu")
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/serialization.py", line 797, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/torch/serialization.py", line 283, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Traceback (most recent call last):
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/gradio/routes.py", line 413, in run_predict
    with utils.MatplotlibBackendMananger():
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/gradio/utils.py", line 788, in __exit__
    matplotlib.use(self._original_backend)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/matplotlib/__init__.py", line 1233, in use
    plt.switch_backend(name)
  File "/content/env/envs/fish_diffusion/lib/python3.10/site-packages/matplotlib/pyplot.py", line 271, in switch_backend
    backend_mod = importlib.import_module(
  File "/content/env/envs/fish_diffusion/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'ipykernel'

Dealing with Scratchy Output and Octave Shift?

I have two problems that I often encounter, and I wonder if anyone has suggestions.

  1. Output audio with scratchy parts
  2. Some pitches get shifted either one octave lower or higher than the source

I'm attaching a sample output only highlighting the problems.
Thanks!
Scratchy.zip

Big error saving checkpoint

138.519   Total estimated model params size (MB)
Epoch 105:  25%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–Š                                   | 5/20 [00:00<00:02,  6.81it/s, loss=0.0487, v_num=owzc]C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\plugins\io\torch_io.py:61: UserWarning: Warning, `hyper_parameters` dropped from checkpoint. An attribute is not picklable: Can't pickle local object 'EvaluationLoop.advance.<locals>.batch_to_device'
  rank_zero_warn(f"Warning, `{key}` dropped from checkpoint. An attribute is not picklable: {err}")
Traceback (most recent call last):
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\plugins\io\torch_io.py", line 54, in save_checkpoint
    _atomic_save(checkpoint, path)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\utilities\cloud_io.py", line 67, in _atomic_save
    torch.save(checkpoint, bytesbuffer)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 653, in _save
    pickler.dump(obj)
AttributeError: Can't pickle local object 'EvaluationLoop.advance.<locals>.batch_to_device'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 441, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 668, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\micro\Downloads\fish-diffusion\tools\diffusion\train.py", line 98, in <module>
    trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1112, in _run
    results = self._run_stage()
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1191, in _run_stage
    self._run_train()
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 229, in advance
    self.trainer._call_callback_hooks("on_train_batch_end", batch_end_outputs, batch, batch_idx)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1394, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 296, in on_train_batch_end
    self._save_topk_checkpoint(trainer, monitor_candidates)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 363, in _save_topk_checkpoint
    self._save_none_monitor_checkpoint(trainer, monitor_candidates)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 669, in _save_none_monitor_checkpoint
    self._save_checkpoint(trainer, filepath)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\callbacks\model_checkpoint.py", line 366, in _save_checkpoint
    trainer.save_checkpoint(filepath, self.save_weights_only)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1939, in save_checkpoint
    self._checkpoint_connector.save_checkpoint(filepath, weights_only=weights_only, storage_options=storage_options)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\trainer\connectors\checkpoint_connector.py", line 511, in save_checkpoint
    self.trainer.strategy.save_checkpoint(_checkpoint, filepath, storage_options=storage_options)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 466, in save_checkpoint
    self.checkpoint_io.save_checkpoint(checkpoint, filepath, storage_options=storage_options)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\plugins\io\torch_io.py", line 62, in save_checkpoint
    _atomic_save(checkpoint, path)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\lightning_fabric\utilities\cloud_io.py", line 67, in _atomic_save
    torch.save(checkpoint, bytesbuffer)
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 440, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "C:\Users\micro\miniconda3\envs\fish\lib\site-packages\torch\serialization.py", line 305, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:337] . unexpected pos 645876672 vs 645876560

Help with v2.1 training

(screenshot attached)
While v2.0 works just fine, when I switch to v2.1 I encounter the error message shown above.
The same error occurs when training from scratch instead of fine-tuning.
Any idea how to fix this?
Thanks!

Tensor NotImplementedError

Getting this error once I try to start training after a basic install and preparing the data (~400 short wav files).
I'm using a python venv environment instead of conda but installed everything with poetry.

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

| Name | Type | Params

0 | model | DiffSinger | 55.1 M
1 | vocoder | NsfHifiGAN | 14.2 M

55.1 M Trainable params
14.2 M Non-trainable params
69.3 M Total params
277.038 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:430: PossibleUserWarning: The dataloader, val_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Users\User\Documents\Testing\fishdiffusion\tools\diffusion\train.py", line 98, in
trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 520, in fit
call._call_and_handle_interrupt(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 559, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 935, in _run
results = self._run_stage()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 976, in _run_stage
self._run_sanity_check()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1005, in _run_sanity_check
val_loop.run()
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\loops\evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 288, in _call_strategy_hook
output = fn(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 378, in validation_step
return self.model.validation_step(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 276, in validation_step
return self._step(batch, batch_idx, mode="valid")
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 215, in _step
image_mels, wav_reconstruction, wav_prediction = viz_synth_sample(
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\utils\viz.py", line 54, in viz_synth_sample
wav_reconstruction = vocoder.spec2wav(mel_target, pitch)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\modules\vocoders\nsf_hifigan\nsf_hifigan.py", line 81, in spec2wav
y = self.model(c, f0).view(-1)
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\User\Documents\Testing\fishdiffusion\fish_diffusion\modules\vocoders\nsf_hifigan\models.py", line 408, in forward
f0 = F.interpolate(
File "C:\Users\User\Documents\Testing\fishdiffusion\venv\lib\site-packages\torch\nn\functional.py", line 3982, in interpolate
raise NotImplementedError(
NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 2D) for the modes: nearest | linear | bilinear | bicubic | trilinear | area | nearest-exact (got linear)
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.

pitches_editor.py output and inference.py tensor size discrepancy

It doesn't always do this but I used pitches_editor.py to extract pitches. I initially thought it was an error with the online GUI but I loaded the .npy from the "pitches_editor" folder to --pitches_path when inferencing and got the same error.

Traceback (most recent call last):
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\tools\diffusion\inference.py", line 438, in <module>
    model.inference(
  File "C:\Users\Kickbub\.conda\envs\Fish\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\tools\diffusion\inference.py", line 256, in inference
    wav = self(
  File "C:\Users\Kickbub\.conda\envs\Fish\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Kickbub\.conda\envs\Fish\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\tools\diffusion\inference.py", line 99, in forward
    features = self.model.model.forward_features(
  File "C:\Users\Kickbub\Documents\automatic1111\fish-diffusion\fish_diffusion\archs\diffsinger\diffsinger.py", line 92, in forward_features
    features += self.pitch_encoder(pitches)
RuntimeError: The size of tensor a (1579) must match the size of tensor b (1580) at non-singleton dimension 1

The weird thing is, when I don't use --pitches_path, it works. So it is an issue with pitches_editor.py.

Maybe something to do with line 70?

Update Requirements.txt

Am I missing a step or something? I keep getting "no module named" errors. I did pip install -r on requirements.txt, but I'm still missing a lot of dependencies.
(screenshot attached)

edit:
I kept manually installing the dependencies until I got to this point, and it won't let me progress (I am using release 2.0):
(screenshot attached)

Specify pytorch version for conda

Default pytorch is now 2.0, so we should specify that we want 1.13.1.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

ๆ•ฐๆฎ้ข„ๅค„็†้—ฎ้ข˜(bug?)

  1%|โ–‹                                                                                                  | 45/7110 [00:37<32:55,  3.58it/s]
2023-04-05 20:21:32.660 | ERROR    | __main__:safe_process:198 - Error processing dataset/train/opencpop/2082003011.wav
2023-04-05 20:21:32.660 | ERROR    | __main__:safe_process:199 - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument self in method wrapper_CUDA_Tensor_searchsorted)
Traceback (most recent call last):

  File "/mnt/fish-diffusion/tools/preprocessing/extract_features.py", line 245, in <module>
    i = safe_process(args, config, audio_path)
        โ”‚            โ”‚     โ”‚       โ”” PosixPath('dataset/train/opencpop/2082003011.wav')
        โ”‚            โ”‚     โ”” Config (path: configs/svc_hifisinger_finetune.py): {'sampling_rate': 44100, 'hidden_size': 256, 'vocoder_config': {'sampling_...
        โ”‚            โ”” Namespace(config='configs/svc_hifisinger_finetune.py', path='dataset/train', clean=True, num_workers=1, no_augmentation=False)
        โ”” <function safe_process at 0x7fa72e5d3640>

> File "/mnt/fish-diffusion/tools/preprocessing/extract_features.py", line 166, in safe_process
    process(config, audio_path)
    โ”‚       โ”‚       โ”” PosixPath('dataset/train/opencpop/2082003011.wav')
    โ”‚       โ”” Config (path: configs/svc_hifisinger_finetune.py): {'sampling_rate': 44100, 'hidden_size': 256, 'vocoder_config': {'sampling_...
    โ”” <function process at 0x7fa72e5d3130>

  File "/mnt/fish-diffusion/tools/preprocessing/extract_features.py", line 148, in process
    pitches = pitch_extractor(audio, sr, pad_to=mel_length)
              โ”‚               โ”‚      โ”‚          โ”” 455
              โ”‚               โ”‚      โ”” 44100
              โ”‚               โ”” tensor([[ 0.0017,  0.0017,  0.0018,  ..., -0.0013, -0.0010, -0.0006]],
              โ”‚                        device='cuda:0')
              โ”” <fish_diffusion.modules.pitch_extractors.parsel_mouth.ParselMouthPitchExtractor object at 0x7fa71f22bbb0>

  File "/mnt/fish-diffusion/fish_diffusion/modules/pitch_extractors/parsel_mouth.py", line 42, in __call__
    return self.post_process(x, sampling_rate, f0, pad_to)
           โ”‚    โ”‚            โ”‚  โ”‚              โ”‚   โ”” 455
           โ”‚    โ”‚            โ”‚  โ”‚              โ”” array([   0.        ,    0.        ,    0.        ,    0.        ,
           โ”‚    โ”‚            โ”‚  โ”‚                          0.        ,    0.        ,  414.08354111,  416.3...
           โ”‚    โ”‚            โ”‚  โ”” 44100
           โ”‚    โ”‚            โ”” tensor([[ 0.0017,  0.0017,  0.0018,  ..., -0.0013, -0.0010, -0.0006]],
           โ”‚    โ”‚                     device='cuda:0')
           โ”‚    โ”” <function BasePitchExtractor.post_process at 0x7fa72e5d2b90>
           โ”” <fish_diffusion.modules.pitch_extractors.parsel_mouth.ParselMouthPitchExtractor object at 0x7fa71f22bbb0>

  File "/mnt/fish-diffusion/fish_diffusion/modules/pitch_extractors/builder.py", line 59, in post_process
    return interpolate(time_frame, time_org, f0, left=f0[0], right=f0[-1])
           โ”‚           โ”‚           โ”‚         โ”‚        โ”‚            โ”” tensor([ 414.0835,  416.3744,  420.1241,  424.6104,  428.1972,  429.9621,
           โ”‚           โ”‚           โ”‚         โ”‚        โ”‚                       429.6776,  427.2525,  420.7615,  410.0335,...
           โ”‚           โ”‚           โ”‚         โ”‚        โ”” tensor([ 414.0835,  416.3744,  420.1241,  424.6104,  428.1972,  429.9621,
           โ”‚           โ”‚           โ”‚         โ”‚                   429.6776,  427.2525,  420.7615,  410.0335,...
           โ”‚           โ”‚           โ”‚         โ”” tensor([ 414.0835,  416.3744,  420.1241,  424.6104,  428.1972,  429.9621,
           โ”‚           โ”‚           โ”‚                    429.6776,  427.2525,  420.7615,  410.0335,...
           โ”‚           โ”‚           โ”” tensor([0.0697, 0.0813, 0.0929, 0.1045, 0.1161, 0.1277, 0.1393, 0.1509, 0.1625,
           โ”‚           โ”‚                     0.1741, 0.1858, 0.1974, 0.2090, 0.220...
           โ”‚           โ”” tensor([0.0000, 0.0116, 0.0232, 0.0348, 0.0464, 0.0580, 0.0697, 0.0813, 0.0929,
           โ”‚                     0.1045, 0.1161, 0.1277, 0.1393, 0.150...
           โ”” <function interpolate at 0x7fa731706710>

  File "/mnt/fish-diffusion/fish_diffusion/utils/tensor.py", line 67, in interpolate
    i = torch.clip(torch.searchsorted(xp, x, right=True), 1, len(xp) - 1)
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚   โ”‚                      โ”” tensor([0.0697, 0.0813, 0.0929, 0.1045, 0.1161, 0.1277, 0.1393, 0.1509, 0.1625,
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚   โ”‚                                0.1741, 0.1858, 0.1974, 0.2090, 0.220...
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚   โ”” tensor([0.0000, 0.0116, 0.0232, 0.0348, 0.0464, 0.0580, 0.0697, 0.0813, 0.0929,
        โ”‚     โ”‚    โ”‚     โ”‚            โ”‚             0.1045, 0.1161, 0.1277, 0.1393, 0.150...
        โ”‚     โ”‚    โ”‚     โ”‚            โ”” tensor([0.0697, 0.0813, 0.0929, 0.1045, 0.1161, 0.1277, 0.1393, 0.1509, 0.1625,
        โ”‚     โ”‚    โ”‚     โ”‚                      0.1741, 0.1858, 0.1974, 0.2090, 0.220...
        โ”‚     โ”‚    โ”‚     โ”” <built-in method searchsorted of type object at 0x7fa79fea4880>
        โ”‚     โ”‚    โ”” <module 'torch' from '/opt/miniconda/envs/fish/lib/python3.10/site-packages/torch/__init__.py'>
        โ”‚     โ”” <built-in method clip of type object at 0x7fa79fea4880>
        โ”” <module 'torch' from '/opt/miniconda/envs/fish/lib/python3.10/site-packages/torch/__init__.py'>

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument self in method wrapper_CUDA_Tensor_searchsorted)

This error was thrown on a machine with CUDA 12.1:

Wed Apr  5 20:25:55 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB            Off| 00000000:00:08.0 Off |                    0 |
| N/A   34C    P0               53W / 300W|      0MiB / 32768MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

ไธ็Ÿฅ้“ๆ˜ฏไป€ไนˆ้—ฎ้ข˜ๆ

Multi-node cluster training problem

As the title says.
I'm trying to train on a cluster with 4 nodes, each with 8 GPUs.
The trainer configuration has been modified, training starts normally, and 32 ranks are detected.
(screenshot attached)
However, I observed that although the number of batches per epoch on each node decreased, the step count is not synchronized.
(screenshot attached)
In this case, reaching the planned number of steps requires more epochs. Will this affect convergence?
Should I count in steps or in epochs?

Inference error

Following the "Quick FishSVC Guide", I trained for 10000+ steps; when I tested inference, it errored.

It feels like a size mismatch between the trained model and the inference model...

Error when trying to train with `cn-hubert-soft-600-singers-pretrained-v1.ckpt`

Traceback (most recent call last):
  File "/content/drive/MyDrive/porter-diffusion/train.py", line 217, in <module>
    trainer.fit(model, train_loader, valid_loader, ckpt_path=args.resume)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1047, in _run
    self._restore_modules_and_callbacks(ckpt_path)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 991, in _restore_modules_and_callbacks
    self._checkpoint_connector.restore_model()
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 261, in restore_model
    self.trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
  File "/usr/local/envs/fish/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 363, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"])
  File "/usr/local/envs/fish/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FishDiffusion:
	Missing key(s) in state_dict: "model.speaker_encoder.embedding.weight". 
